Saturday, December 17, 2011

9.0 RC available with secret sauce

9.0 RCs are now available. The RC fixes the missing favicons problem (this is actually a Mozilla bug, but for those of you using Firefox on an Intel Mac, they will not ship the fix until Fx10) and a glitch with a malfunctioning chrome script that causes weird behaviour on AMO and certain other OpenWebapps-integrated sites.

Oh, yeah, the secret sauce. It is generally not my policy to introduce new code in an RC, much to the annoyance of our contributors, and although in a strict sense the new code is also shipping as part of regex compilation (which was in the beta) the "sauce" is turned off as shipped. However, we're also in a bit of a time crunch. Fx10 is likely to have integration problems with tracejit at best and outright break it at worst, and losing tracejit with nothing to replace it specifically means a real punch in our gut.

So we really need to test our new JavaScript PowerPC methodjit now so that we can start working on any bugs and get type inference working (I'll discuss what that means in a moment), preferably for 10.4Fx 10. And that's what you're going to get to do with the "RC," because the RC has methodjit secretly embedded in it! Start the browser and go to about:config, and turn both javascript.options.methodjit.chrome and javascript.options.methodjit.content to true. Do not change any other values related to JavaScript, and do not turn type inference on (the browser will crash immediately if you do). Leave the tracejit on. Restart the browser to clear everything out. The browser will be a little slower to restore itself because it will now start compiling the chrome methods with methodjit as well.
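For those who would rather script the change than click through about:config, here is a minimal sketch of generating the equivalent user.js lines for the profile folder. The user_pref() line format is standard Firefox; the helper name render_user_js is made up for illustration, and this route has not been tested against the RC, so about:config remains the route described above.

```python
# Hypothetical helper: renders user.js lines for the two prefs named above.
# Only the pref names come from the post; everything else is illustrative.
METHODJIT_PREFS = {
    "javascript.options.methodjit.chrome": True,
    "javascript.options.methodjit.content": True,
}


def render_user_js(prefs):
    """Return user.js text with one user_pref() line per boolean pref."""
    lines = []
    for name, value in sorted(prefs.items()):
        js_value = "true" if value else "false"
        lines.append('user_pref("%s", %s);' % (name, js_value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paste the output into user.js in the profile folder, then restart.
    print(render_user_js(METHODJIT_PREFS), end="")
```

Keep in mind that user.js is re-applied at every launch, so remove the lines again before starting any build that should not have methodjit on.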

Tracejit and methodjit attack JavaScript just-in-time compilation in two different ways. Our old friend tracejit watches the interpreter run code. If it sees something it determines is "hot," it turns on a tracer that translates JavaScript operations into an intermediate language called LIR and from there into PowerPC assembly language. This is very fast, and is the fastest way of running certain kinds of code, but branchy code paths cannot be built with tracejit because the execution runs in too many directions, and an unexpected change from the normal execution path means the tracer has to start a new trace. So that's where methodjit comes in. If the JavaScript engine determines (through profiling and various internal heuristics) that tracing will be unprofitable on a section of code, it can try simply compiling the entire JavaScript method en masse. This is slower than tracejit at some tasks because it can end up compiling code paths it never executes (most importantly related to covering the gamut of implied types JavaScript makes possible -- read on), but it also allows certain kinds of optimizations that a tracer will never see because it only sees "one view" of the script rather than the entire bulk of code. The browser will pick which of them is most effective, caching its choice and the JITted code, and you will notice that the more code is in the cache, the faster the browser gets.
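To make the "branchy code" problem concrete, here is a toy model of a tracer. It is a sketch only and bears no resemblance to the real tracejit internals: ToyTracer, HOT_THRESHOLD and simulate are all invented names, and a "trace" here is just a remembered branch path rather than LIR or PowerPC code. It shows a straight-line loop settling onto one trace, while a loop whose branches keep changing piles up traces and side exits.

```python
HOT_THRESHOLD = 3  # assumed for this toy; real engines tune their own value


class ToyTracer:
    """Counts loop iterations and records one linear trace per branch path."""

    def __init__(self):
        self.iterations = 0
        self.traces = {}      # branch path taken -> recorded operations
        self.side_exits = 0   # times execution left every recorded trace

    def run_iteration(self, path, ops):
        """path: tuple of branch outcomes; ops: operations executed on it."""
        self.iterations += 1
        if path in self.traces:
            return "trace"                  # replay the recorded trace
        if self.traces:
            self.side_exits += 1            # a guard failed on known traces
        if self.iterations >= HOT_THRESHOLD:
            self.traces[path] = list(ops)   # loop is hot: record a new trace
        return "interpreter"


def simulate(paths):
    """Return (iterations run on a trace, traces recorded, side exits)."""
    tracer = ToyTracer()
    on_trace = 0
    for path in paths:
        if tracer.run_iteration(path, ops=["op"] * len(path)) == "trace":
            on_trace += 1
    return on_trace, len(tracer.traces), tracer.side_exits


if __name__ == "__main__":
    straight = [(True,)] * 10
    branchy = [(i % 2 == 0, i % 3 == 0) for i in range(10)]
    print("straight:", simulate(straight))
    print("branchy: ", simulate(branchy))
```

A method compiler sidesteps this by compiling every path of the method up front, at the cost of compiling paths that may never run.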

Tracejit has fundamental differences from methodjit and Mozilla is proceeding in directions that tracejit is not compatible with, which is why it is being disabled in 10 and outright deleted from the tree in 11. This is bad for us, because tracejit executes certain kinds of code far faster than methodjit ever will. Mozilla's solution to methodjit's overhead is to introduce type inference to JavaScript. Remember that I mentioned part of the trouble with methodjit is compiling unnecessary code to cover the possible (or at least reasonable) permutations for JavaScript parameters and variables. If it could be determined what the most likely set of types was for an execution path, then code covering just those types could be built and the rest could fall through to the interpreter for a second chance while the methodjit revises its guess. The guess has to be good, because the penalty is both to lose the compiled code and to have to execute in the interpreter, a penalty tracejit hardly ever has to take because the types are already determined for it by the time the tracer gets enabled (and it doesn't have to junk the code path it already compiled, because it might use it again). This is where Mozilla is putting significant resources to beat Chrome V8 and Crankshaft, and Fx9 is the first Firefox where type inference is being exposed to the user by default.
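The guess-and-fall-back idea can be sketched in a few lines. This is a deliberately crude model, not Mozilla's type inference: SpecializedAdd, compile_for and generic_add are hypothetical names, and the "guess" is simply the last pair of operand types seen, whereas the real thing analyses the script. What it does show is the penalty described above: a wrong guess throws away the specialized code and sends that call through the slow generic path.

```python
def generic_add(a, b):
    """Stand-in for the interpreter: copes with every type combination."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)   # JavaScript-style string concatenation
    return a + b


def compile_for(types):
    """Stand-in for the compiler: build code for one guessed type pair."""
    if str in types:
        return lambda a, b: str(a) + str(b)
    return lambda a, b: a + b


class SpecializedAdd:
    """Toy 'method' that keeps code for a single guessed pair of types."""

    def __init__(self):
        self.guess = None        # (type, type) the fast path was built for
        self.fast_path = None
        self.compiles = 0
        self.misses = 0          # wrong guesses that discarded compiled code

    def call(self, a, b):
        seen = (type(a), type(b))
        if seen != self.guess:
            if self.guess is not None:
                self.misses += 1             # junk the old specialized code
            self.guess = seen
            self.fast_path = compile_for(seen)
            self.compiles += 1
            return generic_add(a, b)         # this call pays the slow price
        return self.fast_path(a, b)          # guess held: run the fast path


if __name__ == "__main__":
    add = SpecializedAdd()
    for args in [(1, 2), (3, 4), ("a", 5), ("b", 6), (7, 8)]:
        print(args, "->", add.call(*args))
    print("compiles:", add.compiles, "misses:", add.misses)
```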

Type inference makes drastic changes to the JavaScript engine, and as a result, JM+TI (that is, JaegerMonkey/methodjit with type inference) is incompatible with TM (tracejit), even though you can combine JM+TM without TI. But because type inference and other optimizations are part of the future IonMonkey project, tracejit is now old news. This is sad for us because our tracejit is very heavily optimized, but that wasn't my call, and Mozilla was kind enough to give us until Fx11 to get our backend shifted over. However, we don't support type inference yet -- I couldn't get it fully working in time. This is just as well since we should really test the core browser without it first.

So, shipping in this release is the Nitro macroassembler for PowerPC (which is an .h file with certain semi-atomic operations and their assembly-language equivalents), expanded for Mozilla's purposes, and a lower-level assembler that the macroassembler calls to do the translation. Ben diligently worked on the first draft of these pieces, and this version includes my completed draft with the bugs worked out. They are not as optimized as they could be, but they do work, and we can delay optimizing them a bit while we get any other major issues out. Both YARR and methodjit use the macroassembler, YARR for regular expression compilation (which we do have enabled in 9), and methodjit for JavaScript. There is also a snippet of code called a "trampoline" which is a skeleton function prologue, epilogue and stub call library I wrote by hand in PowerPC assembly to interface with the JavaScript C++ interpreter, and the other pieces needed for JavaScript to understand OS X ABI stack frames and register allocation. Those of you watching our stack usage problem will now note that our methodjit stack frames are a fixed 128 bytes, overall smaller than tracejit, and our YARR stack frames are a fixed 32 bytes plus any subpattern usage. We should start seeing smaller stack overhead when tracejit goes away, which is a consolation prize. The interested can see what our stack frames now look like in js/src/methodjit/MethodJIT.h and js/src/methodjit/TrampolinePPCOSX.s.

There is an effort underway to port the TenFourFox PowerPC-specific code to Linux. The VMX code Tobias wrote will translate pretty much directly, but the JIT compilers will not because they assume the OS X application binary interface, not the Linux ABI. The Linux ABI uses different stack frame sizes, a different linkage area (the first few words of the stack frame), different volatile register designations and different parameter register allocation. Tracejit was not really written for anything but OS X, but methodjit is being designed to be friendlier to our PPC brethren on other operating systems like Linux and should be particularly easy to port to AIX (and a tip of the hat to Andrew, who is running server-side JavaScript on big-iron POWER servers like our very own POWER6 here at Floodgap Orbiting HQ). I don't have any ETA on this and I am not personally involved with the project, but I have had contact with them and given them advice and suggestions. I'm sure they will post when they have early builds to play with. Please note that TenFourFox itself will always be for OS X; this effort would be better understood as part of a PowerPC-enhanced generic Firefox.

So, how are the numbers? My quad G5 doesn't change much on SunSpider (around 1050-1070ms for both TM and JM+TM), but Dromaeo in 9 goes from 114 to 122 runs/sec with JM+TM. The 1GHz G4 in 9 goes from 32 to 36 runs/sec and from 3300 to 2800ms in SunSpider. Overall, while not the dramatic change that tracejit was, it's still progress and I expect even more when we get type inference running; the project goal is still to get the quad G5's SunSpider time under one second and we're almost there. In the meantime I want you to bang on the browser hard, please -- we've got to get all the kinks out!
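For a sense of scale, here is a quick sketch that turns the figures quoted above into percentage changes. The helper name percent_change and the labels are just for illustration; the numbers are the ones in this post. Remember that higher is better for runs/sec and lower is better for milliseconds.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (label, TM only, JM+TM), taken from the figures quoted in this post.
RESULTS = [
    ("quad G5 Dromaeo (runs/sec)", 114, 122),
    ("1GHz G4 (runs/sec)", 32, 36),
    ("1GHz G4 SunSpider (ms)", 3300, 2800),
]

if __name__ == "__main__":
    for label, before, after in RESULTS:
        print("%-28s %+.1f%%" % (label, percent_change(before, after)))
```

That works out to roughly 7% and 12.5% more runs per second, and about 15% less time in SunSpider on the G4.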

Some housekeeping. Firefox rapid release is developing a long tail of people not upgrading to later versions for a variety of reasons ranging from ignorance to add-on anxiety to update fatigue, which they have attempted to fix with the 3.6 to 8.0 "major update." The issue of the uninformed and shortly-to-be-marooned PowerPC Firefox 3.6 user notwithstanding, we have the same problem, made more acute by the termination of plugin support with TenFourFox 6. Nevertheless, I think it is bad form to have so many users on an unsupported branch with known security holes and I'm considering updating all the version snippets to remind users that TenFourFox 9 is available for them too and to please update to a supported version. They could of course ignore this nag as well, but at least we tried. Argue passionately for or against this idea in the comments.

One last note: do not downgrade to 9.0b1 with methodjit on -- make sure you turn it off before trying to re-run the beta for comparison, or the browser will immediately assert. In fact, you should probably do that for any previous version of TenFourFox, just to be safe.

On deck for 10 will be Tobias' fixed AltiVec text converters back in the lineup so he won't come over to my house with an axe ;), and some other additional minor optimizations, but type inference is the big kahuna and I may delay the beta until it is ready. As always the Secretary will disavow, I mean, the beta port will start when Mozilla certifies the first 10 beta. And this blog entry will self-destruct in five seconds because of the secret sauce. So you'd better read the release notes and download for your architecture now. Good luck, Jim.

11 comments:

  1. When I got to the end smoke came outta my G4, is that a bug?

  2. There's just no satisfying you, is there? :P

  3. Like the Dromaeo and Sunspider scores, but I discovered one page rendering problem in Blogger's Dashboard interface. Here's how it looked in TFF 8 and also in Safari:

    http://i1192.photobucket.com/albums/aa333/dantheperson/BloggerinSafari.jpg

    And here's how it looks in TFF 9. Many options are stripped:

    http://i1192.photobucket.com/albums/aa333/dantheperson/BloggerinTenFourFox.jpg

    This is with no add-ons and javascript enabled. Maybe I'll just grow up and click the "Try the updated Blogger interface" link.

  4. It's the same thing in Windows Fx9, so I think it's Blogger's fault. I'll check again in the office.

  5. Several Peacekeeper test runs (new version), TFF 9.0 RC, PowerBook G4 1.33 GHz, 2 GB RAM, 10.5.8

    TJ only: 260, 268, 264
    MJ only: 254, 253, 251
    TJ+MJ: 261, 261, 266
    no jit: 203, 201, 202

    More to come…

  6. SunSpider (0.9), TFF 9.0 RC, PowerBook G4 1.33 GHz, 2 GB RAM, 10.5.8

    TJ only: 2736.4ms
    MJ only: 2656.4ms
    TJ+MJ: 2370.4ms
    no jit: 7333.8ms

  7. ... confirmed for Dan, yes, Windows Fx9 has the problem also. I think Blogger just wants everyone to use the new hotness. However, I like this new minimalist Blogger because I prefer to enter HTML by hand anyway.

    Based on Chris' numbers, I am thinking we have another G5 optimization problem. We obviously aren't using mcrxr anymore, and we have some dispatch group optimizations in the floating point code, so I think we need to start eliminating microcoded instructions next. But type inference is first.

  8. Kraken 1.1, TFF 9.0 RC, PowerBook G4 1.33 GHz, 2 GB RAM, 10.5.8

    TJ only: 60,350.6ms
    MJ only: 121,813.5ms (??)
    TJ+MJ: 63,892.5ms
    no jit: 388,281.3ms

  9. On :
    Mac OS X 10.5.8
    2 x 2.5 GHz PowerPC G5
    5 GB DDR SDRAM

    Peacekeeper (Futuremark) results:

    Firefox
    534 Points
    Detailed version information:
    Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; rv:9.0) Gecko/20111217 Firefox/9.0 TenFourFox/G5

    Suite Result

    Rendering 14,26
    renderGrid01 69,67 fps
    renderGrid02 24,07 fps
    renderGrid03 1,55 fps
    renderPhysics 15,86 fps

    HTML5 Capabilities 5/7
    webglSphere N/A
    videoPosterSupport Yes
    videoCodecH264 N/A
    videoCodecTheora Yes
    videoCodecWebM Yes
    workerContrast01 Yes (889,83 ops)
    workerContrast02 Yes (898,09 ops)
    gamingSpitfire Yes (34,66 fps)

    HTML5 Canvas 4,79
    experimentalRipple01 7,35 fps
    experimentalRipple02 3,12 fps

    Data 10184,02
    arrayCombined 2723,20 ops
    arrayWeighted 38085,50 ops

    DOM operations 2324,37
    domGetElements 348642,50 ops
    domDynamicCreationCreateElement 6517,00 ops
    domDynamicCreationInnerHTML 5630,50 ops
    domJQueryAttributeFilters 531,73 ops
    domJQueryBasicFilters 232,38 ops
    domJQueryBasics 506,00 ops
    domJQueryContentFilters 275,86 ops
    domJQueryHierarchy 713,00 ops
    domQueryselector 12587,50 ops

    Text parsing 26846,81
    stringChat 32837,50 ops
    stringDetectBrowser 76883,50 ops
    stringFilter 780,50 ops
    stringValidateForm 151712,50 ops
    stringWeighted 46651,50 ops

  10. Just as a data point, I have not bothered to run benchmarks here on my iBook G4 1.33GHz, but TFF 9 with methodjit enabled is stable enough to leave it on for my limited browsing needs.

    I really appreciate your hard work and I am constantly impressed by all the detailed analysis that you've given of the situation (and also grateful for the software produced).

    Thanks once again.
