During the long death march to Firefox 4, Mozilla made much hay out of their Are We Fast Yet? site, a/k/a AWFY. AWFY was their daily dangling carrot while they worked on the first iteration of methodjit and offered regular feedback against the competition, and is still being used in the new type inference age. Well, JM+TI is here for PowerPC at last (that is, JaegerMonkey and type inference), and so is 9.0.1pre with new backends for your experimental pleasure. There are no other changes in this release and it is otherwise exactly the same as 9.0; it should fix all the known flaws in methodjit. (It does not include the fix Mozilla issued in Firefox 9.0.1 because I have received no relevant reports of crashes related to it, and it may impact performance, so there will not be a 9.0.1 unless I hear differently.)
So, with this release, I am now proud to bring you Are We Old And Fast Yet? where the first answer is yes and the second answer is "not enough." JM+TI now offers JavaScript speeds comparable to our old friend tracejit: on the quad G5, JM+TI (without tracejit) now matches the 1050ms on SunSpider that tracejit gets, and increases Dromaeo to 132 runs/sec. On our test 1GHz G4/7450, Dromaeo rises to 38 runs/sec, and SunSpider drops slightly to 2750ms. (We're solving the missing square root problem for the present time by simply falling through to the JavaScript square root library routine. The G5 uses its native square root instruction, of course.)
This is already pretty darn good, but we can do better, and here is our goal for AWOAFY: quad G5 at Highest achieves less than 1000ms on SunSpider; 1GHz G4/7450 achieves less than 2500ms. These are totally doable; in fact, improvements in Firefox due to ship in the Fx11 timeframe may push these numbers down to target without us doing anything, but we're still going to shoot for low-hanging fruit optimizations in TenFourFox 10.
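To put those targets in perspective, here is a quick back-of-the-envelope calculation in JavaScript of how much faster each machine has to get, using the SunSpider times quoted above. The helper name is made up for illustration; only the numbers come from this post.

```javascript
// Fraction by which a SunSpider time must shrink to reach a target time.
// requiredReduction is a hypothetical helper; the timings are from this post.
function requiredReduction(currentMs, targetMs) {
  return (currentMs - targetMs) / currentMs;
}

var machines = [
  { name: "quad G5", currentMs: 1050, targetMs: 1000 },
  { name: "1GHz G4/7450", currentMs: 2750, targetMs: 2500 }
];

for (var i = 0; i < machines.length; i++) {
  var m = machines[i];
  var pct = (requiredReduction(m.currentMs, m.targetMs) * 100).toFixed(1);
  console.log(m.name + ": needs to drop by more than " + pct + "%");
}
```

That works out to a bit under 5% for the G5 and about 9% for the G4, which is why these goals look reachable.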
That brings us to what happens next. If Firefox 10 is indeed the ESR, as our Magic 8-ball predicts (it also predicted I would get a date with Scarlett Johansson, though), enough pieces of tracejit remain that it should still work, and in that case I think we should stick with tracejit as the default JIT for the duration of the 10 branch (with the option for JM+TI baked in for the adventurous): it's well tested, it's safe and it does well on our older hardware. I am waiting for beta 2 to be certified on mozilla-beta and once it is, we will begin the port. On deck for this release are some fixes from Ben and getting Tobias' fixed VMX text conversion routines back in action.
If Firefox 10 is not the ESR, then Fx10 will be the last tracejit release, and people will be forced to use JM+TI starting with Fx11. Sorry, this is Mozilla's call, not mine.
Either way, when we scramble to the ESR, the ESR will become our stable branch. Mozilla will backport security and stability fixes to the ESR which we will pick up, and unlike Mozilla, we will do other kinds of bugfixes to TenFourFox "ESR" that we deem appropriate. There will be no new features added, however. Most users will stay on this release.
For you, the rowdy, wild beta testers, there will still be beta releases, but not formal releases (because there won't be formal releases for a while -- users will be kept on the ESR-based release). The situation we are trying to avoid is getting stuck on a release of Firefox that isn't getting security updates because we are unable to get the next one working. If, say, we get to Firefox 14 but can't make it to 15 -- and this is a real risk because Mozilla is already making noise about canning 10.5 support somewhere around Fx13, as we have previously discussed -- then we are marooning our users on a branch that will never get any fixes. By keeping later updates merely betas, most users are on a maintained branch and those of us out on the beta channel can just downgrade to the already existing release and live out our days in secure geriatric bliss in feature parity.
Assuming the ESR is 10, the next ESR would be Fx17. If we make it to Fx16, then we will do a regular beta release for Fx16, release Fx16, and go for 17. We can certainly backport security fixes from Fx17 to Fx16 if we fail at Fx17 and stay safe; the code is similar enough. If we don't make it to Fx16, then we can go on to feature parity against the ESR using the patches we've already done for later releases. So I feel pretty good about the future.
JM+TI is only transitional, however, because the future is Mozilla's new IonMonkey engine. This is months away from being functional, but I already looked at some of the code for the existing partially-functional backends and while much of our JM+TI work will transfer, there is a lot of work to do to get it working for PowerPC. Let's try to get as much wear out of the work we've already done.
Now, let's test AWOAFY. Download 9.0.1pre and go to about:config. Turn javascript.options.methodjit.chrome and .content to true, and turn javascript.options.tracejit.chrome and .content to false (yes, you are disabling tracejit; it is not compatible with type inference). Finally, set javascript.options.typeinference to true and restart the browser. Report your results! Remember to turn type inference off if you need to downgrade to 9.0, and turn methodjit off entirely if you downgrade to 9.0b1 or earlier.
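If you'd rather not click through about:config, the same settings can be expressed as a user.js file in your profile folder. Treat this as a sketch: the preference names and values are exactly the ones listed above, and user.js is standard Mozilla profile machinery rather than anything TenFourFox-specific. The assumption is that 9.0.1pre honours it like any other Firefox-derived build.

```javascript
// user.js -- goes in the profile folder and is read at every browser startup.
// Enables JM+TI and disables tracejit, per the instructions above.
user_pref("javascript.options.methodjit.chrome", true);
user_pref("javascript.options.methodjit.content", true);
user_pref("javascript.options.tracejit.chrome", false);
user_pref("javascript.options.tracejit.content", false);
user_pref("javascript.options.typeinference", true);
```

Because user.js is reapplied on every launch, delete these lines (or the whole file) before downgrading, or type inference will be switched right back on.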
Tuesday, December 27, 2011
Sunday, December 25, 2011
Our own Christmas present
After many days of non-stop debugging by Ben, Tobias and yours truly,
% ./jit_test.py --jitflags=mp ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js # straight methodjit
[1775| 0|1775] 100% ===============================================>| 257.0s
PASSED ALL
% ./jit_test.py --jitflags=mnp ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js # JM+TI
[1775| 0|1775] 100% ===============================================>| 270.0s
PASSED ALL
Now, how did Santa fit all that down the chimney? Again, this is an unoptimized debug build of JS, so ignore the timings.
The bugs that were repaired to get type inference working are systemic bugs, so they may actually fix the failure cases for straight methodjit. For G3/G4, we are simply going to prevent the optimizer from using a square root instruction for the time being and just use the JS math library routine (and later develop our own optimized assembly version that can be inlined). I'm going to do some more internal conformance testing this week and if this all checks out, hopefully we will have a 9.0.1pre for you to test by New Year's.
And keep Chris out of the eggnog.
Thursday, December 22, 2011
9.0.1 under review
I'm reviewing the single change in 9.0.1 (backout of bug 708572) to see if it applies to us, although it probably does. However, the relevant crash (bug 711794) seems to primarily affect 64-bit x86_64 and we are of course 32-bit PowerPC, and there is a definite performance impact to taking the backout. We don't support the offending toolbars implicated in the STRs in any case. Crash reports matching the signature would be strongly appreciated to allow me to make the call (please don't send other crash signatures unless you are positive they are related somehow). There is no security impact to this bug that I am aware of.
Tuesday, December 20, 2011
Is Fx10 the ESR?
I'm not sure if LegNeato made a Freudian slip, but comment 38 is very suspicious. It's also welcome news in case PPC JM+TI runs aground, because we could probably hack tracejit to work by itself in the ESR if we had to.
Two bits of bad news on methodjit. The first bit of bad news is that type inference requires the CPU to have a square root instruction, and only G5 does. We could emulate this on G3 and G4, but it would be expensive, of course. Assembly jockeys, here's your chance: come up with a routine in assembly to do PPC square root without a lookup table (you can use a couple constants if you like). It doesn't have to be fast because it's going to suck anyway; it just has to work. This routine might give you some ideas. Post your entries to issue 96; the winner gets beer.
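For anyone sizing up the contest, here is one table-free approach sketched in JavaScript rather than assembly: Newton's (Heron's) iteration, which needs only add, multiply, divide and compare plus the constants 0.5 and 1. This is an illustration of the general technique, not the linked routine and not what will ship; softSqrt is a hypothetical name, and the result is only good to within a few units in the last place.

```javascript
// Table-free square root via Newton's (Heron's) iteration.
// softSqrt is a hypothetical name for illustration; it is not engine code.
function softSqrt(x) {
  if (x !== x || x < 0) return NaN;         // NaN in, or negative input
  if (x === 0 || x === Infinity) return x;  // keeps +0, -0 and +Infinity
  // Start at or above the true root so the estimates only move downward.
  var guess = x >= 1 ? x : 1;
  for (;;) {
    var next = 0.5 * (guess + x / guess);
    if (next >= guess) return guess;        // no further progress: done
    guess = next;
  }
}

console.log(softSqrt(2), softSqrt(144), softSqrt(0.25));
```

Starting from a crude guess makes this slow for very large or very small inputs, which is fine for a routine that "just has to work"; a better starting estimate would cut the iteration count.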
The second bit of bad news is that we have some serious problem with how JavaScript stack frames are managed (not to be confused with the VMFrames we generate in methodjit), and it's not even in the code I wrote. I'm not sure if it's a compiler fart or not. This might be the explanation for some of the serious slowdowns on some sites, and it's possible the problem was there all along and it's just JM+TI that unmasks it. The problem crops up when nesting JS stack frames and if all else fails, I might be able to hack the interpreter to just turn TI off and drop to JM alone if the situation occurs. But that eliminates the entire advantage of JM+TI in the first place, and still isn't as fast as tracejit.
Life sucks. More later.
Monday, December 19, 2011
9.0 now released
9.0 is now in the release channel and available to users. As stated in the previous blog entry, methodjit is not on by default for the unwashed masses, just the regex compilation with tracejit which so far has proven to be very fast and safe.
So far there have been no reported crashes with methodjit, but I have received at least two reports of significant slowdowns on selected sites. We are tracking these in issue 119. Please only report sites there that are clearly and reproducibly being degraded by methodjit (i.e., you turn off methodjit and the problem goes away; you turn methodjit back on and the site drags back down again), not general slow issues or slowness from some other cause. It would also help to isolate a script as the specific cause using NoScript or similar tools in case we have to minimize a test case.
Type inference is still being difficult, but the MIPS port does not require anything extra of significance for it to work and MIPS is "big RISC" like us, so it's clearly something I've done wrong that I haven't found yet. Right now I'm investigating stub calls, since they are by design non-ABI-compliant, and the extant code may be just clever enough to get it working in some places but not others. It is also possible that these two problems may be related.
I have also made the executive decision that we will not advance to 10 until we have both type inference and the basic methodjit working with no showstopper failures (i.e., no reproducible crashes or long hangs that are not reproducible in stock Firefox, and composite benchmarks that are not significantly worse than tracejit). We will issue interim 9 releases until then, because odds are good tracejit will be in serious disrepair in 10, even though it still exists. When I have these two issues nailed down, there will be a 9.0.1pre for you loyal beta users to try.
Saturday, December 17, 2011
9.0 RC available with secret sauce
9.0 RCs are now available. The RC fixes the missing favicons problem (this is actually a Mozilla bug, but they will not ship the fix until Fx10, for those of you using Firefox on an Intel Mac) and a glitch with a malfunctioning chrome script that causes weird behaviour on AMO and certain other OpenWebapps-integrated sites.
Oh, yeah, the secret sauce. It is generally not my policy to introduce new code in an RC, much to the annoyance of our contributors, and although in a strict sense the new code is also shipping as part of regex compilation (which was in the beta) the "sauce" is turned off as shipped. However, we're also in a bit of a time crunch. Fx10 is likely to have integration problems with tracejit at best and outright break it at worst, and losing tracejit with nothing to replace it would be a real punch in the gut.
So we really need to test our new JavaScript PowerPC methodjit now so that we can start working on any bugs and get type inference working (I'll discuss what that means in a moment), preferably for 10.4Fx 10. And that's what you're going to get to do with the "RC," because the RC has methodjit secretly embedded in it! Start the browser and go to about:config, and turn both javascript.options.methodjit.chrome and javascript.options.methodjit.content to true. Do not change any other values related to JavaScript, and do not turn type inference on (the browser will crash immediately if you do). Leave the tracejit on. Restart the browser to clear everything out. The browser will be a little slower to restore itself because it will now start compiling the chrome methods with methodjit as well.
Tracejit and methodjit attack JavaScript just-in-time compilation in two different ways. Our old friend tracejit watches the interpreter run code. If it sees something it determines is "hot," it turns on a tracer that translates JavaScript operations into an intermediate language called LIR and from there into PowerPC assembly language. This is very fast and the fastest way of running certain kinds of code, but branchy code paths cannot be built with tracejit because the execution runs in too many directions, and an unexpected change from the normal execution path means the tracer has to start a new one. So that's where methodjit comes in. If JavaScript determines (through profiling and various internal heuristics) that tracing will be unprofitable on a section of code, it can try simply compiling the entire JavaScript method en masse. This is slower than tracejit at some tasks because it may end up compiling code paths it may never execute (most importantly related to covering the gamut of implied types JavaScript makes possible -- read on), but it also allows certain kinds of optimizations that a tracer will never see because it only sees "one view" of the script rather than the entire bulk of code. The browser will pick which of them is most effective, caching its choice and the JITted code, and you will notice that as more code is in the cache the faster the browser gets.
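To make the distinction concrete, here are two small JavaScript functions of the kind being described. This is only an illustration of code shapes: the function names are invented, and nothing here shows which compiler a given build actually picks for them.

```javascript
// One hot, predictable path: the loop body does the same thing every time,
// which is the sort of code a tracer handles well.
function dot(a, b) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Branchy: each iteration can go one of several ways depending on the data,
// so there is no single "normal" path for a tracer to record.
function classify(values) {
  var counts = { negative: 0, zero: 0, small: 0, large: 0 };
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (v < 0) counts.negative++;
    else if (v === 0) counts.zero++;
    else if (v < 100) counts.small++;
    else counts.large++;
  }
  return counts;
}

console.log(dot([1, 2, 3], [4, 5, 6]), JSON.stringify(classify([-1, 0, 5, 500, 7])));
```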
Tracejit has fundamental differences from methodjit and Mozilla is proceeding in directions that tracejit is not compatible with, which is why it is being disabled in 10 and outright deleted from the tree in 11. This is bad for us, because tracejit executes certain kinds of code far faster than methodjit ever will. Mozilla's solution to methodjit's overhead is to introduce type inference to JavaScript. Remember that I mentioned part of the trouble with methodjit is compiling unnecessary code to cover the possible (or at least reasonable) permutations for JavaScript parameters and variables. If it could be determined what the most likely set of types were for an execution path, then code covering just those types could be built and the rest could fall through to the interpreter for a second chance while the methodjit revises its guess. The guess has to be good, because the penalty is to lose both the code compiled and to have to execute in the interpreter, a penalty tracejit hardly ever has to take because the types are already determined for it by the time the tracer gets enabled (and it doesn't have to junk the code path it already compiled, because it might use it again). This is where Mozilla is putting significant resources to beat Chrome V8 and Crankshaft, and Fx9 is the first Firefox where type inference is being exposed to the user by default.
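A tiny JavaScript example shows why the guess matters. The same one-line function has to do integer addition for one caller and string concatenation for another, so code compiled on the assumption "these are always integers" is only valid until a different type turns up. The function names are invented for illustration, and this demonstrates the language behaviour only, not what the engine does internally.

```javascript
// The same source-level "+" means different operations for different types.
function add(a, b) {
  return a + b;
}

// Type-stable caller: add() only ever sees integers here.
function sumInts(n) {
  var total = 0;
  for (var i = 0; i < n; i++) total = add(total, i);
  return total;
}

// Type-unstable caller: a string sneaks in partway through, and from then
// on "+" is concatenation. A guess of "integers" would have to be revised.
function sumMixed(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total = add(total, values[i]);
  return total;
}

console.log(sumInts(5), sumMixed([1, 2, "3", 4]));
```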
Type inference makes drastic changes to the JavaScript engine, and as a result, JM+TI (that is, JaegerMonkey/methodjit with type inference) is incompatible with TM (tracejit), even though you can combine JM+TM without TI. But because type inference and other optimizations are part of the future IonMonkey project, tracejit is now old news. This is sad for us because our tracejit is very heavily optimized, but that wasn't my call, and Mozilla was kind enough to give us until Fx11 to get our backend shifted over. However, we don't support type inference yet -- I couldn't get it fully working in time. This is just as well since we should really test the core browser without it first.
So, shipping in this release is the Nitro macroassembler for PowerPC (which is an .h file with certain semi-atomic operations and their assembly-language equivalents), expanded for Mozilla's purposes, and a lower-level assembler that the macroassembler calls to do the translation. Ben diligently wrote the first draft of these pieces, and this version includes my completed draft with the bugs worked out. They are not as optimized as they could be, but they do work, and we can put optimization off a bit while we get any other major issues out. Both YARR and methodjit use the macroassembler, YARR for regular expression compilation (which we do have enabled in 9), and methodjit for JavaScript. There is also a snippet of code called a "trampoline" which is a skeleton function prologue, epilogue and stub call library I wrote by hand in PowerPC assembly to interface with the JavaScript C++ interpreter, and the other pieces needed for JavaScript to understand OS X ABI stack frames and register allocation. Those of you watching our stack usage problem will now note that our methodjit stack frames are a fixed 128 bytes, overall smaller than tracejit, and our YARR stack frames are a fixed 32 bytes plus any subpattern usage. We should start seeing smaller stack overhead when tracejit goes away, which is a consolation prize. The interested can see what our stack frames now look like in js/src/methodjit/MethodJIT.h and js/src/methodjit/TrampolinePPCOSX.s.
There is an effort underway to port the TenFourFox PowerPC-specific code to Linux. The VMX code Tobias wrote will translate pretty much directly, but the JIT compilers will not because they assume the OS X application binary interface, not the Linux ABI. Linux ABI uses different stack frame sizes, a different linkage area (the first few words of the stack frame), different volatile register designations and different parameter register allocation. Tracejit was not really written for anything else but OS X, but methodjit is being designed to be friendlier to our PPC brethren on other operating systems like Linux and should be particularly easy to port to AIX (and a tip of the hat to Andrew, who is running server-side JavaScript on big-iron POWER servers like our very own POWER6 here at Floodgap Orbiting HQ). I don't have any ETA on this and I am not personally involved with the project, but I have had contact with them and given them advice and suggestions. I'm sure they will post when they have early builds to play with. Please note that TenFourFox itself will always be for OS X; this effort would be better understood as part of a PowerPC-enhanced generic Firefox.
So, how are the numbers? My quad G5 doesn't change much on SunSpider (around 1050-1070ms for both TM and JM+TM), but Dromaeo in 9 goes from 114 to 122 runs/sec with JM+TM. The 1GHz G4 in 9 goes from 32 to 36 runs/sec and from 3300 to 2800ms in SunSpider. Overall, while not the dramatic change that tracejit was, it's still progress and I expect even more when we get type inference running; the project goal is still to get the quad G5's SunSpider time under one second and we're almost there. In the meantime I want you to bang on the browser hard, please -- we've got to get all the kinks out!
Some housekeeping. Firefox rapid release is developing a long tail of people not upgrading to later versions for a variety of reasons ranging from ignorance to add-on anxiety to update fatigue, which they have attempted to fix with the 3.6 to 8.0 "major update." The issue of the uninformed and shortly-to-be-marooned PowerPC Firefox 3.6 user notwithstanding, we have the same problem, made more acute by the termination of plugin support with TenFourFox 6. Nevertheless, I think it is bad form to have so many users on an unsupported branch with known security holes and I'm considering updating all the version snippets to remind users that TenFourFox 9 is available for them too and to please update to a supported version. They could of course ignore this nag as well, but at least we tried. Argue passionately for or against this idea in the comments.
One last note: do not downgrade to 9.0b1 with methodjit on -- make sure you turn it off before trying to re-run the beta for comparison, or the browser will immediately assert. In fact, you should probably do that for any previous version of TenFourFox, just to be safe.
On deck for 10 will be Tobias' fixed AltiVec text converters back in the lineup so he won't come over to my house with an axe ;), and some other additional minor optimizations, but type inference is the big kahuna and I may delay the beta until it is ready. As always the Secretary will disavow, I mean, the beta port will start when Mozilla certifies the first 10 beta. And this blog entry will self-destruct in five seconds because of the secret sauce. So you'd better read the release notes and download for your architecture now. Good luck, Jim.
Oh, yeah, the secret sauce. It is generally not my policy to introduce new code in an RC, much to the annoyance of our contributors, and although in a strict sense the new code is also shipping as part of regex compilation (which was in the beta) the "sauce" is turned off as shipped. However, we're also in a bit of a time crunch. Fx10 is likely to have integration problems with tracejit at best and outright break it at worst, and losing tracejit with nothing to replace it specifically means a real punch in our gut.
So we really need to test our new JavaScript PowerPC methodjit now so that we can start working on any bugs and get type inference working (I'll discuss what that means in a moment), preferably for 10.4Fx 10. And that's what you're going to get to do with the "RC," because the RC has methodjit secretly embedded in it! Start the browser and go to about:config, and turn both javascript.options.methodjit.chrome and javascript.options.methodjit.content to true. Do not change any other values related to JavaScript, and do not turn type inference on (the browser will crash immediately if you do). Leave the tracejit on. Restart the browser to clear everything out. The browser will be a little slower to restore itself because it will now start compiling the chrome methods with methodjit as well.
Tracejit and methodjit attack JavaScript just-in-time compilation in two different ways. Our old friend tracejit watches the interpreter run code. If it sees something it determines is "hot," it turns on a tracer that translates JavaScript operations into an intermediate language called LIR and from there into PowerPC assembly language. This is very fast and the fastest way of running certain kinds of code, but branchy code paths cannot be built with tracejit because the execution runs in too many directions, and an unexpected change from the normal execution path means the tracer has to start a new one. So that's where methodjit comes in. If JavaScript determines (through profiling and various internal heuristics) that tracing will be unprofitable on a section of code, it can try simply compiling the entire JavaScript method en masse. This is slower than tracejit at some tasks because it may end up compiling code paths it may never execute (most importantly related to covering the gamut of implied types JavaScript makes possible -- read on), but it also allows certain kinds of optimizations that a tracer will never see because it only sees "one view" of the script rather than the entire bulk of code. The browser will pick which of them is most effective, caching its choice and the JITted code, and you will notice that as more code is in the cache the faster the browser gets.
Tracejit has fundamental differences from methodjit and Mozilla is proceeding in directions that tracejit is not compatible with, which is why it is being disabled in 10 and outright deleted from the tree in 11. This is bad for us, because tracejit executes certain kinds of code far faster than methodjit ever will. Mozilla's solution to methodjit's overhead is to introduce type inference to JavaScript. Remember that I mentioned part of the trouble with methodjit is compiling unnecessary code to cover the possible (or at least reasonable) permutations for JavaScript parameters and variables. If it could be determined what the most likely set of types were for an execution path, then code covering just those types could be built and the rest could fall through to the interpreter for a second chance while the methodjit revises its guess. The guess has to be good, because the penalty is to lose both the code compiled and to have to execute in the interpreter, a penalty tracejit hardly ever has to take because the types are already determined for it by the time the tracer gets enabled (and it doesn't have to junk the code path it already compiled, because it might use it again). This is where Mozilla is putting significant resources to beat Chrome V8 and Crankshaft, and Fx9 is the first Firefox where type inference is being exposed to the user by default.
Type inference makes drastic changes to the JavaScript engine, and as a result, JM+TI (that is, JaegerMonkey/methodjit with type inference) is incompatible with TM (tracejit), even though you can combine JM+TM without TI. But because type inference and other optimizations are part of the future IonMonkey project, tracejit is now old news. This is sad for us because our tracejit is very heavily optimized, but that wasn't my call, and Mozilla was kind enough to give us until Fx11 to get our backend shifted over. However, we don't support type inference yet -- I couldn't get it fully working in time. This is just as well since we should really test the core browser without it first.
So, shipping in this release is the Nitro macroassembler for PowerPC (which is an .h file with certain semi-atomic operations and their assembly-language equivalents), expanded for Mozilla's purposes, and a lower-level assembler that the macroassembler calls to do the translation. These are the pieces Ben diligently worked on the first draft of, and this version includes my completed draft with the bugs worked out. They are not as optimized as they could be, but they do work, and we can delay that a bit to get any other major issues out. Both YARR and methodjit use the macroassembler, YARR for regular expression compilation (which we do have enabled in 9), and methodjit for JavaScript. There is also a snippet of code called a "trampoline" which is a skeleton function prologue, epilogue and stub call library I wrote by hand in PowerPC assembly to interface with the JavaScript C++ interpreter, and the other pieces needed for JavaScript to understand OS X ABI stack frames and register allocation. Those of you watching our stack usage problem will now note that our methodjit stack frames are a fixed 128 bytes, overall smaller than tracejit, and our YARR stack frames are a fixed 32 plus any subpattern usage. We should start seeing smaller stack overhead when tracejit goes away, which is a consolation prize. The interested can see what our stack frames now look like in js/src/methodjit/MethodJIT.h and js/src/methodjit/TrampolinePPCOSX.s.
There is an effort underway to port the TenFourFox PowerPC-specific code to Linux. The VMX code Tobias wrote will translate pretty much directly, but the JIT compilers will not because they assume the OS X application binary interface, not the Linux ABI. Linux ABI uses different stack frame sizes, a different linkage area (the first few words of the stack frame), different volatile register designations and different parameter register allocation. Tracejit was not really written for anything else but OS X, but methodjit is being designed to be friendlier to our PPC brethren on other operating systems like Linux and should be particularly easy to port to AIX (and a tip of the hat to Andrew, who is running server-side JavaScript on big-iron POWER servers like our very own POWER6 here at Floodgap Orbiting HQ). I don't have any ETA on this and I am not personally involved with the project, but I have had contact with them and given them advice and suggestions. I'm sure they will post when they have early builds to play with. Please note that TenFourFox itself will always be for OS X; this effort would be better understood as part of a PowerPC-enhanced generic Firefox.
So, how are the numbers? My quad G5 doesn't change much on SunSpider (around 1050-1070ms for both TM and JM+TM), but Dromaeo in 9 goes from 114 to 122 runs/sec with JM+TM. The 1GHz G4 in 9 goes from 32 to 36 runs/sec and from 3300 to 2800ms in SunSpider. Overall, while not the dramatic change that tracejit was, it's still progress and I expect even more when we get type inference running; the project goal is still to get the quad G5's SunSpider time under one second and we're almost there. In the meantime I want you to bang on the browser hard, please -- we've got to get all the kinks out!
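For those who prefer percentages, the figures above work out as follows. This is just arithmetic on the numbers quoted in this post; the `percent_change` helper is mine. Remember that higher is better for Dromaeo and lower is better for SunSpider.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


results = {
    "quad G5 Dromaeo (runs/sec)": (114, 122),
    "1GHz G4 Dromaeo (runs/sec)": (32, 36),
    "1GHz G4 SunSpider (ms)": (3300, 2800),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
# quad G5 Dromaeo (runs/sec): +7.0%
# 1GHz G4 Dromaeo (runs/sec): +12.5%
# 1GHz G4 SunSpider (ms): -15.2%
```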
Some housekeeping. Firefox rapid release is developing a long tail of people not upgrading to later versions for a variety of reasons ranging from ignorance to add-on anxiety to update fatigue, which they have attempted to fix with the 3.6 to 8.0 "major update." The issue of the uninformed and shortly-to-be-marooned PowerPC Firefox 3.6 user notwithstanding, we have the same problem, made more acute by the termination of plugin support with TenFourFox 6. Nevertheless, I think it is bad form to have so many users on an unsupported branch with known security holes and I'm considering updating all the version snippets to remind users that TenFourFox 9 is available for them too and to please update to a supported version. They could of course ignore this nag as well, but at least we tried. Argue passionately for or against this idea in the comments.
One last note: do not downgrade to 9.0b1 with methodjit on -- make sure you turn it off before trying to re-run the beta for comparison, or the browser will immediately assert. In fact, you should probably do that for any previous version of TenFourFox, just to be safe.
On deck for 10 will be Tobias' fixed AltiVec text converters back in the lineup so he won't come over to my house with an axe ;), and some other additional minor optimizations, but type inference is the big kahuna and I may delay the beta until it is ready. As always the Secretary will disavow, I mean, the beta port will start when Mozilla certifies the first 10 beta. And this blog entry will self-destruct in five seconds because of the secret sauce. So you'd better read the release notes and download for your architecture now. Good luck, Jim.
Friday, December 16, 2011
Hello, methodjit
bruce:/home/spectre/src/bruce/mozilla-9b/js/src/jit-test/% ./jit_test.py --jitflags=mp,p ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js
[3544| 0|3544] 100% ===============================================>| 369.4s
PASSED ALL
More about that when RCs come out. mozilla-release is now up to date, so 9 RC builds will commence later today, and if they pass conformance testing, up tonight or tomorrow. (Ignore the elapsed time as this was an unoptimized debug js.)
One other note. Some well-meaning individual submitted the QTE to Softpedia as a download. Please don't do that. It's an alpha, it's not finished, it has known bugs and we're not ready to support it yet. If it was you, please indicate to them that it was an error and retract it until we're ready with a beta. (In fact, you probably shouldn't submit it at all, since when it does hit beta we will just integrate it with TenFourFox itself, probably.) I appreciate your enthusiasm, but it was premature. Thank you :)
Tuesday, December 13, 2011
QTE 9 available
The QuickTime Enabler for 9.0 is now available (alpha 114). As the version bump indicates, the only change from a113 to a114 is compatibility with 9. I am really hoping that Mozilla starts issuing simultaneous Add-On SDK compatibility updates with their betas, because it somewhat harshes our buzz to lose the QTE temporarily when we test our own betas. However, the QTE does not seem to cause any problems with stability, and because of this we may simply start including this as a built-in part of TenFourFox (I have not decided yet).
Simon Royal from LowEndMac has been trying to drum up interest in porting one of the open-source Flashes to PPC OS X. This is not a project I can personally donate time to, but if some of you out there are looking for a project the best one to port would be Lightspark (plus-minus Gnash for the older Flash applets), as it contains the most current version of the Adobe ActionScript virtual machine. The code is known to already work on PowerPC Linux, but the code would need to be altered to use either QuickDraw or CoreGraphics, and it would need to be buildable on OS X, so the port is possible but non-trivial. I'll add a carrot here: if someone gets this running and is able to maintain it, I will consider (technical limitations notwithstanding) lifting the plugin embargo. If the plugin exists, the plugin code in TenFourFox still works (we no longer test it, people are on their own) and someone is keeping it up-to-date, I'll not let it wither on the vine from this end. It's not much of a carrot, but hey, if you're just hanging around the house on the weekends now's your chance to be an abandoned platform rockstar.
Mozilla is due to sign off on Firefox 9 tomorrow, and we will do RC builds immediately after that. Tobias has fixes available for our VMX text converters which will appear in the beta immediately following. We also have a fix in tree for the icons problem, which is Mozilla bug 705516, so those of you using the image.mem.decodeondraw tweak mentioned in that bug should set it back to true before you upgrade to the RC or you will use more memory unnecessarily. RCs will be announced here when they are ready, most likely by Friday or Saturday.
Methodjit is pretty much done, but we still have six tests (though out of over 1700) that fail and the browser is not able to stand up with the current code. I have not decided if I will hold Firefox 10 until methodjit is done -- tracejit is still in 10, just disabled. That buys us time for me to fix the last remaining failures by the end of tracejit in Fx11. Remember, don't try to enable it in Fx9 -- it will almost certainly not work, even though it is "secretly lurking."
Thursday, December 1, 2011
9.0b1 available
9b1 is available, corresponding roughly to Firefox 9 beta 4. This was a rather troublesome port -- Mozilla did some large internal changes in this release to facilitate type inference (which we do not yet support, stay tuned) and the architectural requirements needed caused significant delay in simply getting the browser to build. Even after that, in DEBUG mode it worked reasonably well but with VMX/AltiVec code paths enabled it crashed incessantly and I had to disable two of our text converters. I'm hoping we can get them reenabled for either 9 final or 10.
It also seems, based on comments in the previous blog entry, that there are still some sites with ridiculously large or complex numbers of scripts that still cause us to run out of stack. For 9, I've made a bandaid and backed the stack pointer all the way up to 0xf0000000 in memory, for a total of 1GB of stack. (On systems with less than 1GB of RAM, it doesn't use this automatically, but it potentially could. In this case, it will get swapped to disk, so your system may thrash like crazy but it won't crash; the stack will then shrink back down after execution completes. For this reason, 1GB is now the recommended minimum, though we will still support 512MB machines.) Even with this "bandaid," this obviously doesn't mean we won't overrun it again in the future. I'm unable to think of any way I can wring more stack out of the browser in 32-bit mode, and 64-bit mode is not an option for any architecture except G5.
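To put the address arithmetic in one place, here is a small sketch. It assumes that 0xf0000000 is the top of the stack, that the stack grows downward from there, and that "1GB" means 2**30 bytes; the `stack_bounds` helper is purely illustrative.

```python
STACK_TOP = 0xF0000000  # address quoted above, assumed to be the top of the stack
STACK_SIZE = 1 << 30    # 1GB, read here as 2**30 bytes (assumption)


def stack_bounds(top, size):
    """Return (lowest, highest) addresses of a downward-growing stack."""
    return top - size, top


low, high = stack_bounds(STACK_TOP, STACK_SIZE)
print(hex(low), hex(high))        # 0xb0000000 0xf0000000

# Fraction of a 32-bit address space reserved for the stack.
print(STACK_SIZE / 2**32)         # 0.25
```

Under those assumptions the stack reservation alone covers a quarter of everything a 32-bit process can address, which is why there is so little room left to grow it further.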
Methodjit will change this, but I have no idea if it will change it for the better or not. The key issue is that because of our very large register set and our requirement to abide by the OS X PowerPC ABI, we must save large stack frames to the stack, and our stack cannot grow dynamically without limit. (This is a good thing -- such sites could destabilize the entire computer rather than just the browser if they were allowed to allocate stack forever.) Methodjit as currently written limits us to a much smaller register set, currently 16 GPRs and 8 FPRs, of which all the FPRs and around half of the GPRs must be saved on the stack (plus the usual OS X ABI linkage area overhead). This is a smaller stack frame than tracejit uses, which can potentially use nearly all the GPRs at once, but it may not save us because it may be easier for methodjit-generated functions to recurse with less checking (and that smaller set of registers in use may have performance impact, but I talked about this with Mozilla and it does not appear likely to change -- methodjit is limited to a total of 32 registers altogether). We are exploring ways to propagate an imminent stack exhaustion state to the browser so that a window like the "Unresponsive script" dialogue can be generated rather than, you know, crashing. If you have some ideas about that, see issue 114.
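A back-of-the-envelope estimate of what such a frame costs might look like the sketch below. It takes "around half of the GPRs" to mean 8, and it assumes 4-byte GPRs, 8-byte FPRs, a 24-byte linkage area and 16-byte stack alignment for 32-bit OS X PowerPC; none of this is read out of the actual methodjit sources, and the helper names are made up.

```python
GPR_BYTES = 4       # 32-bit general-purpose register
FPR_BYTES = 8       # double-precision floating-point register
LINKAGE_BYTES = 24  # assumed 32-bit OS X PowerPC linkage area
ALIGNMENT = 16      # assumed stack alignment


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def estimate_frame(saved_gprs, saved_fprs):
    """Estimate the frame size for the given number of saved registers."""
    raw = LINKAGE_BYTES + saved_gprs * GPR_BYTES + saved_fprs * FPR_BYTES
    return align_up(raw, ALIGNMENT)


frame = estimate_frame(saved_gprs=8, saved_fprs=8)
print(frame)               # 128

# Upper bound on nesting depth if a 1GB stack held nothing but these frames.
print((1 << 30) // frame)  # 8388608
```

The depth figure is only a ceiling: interpreter and C++ frames share the same stack and are typically much larger, so real recursion limits will be far lower.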
9 is the first installment of what will become methodjit, if we can get it done. This includes Ben's hard work on the macro assembler plus my additional hacking on YARR JIT to enable us to compile regular expressions, which was a gap we suffered from before. With regular expression compilation, the quad G5 running at full tilt drops its SunSpider numbers from around 1600ms to about 1050ms in 9. Yes, this is slightly slower than 8+YARR, probably due to some additional overhead in the interpreter, but it is still significantly faster than 8 alone. (As a side effect, methodjit compilation is enabled, but it does not work. If you were naughty and turned on any of the methodjit preferences in about:config, turn them off before upgrading to 9.0 or you will crash!) On sites like Twitter, it makes a big difference, and it will also accelerate certain extensions like Ghostery and AdBlock Plus. Think of it as a down payment on more JavaScript wizardry to come.
Even though we had to back out some of the AltiVec text conversion code, 9 also includes Tobias' AltiVec qcms colour management code and the completion of the AltiVec scale and colour conversion code, which improves WebM performance quite a bit. It also corrects a regression with JPEG decoding that Chris found and that Mozilla caused -- now JPEG AltiVec decoding is really really fast, and G3 is much faster than it was in 8.0.
9.0 makes a few small subtle interface changes to things like the go/reload/stop button, for really no good reason, but whatever. There are also additional HTML5 and CSS features, font stretching, and improved AJAX performance with a revamped, "chunked" XHR system.
So let's gaze into the crystal ball for a moment and consider some possibilities. Firefox 10 will be the last version of Firefox to have tracejit; it's gone in 11. Preferentially we will have our implementation of methodjit available for 10 beta and we must have it for 11, or we'll have to keep dragging a tracejit-able JavaScript to future versions, which seriously hurts our portability. We could really only fire a "gun" like that once, maybe twice, before changes in the JS API and other internal limitations make it impossible to merge an old JS into a later browser. The long and short is that we really need to have methodjit if we want to continue with source parity. If not, then we drop to feature parity and go from there.
This gets a little more interesting (and exhausting) in that Mozilla is now talking about dropping support for 10.5 in Firefox 13, which was inevitable. It's probably worth reviewing when Mozilla dropped 10.3 for Fx3 and then, of course, when they infamously dropped 10.4 for Fx4 which is why we exist in the first place. Like I say, it was inevitable for Leopard support to have its number come up so quickly; there are few reasons for Intel users to stay on it (Snow Leopard is faster and leaner and still runs PPC software), and Tiger was a much longer lived version of Mac OS X than Leopard was. It is also worth noting who drives each proposal to end support (hint: same person each time). Just sayin'.
The reason this interests us is that while there were many things in 10.5 that we could emulate in 10.4, there are many things in 10.6 (and, for that matter, 10.7) that we can't. Things like, for example, Grand Central Dispatch, or other hidden UI interfaces, or the graphics stack. Right now we simulate the dependency on CoreUI by using the old chrome code, CoreText with Harfbuzz and NSTrackingArea with cleverer management of events (and there are some 3rd party versions of it we could use if we had to), and 10.5 is limited to software rendering also, which makes things easier for us because we are also limited to software rendering and Mozilla has to support that. But it will be much harder when 10.5 compatibility goes away, because Mozilla will make assumptions about the capabilities of the class of remaining machines which many Power Macs will not meet, and will almost certainly try to leverage some of the new features available now that they don't need to worry about legacy support.
But wait, the plot thickens with a partridge and chickens. Most of us have already said our piece on rapid release (and there is some evidence that it is hurting Mozilla's market share already) and more reasonable elements within the Foundation have determined that having a long-term support branch might be worth it (the so-called Extended Support Release, or ESR). The ESR would be the second-tier release and users would be discouraged from running it, but it would be available for corporate or managed environments where rapid-release makes regular Firefox burdensome to manage. 3.6 is sort of filling that role now, so this would finally end updates to 3.6 and replace it with another long-lived branch, and then periodically update it according to a regular, slower timeframe. This idea is not new; the original ESR plan called to start around Fx8, and now it's talking 10, and in all likelihood I don't see much traction on it until the Fx12 or 13 timeframe.
Wait, did you say Fx13? The ESR is an important release for us: it guarantees the ground doesn't shift under our feet more than necessary and we get a certain degree of security patches. While Mozilla will discourage the ESR for regular users, there is nothing that says we can't build on it, and the ESR will be supported for at least 6 months, probably longer. So here's how the scenarios will play out. If we can hang on and make the jump to Fx13 (assuming that is the ESR basis), then we will be on a stable platform with security updates that we can continue to build on. If we develop past the ESR, and we are unable to maintain it, we could be marooned on a Firefox branch that has no updates (and has the requisite bugs from that release to boot). Nothing says that we will still be able to keep the browser afloat past the 10.5 cutoff date.
So the current plan is to do whatever it takes to get us to the ESR and then start making official releases based on that, while continuing to generate beta releases in parallel off mozilla-beta as we have been doing. If it looks like we'll make it to the ESR after that, then we keep going. If not, we fall back on the ESR, no harm done, and backport features to that as feature parity. When the ESR ends, we enter security parity. That's still a lot of life left, and did you notice that TenFourFox has already had its one year anniversary? Break out the champagne and cheese, man!
As usual the beta will break the QuickTime Enabler because the Add-On SDK team doesn't have the foresight to get a working SDK to coincide with it in the Add-On Builder. Because of the tardiness of this release, the QTE will simply be a maintenance update, and I will post it when it is available.
Release notes and architectures: