Wednesday, December 29, 2010

Got it == flaunt it

It's December 29, and I'm still getting Christmas gifts! Well, I had to brag a little ...

Mozilla sent out their Firefox 4 Beta Team T-shirts today, with a thank-you card that says, "thanks for all your hard work! You rock!" Now, maybe that was for OverbiteFF, or Classilla, or being a pest on Bugzilla, and not your favourite and my favourite POWER ISA legacy Mac browser, but hey! free T-shirt! :-D

The nanojit has been up and running for a few days on my personal dev build and is pretty stable, but the results are currently mixed, and I'm going to do a full post on that when I have some hard numbers to present. The long and short of it is, for many tasks the nanojit is a screamer, but for certain tasks it is actually slower (because remember, the trace and emit step has overhead of its own, and any savings the JIT makes seem to be outweighed by that overhead under certain conditions). The open question is which benchmark to believe, but I'm still working on optimizations to the nanojit, it still seems to be an overall win, and even modulo my author bias I wouldn't go back to TenFourFox without it. Still, it's instructive to note how much optimization we take for granted out of our compilers and how much work we have to do to hand-optimize assembly, and hand-optimizing PowerPC assembly language is in fact exactly what I'm doing. Once I get to a stopping point, we'll observe what Mozilla does with beta 9 while I push Classilla 9.2.2 out the door, since we're already tardy on its deadline.
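Why the trace and emit overhead can make a JIT lose on some tasks is easiest to see with a toy cost model. The model below is an illustrative assumption, not how the nanojit measures anything: recording and compiling a trace costs a fixed amount, and every later iteration saves the difference between interpreted and native cost. A loop that exits before the break-even point runs slower with the JIT than without it. The function names are hypothetical, and the model ignores real complications such as side exits back to the interpreter.

```c
#include <stdint.h>

/* Hypothetical cost model for a tracing JIT, in arbitrary time units.
 * interp_cost: cost of one interpreted loop iteration
 * native_cost: cost of one iteration running the compiled trace
 * trace_cost:  one-time cost to record the trace and emit machine code
 * Returns the smallest iteration count at which tracing is no slower
 * than interpreting, or UINT64_MAX if the trace can never pay off. */
uint64_t jit_break_even(uint64_t interp_cost, uint64_t native_cost,
                        uint64_t trace_cost)
{
    if (native_cost >= interp_cost)
        return UINT64_MAX;                  /* no per-iteration savings */
    uint64_t saving = interp_cost - native_cost;
    /* ceil(trace_cost / saving) without floating point or overflow */
    return trace_cost / saving + (trace_cost % saving != 0);
}

/* Total cost of n iterations in the interpreter alone. */
uint64_t cost_interpreted(uint64_t n, uint64_t interp_cost)
{
    return n * interp_cost;
}

/* Total cost of n iterations when the loop is traced up front. */
uint64_t cost_traced(uint64_t n, uint64_t native_cost, uint64_t trace_cost)
{
    return trace_cost + n * native_cost;
}
```

With made-up numbers of 10 units per interpreted iteration, 2 per native iteration and 400 to build the trace, `jit_break_even(10, 2, 400)` is 50: a loop that runs 49 times is slower with the JIT, and one that runs thousands of times is far faster.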

Tuesday, December 21, 2010

Enter the nanojit

One of Mozilla's marquee features in Firefox 4 is ever-faster JavaScript. Of course, the real speed gains are only to be had on their officially supported architectures, which now include only x86, x86_64 and ARM, because of modern JavaScript interpreters' heavy dependence on just-in-time compilation. (See also Google's work on this for Chrome, which naturally is also Intel and ARM ISAs only.) Since PowerPC is the old and busted and x86 is the old yet new hotness, x86 gets the goodies. This doesn't mean that the pure-C++ version of Fx 4's JavaScript interpreter is a slouch (and Fx 4 is faster than Fx 3.6, plus our own compiler juicing, of course), and there is JIT support in Firefox for other architectures such as SPARC even though Mozilla does not consider SPARC a tier 1 platform. Which brings us to PPC JIT.

Some terminology: the pure C (and now C++) JavaScript interpreter is properly called SpiderMonkey and descends directly from Brendan Eich's work when he was at Netscape. All Mozilla derivatives use it; Classilla uses the last C version, which was in Firefox 3.0.19, and so does Camino, while SeaMonkey and Firefox use the C++ version, which was introduced in 3.5 with TraceMonkey. TraceMonkey's key addition to SpiderMonkey is a clever piece of code called the nanojit, which watches execution through code paths and remembers the "trace" (hence the name). That trace is in fact generated as direct machine code, and run as such, so it is very fast. However, the nanojit is unable to effectively optimize certain types of code, notably code with evals in it and code with many type combinations, and it must run the program to a certain extent to determine what code is "hot" (so very branchy code causes the tracing JIT to give up, because it would use too much memory and CPU time to figure out where the hotness lies). To handle those cases, Mozilla added JägerMonkey to Firefox 4.0, which is a true method JIT, borrowing the assembler that powers it (Nitro) from the enemy WebKit. Naturally, it is also limited to the currently supported architectures.
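The "run it a while, find what's hot, give up if it gets too branchy" behaviour described above can be sketched as a per-loop hit counter plus a branch budget. This is a simplified illustration, not TraceMonkey's code: the names (`loop_state`, `HOT_THRESHOLD`, `MAX_TRACE_BRANCHES`) and the threshold values are hypothetical, and a real tracer would typically retry a few times before blacklisting a loop for good.

```c
#include <stdbool.h>

/* Hypothetical tuning knobs; real tracing JITs use their own values. */
#define HOT_THRESHOLD      2   /* header hits before we start recording */
#define MAX_TRACE_BRANCHES 8   /* guards allowed before recording is abandoned */

enum loop_mode { MODE_INTERPRET, MODE_RECORDING, MODE_NATIVE, MODE_BLACKLISTED };

struct loop_state {
    unsigned hits;       /* times the loop header has been reached */
    unsigned branches;   /* guards recorded in the current trace */
    enum loop_mode mode;
};

/* Called each time the interpreter reaches the loop header. */
void on_loop_header(struct loop_state *s)
{
    switch (s->mode) {
    case MODE_INTERPRET:
        if (++s->hits >= HOT_THRESHOLD) {
            s->mode = MODE_RECORDING;   /* loop is "hot": start a trace */
            s->branches = 0;
        }
        break;
    case MODE_RECORDING:
        /* Back at the header: the recorded path closes the loop, so it
         * can now be compiled to machine code. */
        s->mode = MODE_NATIVE;
        break;
    default:
        break;                          /* native or blacklisted: nothing to do */
    }
}

/* Called for each conditional branch (or type check) met while recording.
 * Returns true if the branch was added to the trace as a guard. */
bool on_branch_while_recording(struct loop_state *s)
{
    if (s->mode != MODE_RECORDING)
        return false;
    if (++s->branches > MAX_TRACE_BRANCHES) {
        s->mode = MODE_BLACKLISTED;     /* too branchy: give up, stay interpreted */
        return false;
    }
    return true;
}
```

Other untraceable operations, such as the evals mentioned above, would abort recording the same way; that leftover code is the territory a method JIT like JägerMonkey is meant to cover.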

So that's our year in review; now for the diplomatic cables. If you look in the nanojit, you see tons of files for lots of architectures, including x86 (of course), ARM, SPARC, even MIPS, and, yes, PPC. Great, you say, the nanojit in Firefox seems to generate PowerPC code. Then you look at the configuration scripts and your heart sinks:

PowerPC nanojit doesn't get compiled or included in Firefox 3.5 or 3.6.

But wait, I hear you cry! The code is there, right? What's NativePPC.cpp for? Well, it's truly part of the nanojit, but it is incomplete. While it works as part of Tamarin, which is Adobe's implementation of ActionScript (and is indeed part of PPC Flash Player), it does not implement enough LIR instructions to work as part of Firefox or in fact any Gecko browser, and this is why your red-hot quad 2.5GHz G5 still benches slower on SunSpider than a measly Pentium 4: the P4 gets native code, we don't.

Well, that's about to change. With big thanks to Edwin Smith at Adobe who put up with my constant queries, I'm proud to announce that we're making progress on doing the obvious: expanding the PowerPC nanojit so it can be used as part of TenFourFox. Ed helped me out with some of the fiddly aspects, and we're now full speed ahead on implementing the LIR instructions needed to fully support the length and breadth of JavaScript that the Mozilla framework requires. Already it is passing most of the LIR assembler tests, and when it passes all of the supported ones, I'll start work on integrating it with the main browser. At the end of this, we'll pass our work back to Mozilla so it can be a part of Firefox, if they want it.
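For a feel of what "implementing the LIR instructions" means at the bottom of a backend, here is a toy lowering of a hypothetical three-opcode IR (`OP_IMM`, `OP_ADD`, `OP_RET`; these are not the nanojit's real LIR opcodes) into 32-bit PowerPC instruction words. The encodings are the standard ones: `addi` (used as `li` when rA is 0) is primary opcode 14 in D-form, `add` is primary opcode 31 with extended opcode 266, and `blr` is 0x4E800020. Register allocation is faked by mapping virtual register n straight to GPR r3+n. When the backend meets an opcode it doesn't implement, it has to reject the whole trace, which is roughly the bind an incomplete backend puts the browser in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini-IR for illustration only (not nanojit LIR). */
enum op { OP_IMM, OP_ADD, OP_RET, OP_MUL /* deliberately unimplemented */ };

struct ins {
    enum op op;
    int dst, a, b;   /* virtual registers */
    int32_t imm;     /* immediate operand for OP_IMM */
};

/* D-form: addi rD,rA,SIMM. With rA = 0 this is "li rD,SIMM". */
static uint32_t ppc_addi(unsigned rd, unsigned ra, int16_t simm)
{
    return (14u << 26) | (rd << 21) | (ra << 16) | ((uint32_t)simm & 0xFFFFu);
}

/* XO-form: add rD,rA,rB (primary opcode 31, extended opcode 266). */
static uint32_t ppc_add(unsigned rd, unsigned ra, unsigned rb)
{
    return (31u << 26) | (rd << 21) | (ra << 16) | (rb << 11) | (266u << 1);
}

#define PPC_BLR 0x4E800020u   /* branch to link register (return) */

/* Map virtual register v to GPR r3..r31, or -1 if out of range. */
static int gpr(int v)
{
    return (v >= 0 && v <= 28) ? 3 + v : -1;
}

/* Lowers n instructions into out[] (capacity cap words).
 * Returns the number of words emitted, or -1 if the code uses an
 * opcode or operand this backend cannot handle. */
int lower(const struct ins *code, size_t n, uint32_t *out, size_t cap)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        const struct ins *in = &code[i];
        int rd = gpr(in->dst), ra = gpr(in->a), rb = gpr(in->b);
        if (w >= cap)
            return -1;                      /* output buffer full */
        switch (in->op) {
        case OP_IMM:
            if (rd < 0 || in->imm < INT16_MIN || in->imm > INT16_MAX)
                return -1;                  /* large constants would need lis/ori */
            out[w++] = ppc_addi((unsigned)rd, 0, (int16_t)in->imm);
            break;
        case OP_ADD:
            if (rd < 0 || ra < 0 || rb < 0)
                return -1;
            out[w++] = ppc_add((unsigned)rd, (unsigned)ra, (unsigned)rb);
            break;
        case OP_RET:
            /* Assumes the result is already in r3 (virtual register 0),
             * where the PowerPC calling conventions return integers. */
            out[w++] = PPC_BLR;
            break;
        default:
            return -1;                      /* not implemented: reject the trace */
        }
    }
    return (int)w;
}
```

Lowering "v0 = 40; v1 = 2; v0 = v0 + v1; return" yields `li r3,40`, `li r4,2`, `add r3,r3,r4`, `blr`. A real backend also has to handle things like guards and side exits, calls, floating point and branch patching, which is where most of the work lies.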

When will you get to play with the new hotness, and how much faster will it get? Conservatively, you can expect up to 2-3x improvement on many JavaScript benchmarks, possibly more. This is still not enough to reach x86 parity because we don't implement the method JIT (yet: more on that later when that work actually commences, if it's possible -- JägerMonkey depends heavily on Nitro, so we'll have to see how feasible such a port would actually be). However, it is still dramatically faster and wrings that much more performance out of our beloved machines. More to the point, it will work on any supported TenFourFox processor architecture.
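Why a tracing JIT alone can't reach parity can be made concrete with Amdahl's law: if a fraction f of a benchmark's run time is spent in code the tracer can compile, and that code runs s times faster, the overall speedup is 1 / ((1 - f) + f/s). The sketch below just evaluates that formula; the fractions and factors in the usage note are illustrative assumptions, not measured SunSpider figures.

```c
/* Amdahl's law: overall speedup when a fraction f (0..1) of the run
 * time is accelerated by a factor s (> 0) and the rest is unchanged. */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound as s grows without limit: only the untraced part remains.
 * f must be less than 1. */
double amdahl_limit(double f)
{
    return 1.0 / (1.0 - f);
}
```

For example, if 75% of the time sits in traceable loops that get 5x faster, `amdahl_speedup(0.75, 5.0)` gives 2.5x overall, and `amdahl_limit(0.75)` says no amount of tracer speedup can push that benchmark past 4x. The remaining 25% (eval-heavy or branchy code) is what a method JIT would go after.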

However, when you'll get it is another matter. While this might be ready for beta 9 or 10, such a massive sea change requires heavy testing and may have subtle bugs, so it may appear in a later beta, or it may sit 4.0 out entirely and appear in 4.1. I'll post more about this later when I have enough testing data to decide, but I've got enough working code already that I couldn't resist a little tease for y'all. Great things are afoot!

Friday, December 17, 2010

Changesets available

Just a brief note that changesets are now available for beta 8, along with up-to-date building instructions, for those interested in hacking on 10.4Fx. Don't let the file size intimidate you; much of that is binary content from the rebranding, although there are some additional bustage fixes that landed after beta 7.

I am currently reviewing the beta 9 blockers to see if we will release a beta 9, or wait for beta 10 -- each release is about 64MB off our Google Code quota, so I'd rather not blow archival space on a beta release we might not get much benefit from. I'll post a decision here when I've committed one way or the other.

Wednesday, December 15, 2010

Beta 8 now available

Beta 8 is now available! Go get it! Like before, we release slightly ahead of Mozilla, so it will still appear as "b8pre" in the version string.

Mozilla has made some interesting changes to beta 8, the most obvious being the redesigned tabs (with a nice rounded edge) and the new tab-modal dialogue boxes. In beta 7 (and any prior version of Mozilla), that link would give you a drop-down dialogue; in beta 8 ... well, just try it and see. The UI improvement is obvious in that a JavaScript application can't lock up your session by throwing lots of dialogue boxes at you, but there are still some questions about performance and user interaction, so you can expect this to continue to evolve. There are also some under-the-hood changes to JavaScript that should improve stability, along with improvements to fonts.

Specific to 10.4Fx, this version now offers four separate builds: G3, G4/7400, G4/7450 and G5. The G5 build is the most unusual: the source is very unstable when built for 64-bit PowerPC, so simply using -mcpu=G5 or -fast will not work. Instead, this build actually targets the G4/7450 and simply turns on the extra G5 instructions, while adding -mtune=G5, which does work (effectively making it 32-bit PowerPC with the PPC 970 instructions bolted on, tuned for the G5's instruction dispatch). Together, these tweaks squeeze out about 15% better performance on a G5 than the old unified 7450 version, and the build is steady as a rock. There have also been performance improvements to the G4/7450 version itself, which is now offered specifically for G4e systems.

10.4Fx also adds support for font tables, enabling many of Firefox 4's marquee font capabilities; enables update checking (downloads are still manual, but you now get background notifications when new versions are ready); makes some UI adjustments; and fixes all the known 10.4Fx-specific crash bugs. Note that there are still some Mozilla-generic crash bugs out there and this version is still susceptible to them. That said, I've been using beta 8 as my sole browser on my quad G5 and my iBook G4/1.33 for the last couple of weeks and it hasn't crashed a single time on either computer.

Have at it!

Thursday, December 2, 2010

Beta 8 is go

Mozilla is planning to release beta 8 sometime next week and so are we. I pulled source from mozilla-central tonight and made the small bustage fixes needed to continue compilation, and this post is being made from beta 8 in debug mode as we speak. There are still a number of blockers that Mozilla must resolve, but I expect that many of them will be deferred to beta 9, especially the Mozilla Sync-related issues. Most of the critical API and stability fixes are in, though, and they look good.

So far in Firefox beta 8, performance continues to get better, there are some interface improvements, and a number of the glitches in the renderer are gone. On our side of the bug fence, update notification appears to work and all of the known crash bugs are stamped out. We're also planning to deploy a G4/7400 version at the request of those users, and I have a few ideas for a G5-specific version. A G5-specific release was always planned, but the Mozilla source is terribly unstable even with -mcpu=G5, let alone -fast, and I couldn't sort this out in time for beta 7. However, I may be able to use ISA-independent tuning parameters to squeeze more oomph out of it, which is useful to me personally, typing here on this quad G5. No promises yet, but I am planning to do some tests this weekend to see if such a build will be stable.

One noticeable problem that has surfaced late in the game is that unaccelerated graphics performance, which 10.4Fx is intentionally limited to, is slower in Firefox 4 due to a performance regression in the Cairo backend. We really need this fixed, and Mozilla has marked it as blocking final since it affects all platforms. The problem is most acute with animated images; you can track this bug yourself in Bugzilla. Hopefully it gets fixed sooner rather than later. We're monitoring it.

10.4Fx beta 8 also introduces our own infrastructure and livery, just like Classilla. You'll like it.

Watch this space!