So, after months and months and months and months of work (that's almost seven months of work, for those keeping score at home), we are now on PPCBC, the PowerPC-specialized form of BaselineCompiler, and our hardworking methodjit is now released to that great tracing monkey in the sky (though a large part of it lives on in the regular expression library, and some portions are still used by Ion). Was it worth it?
Let's talk about JIT theory. In general, the lower the latency of a just-in-time compiler, the quicker it generates code, but the poorer the quality of the code it generates (no time for significant analysis or optimization). Our first JavaScript JIT, TraceMonkey, was a tracing compiler that had low to medium latency and therefore low to medium code quality (and because it was a tracing compiler instead of a method compiler, it had a tendency to balloon memory usage and get snared compiling code it shouldn't have; Mozilla dedicated much logic to avoiding this sort of unnecessary work). Methodjit was a medium-latency compiler. Implemented as JaegerMonkey, it therefore generated code of medium, acceptable quality, but it had a startup penalty which some users complained about during the transition to 10.0. Type inference ("JM+TI") improved the code quality, but required even more latency because the interpreter had to run for a certain amount of time to generate type information before JM+TI could spit out code optimized to those types (but once it did, the code it generated was pretty efficient, moving the compiler to medium-high latency, but also medium-high quality).
IonMonkey, which we don't yet implement, is a high-latency compiler emitting very optimized code. But its latency comes at a price, particularly on single processors where compilation cannot occur in the background. In fact, Mozilla does not even try to invoke IonMonkey until a particular block of code has run at least 10,000 iterations; it doesn't pay off.
BaselineCompiler (I'll discuss PPCBC in a moment), on the other hand, is a low latency compiler, even lower than TraceMonkey. The browser will attempt to compile code running with as few as 10 iterations (!) in Baseline because there is little penalty to doing so: even though it generates low-quality code, the code that it does generate is over four times faster than the interpreter, and because it generates it so quickly the browser can start executing this code nearly immediately. However, it generates code that is about 60% slower than TraceMonkey, and about 7 times slower than JM+TI.
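To make the two thresholds concrete, here is a minimal sketch of count-based tier selection. Only the numbers 10 and 10,000 come from the text above; the function name, the `ion_available` flag and the exact comparison used are assumptions for illustration, not the engine's actual logic.

```python
# Hypothetical sketch: only the two thresholds are taken from the post.
BASELINE_THRESHOLD = 10      # "as few as 10 iterations"
ION_THRESHOLD = 10_000       # "at least 10,000 iterations"


def choose_tier(iterations, ion_available=False):
    """Return the tier a piece of code would run in after `iterations` runs."""
    # Ion is only considered when the build actually has it (ours does not yet).
    if ion_available and iterations >= ION_THRESHOLD:
        return "ion"
    # Baseline kicks in almost immediately because compiling is cheap.
    if iterations >= BASELINE_THRESHOLD:
        return "baseline"
    return "interpreter"


if __name__ == "__main__":
    for count in (1, 10, 9_999, 10_000):
        print(count, choose_tier(count), choose_tier(count, ion_available=True))
```

With Ion unavailable, as in this release, everything past the first threshold simply stays in Baseline.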
Because BaselineCompiler cannot make assumptions about the code it runs (methodjit could, because type inference greatly improved its ability to predict at the cost of -- you guessed it -- more latency), it has a dependence on guards to ensure that code that violates its assumptions is properly handled. These guards are an integral part of the inline caches it generates, which are little blobs of code popped out for specific operations as they are run by the JavaScript engine. PowerPC does not do well with branchy code, especially the G5, and because all of our supported CPUs are superscalar we can optimize these commonly emitted type guards with better instruction-level parallelism to reduce branches and improve throughput. That's what PPCBC does, converting these and certain other portions of the inline cache code to PowerPC-optimal straight line sequences, improving our performance on benchmarks by about 15-20% without any penalty to latency. This pulls us to around 40% slower than TraceMonkey, and about 5 times slower than JM+TI.
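As a rough illustration of the idea, the toy below models an inline cache for an addition site with a type guard in front of the fast path, and shows one way two guard branches can be merged into a single test. The tag values, function names and the XOR/OR trick are assumptions chosen for the sketch; this is not the code PPCBC emits.

```python
# Hypothetical type tags; real engines encode types differently.
TAG_INT32, TAG_DOUBLE, TAG_OTHER = 1, 2, 3


def tag_of(value):
    """Map a Python value to a made-up type tag (bool counts as 'other')."""
    if type(value) is int:
        return TAG_INT32
    if type(value) is float:
        return TAG_DOUBLE
    return TAG_OTHER


def guard_branchy(tag_a, tag_b):
    """Two separate checks: two conditional branches."""
    if tag_a != TAG_INT32:
        return False
    if tag_b != TAG_INT32:
        return False
    return True


def guard_merged(tag_a, tag_b):
    """XOR is zero only on a match; OR merges both results into one test."""
    return ((tag_a ^ TAG_INT32) | (tag_b ^ TAG_INT32)) == 0


def generic_add(a, b):
    """Slow path standing in for the engine's fully general '+'."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


class AddCache:
    """Toy inline cache: guard first, specialized path on success."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def add(self, a, b):
        if guard_merged(tag_of(a), tag_of(b)):
            self.hits += 1
            return a + b              # specialized integer path
        self.misses += 1              # guard failed: fall back safely
        return generic_add(a, b)
```

The two independent XORs can issue side by side on a superscalar CPU, leaving only one branch to predict, which is the general flavor of trading branches for straight-line code.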
So by now you should have guessed the tradeoffs, but let's be explicit: virtually all benchmarks suffer. These are long-running sections of code that JM+TI optimized very well, since it had the time to do so. V8 drops on the quad G5 from 2300 to about 450 (but the interpreter clocks in at barely 100). SunSpider time increases by a similar proportion. Because such a large portion of Peacekeeper is predicated on our JavaScript performance, we suffer badly there too. BaselineCompiler also does not utilize the FPU very well, which is really painful on PowerPC because we have no direct ways of converting integer to floating point; benchmarks requiring lots of floating point computation really take it in the shorts, and there is no good way to fix this.
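As a quick sanity check, the quad G5 V8 scores quoted above can be turned into ratios. The three figures are the approximate ones from the text (V8 scores are higher-is-better); the variable names are just labels for this sketch.

```python
# Approximate V8 scores on the quad G5, as quoted in the post.
jm_ti_v8 = 2300    # JM+TI (previous release)
ppcbc_v8 = 450     # PPCBC ("about 450")
interp_v8 = 100    # interpreter ("barely 100")

# How far behind JM+TI is PPCBC on this long-running benchmark?
print(round(jm_ti_v8 / ppcbc_v8, 1))    # 5.1

# How far ahead of the plain interpreter is PPCBC?
print(round(ppcbc_v8 / interp_v8, 1))   # 4.5
```

Both ratios are roughly in line with the earlier rules of thumb: about 5 times slower than JM+TI, and over four times faster than the interpreter.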
Fortunately, most pages do not have long-running scripts; they have quick-hit scripts, and most of them are using integer or object-based code. This is where PPCBC shines. Pages become significantly more responsive and because we jump into compiled code with a very short delay, there is much less wait. Many, though by no means most, sites fall into this category. YouTube is a site that could go either way, but eBay does very nicely. Gmail feels about the same, but at least it does not regress.
The definitive solution is to implement IonMonkey fully, of course. When fully operational, then after a period of time running, PPCBC-generated code will have accumulated enough type information to allow IonMonkey to emit very nicely optimized sequences, better than JM+TI would have generated for the same input. The good news is that implementing PPCBC first gets us about 2/3rds of the way to Ion since they use most of the same underlying machinery, and it is a predictably performing compiler which is important for our low end systems. (By the way, do not try to enable Ion in the browser. It will crash. You may need to restart it in safe mode to turn this off, so please don't. If you are using the js shell, be sure to start it with the --no-ion option.)
TenFourFox 24 does have better graphics support and improved DOM performance, which helps to offset some of this performance loss. We are also using different widget code required by the Australis upgrade, which is improving some of our chrome drawing speed (more about Australis in a moment). I did attempt a build with jemalloc, the higher-performance allocator that Firefox preferentially uses and that we already tried, unsuccessfully, in a test build for the 22.1 release. We scotched it back then because we couldn't deal with a memory leak, and jemalloc makes 24 even worse: overnight it ballooned to almost a gigabyte of memory on my quad G5. In addition, the performance delta between the regular allocator and jemalloc is much smaller for 24 due to improvements in the core, and it only makes a small difference on a subset of sites. So it's not worth the headache now.
The only outstanding bug of significance so far in 24 is a problem with Personas covering up the "traffic light" buttons on redraw (issue 247). It's cosmetic; they repaint when you hover over them, and they work normally, so it's just an ugliness that needs to be polished up. This will be fixed for the final release and does not occur with the regular chrome. YMMV; do report bugs as you find them.
Localizers should consider strings frozen for this release, so language packs for 24 can now be created. I am thinking we will have one more beta (24.0.1) to coincide with 17.0.10, and then 24 will replace everything for 24.0.2; langpacks should be ready to go by then. I'll let Chris Trusch comment on the feasibility of that timeframe. Our long-suffering and greatly valued volunteer translators should look for activity in issue 42.
Looking ahead to the future, I am not likely to land Ion on 24 if we can get at least Fx26 running. The reason is simply that I don't know how our systems will deal with it; it's a heavyweight compiler, and it may be too much to be efficient on a G3. We might even ship it only for 7450 and G5, and let G3 and 7400 use Baseline only, which may perform more smoothly on those significantly older machines. However, because PPCBC works fine, Ion is now officially a "solvable" problem given enough time. Evaluating its responsiveness will thus be a big part of the upcoming new unstable branch releases.
What isn't necessarily a solved issue, though, is Australis, the new interface. Some of this code is already in 24, invisibly, and we use some of it for Personas (so fixing the Personas bug is important not because it's cosmetically wacky, but because it's a useful test of code to be used more heavily in a future browser version). However, it still has lots of performance regressions and bugs and it's not even a part of Nightly Firefox builds, just the UX branch; it is now debatable whether it even makes Firefox 27. Whenever it lands, we need to get Australis working to advance, since almost all of the browser chrome will depend on it; given our success thus far, the odds are good as long as 10.6 support doesn't get dropped, but it is by no means guaranteed.
Anyway, I am relieved that 24 is not an utter disaster. Let me know what you think. I will start working on 26 beta in the very near future as well to kick off our assault on the next ESR, the far-away ESR 31.
Switching back and forth between 22 and 24 to compare, I see that
1) 24 takes longer at startup
2) Facebook pages in 24 load at least as fast as in 22, sometimes much faster. Improvised wallclock timing says up to 50% better. The blue JS progress spinner looks jerkier, but the page itself loads quicker. Esp. if you scroll down and new stuff is loaded dynamically. Good.
3) also Ebay, Amazon and other JS heavy sites load rather swiftly.
The overall feeling of 24 in the real world (other than benchmarks obviously) is definitely usable, if not even an improvement. Thanks Cameron for all the hard work, it's paying off already I think, and should improve more if IonMonkey is fully working. [PowerBook G4 1.33 GHz, 10.5]
For localizing, I hope that our contributors for Asturian, Italian and Polish are still onboard. The timeframe with one more beta looks okay to me. I'll attempt to make a German test version for 24.0b1 tonight or tomorrow.
I need a few new phrases to be translated to make langpacks for Finnish, French, Russian, Spanish and Swedish (even though I wouldn't complain if some of those are taken over completely by other volunteers), please see Issue 42.
Yes, startup in 24 is about twice as long. Mozilla has moved a lot of stuff into the startup sequence for caching reasons, and since we defeat its ability to multithread (which might be slower on most of our machines) it holds up the main thread longer than 22 did. There's not a lot I can do about that right now, unfortunately.
The blue throbber is jerkier and this was, AIUI, intentional so that CPU cycles weren't wasted on updating it instead of working on the page. I think that's a good tradeoff. I'm glad to hear that the overall snappiness of the browser is better as well.
We'll proceed with this timeframe then to allow localizers to catch up and hopefully we'll have a nearly full set for 24.0.2 "final."
Startup time isn't that much of an issue after all because the browser seems very stable, no crashes or oddities so far that would necessitate a restart. Also html5 playback is okay, and RAM usage seems normal.
The only thing I've noticed (but with older versions as well) is that as soon as the browser uses more than 500 MB (or so) of RAM, it'll get noticeably slower executing JS, even if no swapping to the HD occurs. I guess our PPC RAM isn't really up to the read/write speed of modern DDR2 or 3.
The net effect of the JavaScript changes and graphics compositing improvements under the hood is that YouTube WebM, for example, is now decent on my 1GHz iMac G4. Not totally fluid, but certainly watchable, and certainly better.
The memory threshold you notice might just be a coincidental point at which stuff no longer fits well into CPU cache or some such. As you say, I don't know what I can do about that right now, and it may very well be an intrinsic limitation.
On the other hand, opening ~20 pictures from a Facebook photo album all in single tabs will effectively make TFF 22 unusable for the next minute. 24 just loads the stuff and the UI remains responsive, meanwhile you can open more tabs with simpler pages which will load completely as if there was nothing going on in the background. I'm not sure if all of this is a result of PPCBC, there may be other optimizations. My PowerBook suddenly feels like a multicore machine. Exciting.
That's an optimization in 24 where the browser becomes smarter about decoding and holding images. 26 won't even try to decode images that aren't visible yet, so more improvements in this regard are on the way.
Sounds like Issue 72 (Mozilla bug 641597), however I'm completely unable to trigger it.
Regarding the Tenderapp report about CPU usage on G3s, I can say that TFF 24 uses exactly 0.00 to 0.20% of my Pismo's CPU with one empty window or no window open.
Good -- I don't have a G3 testing system up a/t/m (hopefully will convert my Wally over), but I didn't find any such problems on the 7400 systems.
I'm sure you're aware that our friend in Japan released a 24.0.1b of Tenfourbird. I perceive a good jump in performance over 17.0.10, but I quickly noticed a menu issue that makes the new version fairly painful to use. When I first start Tfb and open a new window to compose a message, the pulldown menus work normally. If I open a second new message window and focus is in this window, all options in the Edit, View, Options, Tools and Window menus are grayed out. The Tenfourbird and Help menus continue to work normally. If I change focus to the mailbox window, the menus with the grayed out options return to normal. If I shift focus back to the new message window, the options are grayed out again. The only solution I've found is to quit and restart Tfb, but the problem returns when I open a second new message window. At least this menu issue seems to be easily reproducible. ;)
No, I was not aware and thanks for the heads-up. I'm happy to see it; I was worried he might not be able to make the jump. I'll have to give it a try.
However, I'm not sure what to make of your report. I don't have any reports of that and certainly haven't experienced it myself in TenFourFox. It's definitely not issue 248; that one is related to losing clicks over the menubar and doesn't grey out or disable any options in that sense.
Right. Having experienced something similar to issue 248 in Tfb, I realize this is a different beast. The only add-on I had enabled was the latest version of ImportExportTools. The problem persisted after removing IET. Other than this issue, Tfb 24.0.1b feels like a very nice improvement over version 17.0.10. I'm encouraged.
The developer discovered that opening the toolbar customization dialog resolves the issue, but is not sure why. I thought the fix would likely be temporary, but I've been unable to reproduce the grayed out menus since opening and closing the dialog, even after quitting and restarting Tenfourbird.
It turns out that the toolbar customization trick was temporary, but our friend has gained a better understanding of the issue and thinks he has a workaround for it that he plans to include in Tfb 24.1.0.
24.1.0 appears to be running smoothly. Interaction good with Disqus such as on NPR and Al Jaz. Unable to close all tabs, current page tab open at all times (when only a single page is open), "x" in tab to close not present. No add-ons. You-tube operational with HTML5 and default. No menu problems.
Hardware Overview:
Machine Name: PowerBook G4 15"
Machine Model: PowerBook5,6
CPU Type: PowerPC G4 (1.2)
Number Of CPUs: 1
CPU Speed: 1.5 GHz
L2 Cache (per CPU): 512 KB
Memory: 2 GB
Bus Speed: 167 MHz
Boot ROM Version: 4.9.1f1
Serial Number: W851855DRG3
24.1.0 with plugins enabled does not recognize installed plugins available to v17. BBC video will play in v17, not in v24. Opening the add-on manager under v24 doesn't display available plugins.
There is no more plugin support under 24, not even if you enable the undocumented (and unsupported) plugin mode; Mozilla removed QuickDraw and Carbon event model plugin support, and there are too many bugs already in it to warrant trying to put it back. I warned about this ages and ages ago from version 6 when plugins were officially disabled and made unsupported. Sorry, this isn't going to change. I don't know what OS you use, but SeaMonkey supposedly has some plugin support added back into it on 10.5.8. I don't use it, so I don't know how well it works. Leopard-Webkit may also help you here.
As far as the changes to the tabs, that's an intentional one Firefox chose to make, and in general we mirror that.
Thanks for the suggestions. I'll stick with v17 then as it still works. Odd, then, that v24 provides the ability to enable plugins in the Manager. Agree, plugins have made for a "crashing" experience in my past. That said, having the ability to simply run Shockwave allows the full use of many sites. Too bad TFF/Mozilla has somewhat left me behind by this choice.
Your reply and work are appreciated. Thank you.
You can't actually enable them; it displays an empty list to "make the point." This will be removed at some unspecified point in the future.
Knowing some inside information about 17, though, you would be better served using another supported browser than remaining on the old version and I'll leave it at that.