Tuesday, December 27, 2011

AWOAFY? (Are We Old And Fast Yet?, or, 9.0.1pre JM+TI available)

During the long death march to Firefox 4, Mozilla made much hay out of their Are We Fast Yet? site, a/k/a AWFY. AWFY was their daily dangling carrot while they worked on the first iteration of methodjit and offered regular feedback against the competition, and is still being used in the new type inference age. Well, JM+TI is here for PowerPC at last (that is, JaegerMonkey and type inference), and so is 9.0.1pre with new backends for your experimental pleasure. There are no other changes in this release and it is otherwise exactly the same as 9.0; it should fix all the known flaws in methodjit. (It does not include the fix Mozilla issued in Firefox 9.0.1 because I have received no relevant reports of crashes related to it, and it may impact performance, so there will not be a 9.0.1 unless I hear differently.)

So, with this release, I am now proud to bring you Are We Old And Fast Yet? where the first answer is yes and the second answer is "not enough." JM+TI now offers JavaScript speeds comparable to our old friend tracejit: on the quad G5, JM+TI (without tracejit) now matches the 1050ms on SunSpider that tracejit gets, and increases Dromaeo to 132 runs/sec. On our testing 1GHz G4/7450, Dromaeo rises to 38 runs/sec, and SunSpider drops slightly to 2750ms. (We're solving the missing square root problem for the present time by simply falling through to the JavaScript square root library routine. The G5 uses its native square root instruction, of course.)

This is already pretty darn good, but we can do better, and here is our goal for AWOAFY: quad G5 at Highest achieves less than 1000ms on SunSpider; 1GHz G4/7450 achieves less than 2500ms. These are totally doable; in fact, improvements in Firefox due to ship in the Fx11 timeframe may push these numbers down to target without us doing anything, but we're still going to shoot for low-hanging fruit optimizations in TenFourFox 10.
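To put those targets in perspective, here is a small back-of-the-envelope sketch. It uses only the SunSpider figures quoted above; the helper name reductionNeeded is made up for illustration, and the targets are treated as "reach the number" rather than "beat it".

```javascript
// SunSpider figures quoted in this post (milliseconds; lower is better).
const systems = [
  { name: "quad G5", currentMs: 1050, targetMs: 1000 },
  { name: "1GHz G4/7450", currentMs: 2750, targetMs: 2500 },
];

// Hypothetical helper: percentage of the current run time that has to be
// shaved off to land on the target time.
function reductionNeeded(currentMs, targetMs) {
  return ((currentMs - targetMs) / currentMs) * 100;
}

for (const s of systems) {
  const pct = reductionNeeded(s.currentMs, s.targetMs).toFixed(1);
  console.log(s.name + ": about " + pct + "% off the current SunSpider time");
}
```

That works out to roughly 5% on the G5 and roughly 9% on the G4, which is why the goals look reachable.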

That brings us to what happens next. If Firefox 10 is indeed the ESR, as our Magic 8-ball predicts (it also predicted I would get a date with Scarlett Johansson, though), enough pieces of tracejit remain that it should still work. In that case I think we should stick with tracejit as the default JIT for the duration of the 10 branch, with the option for JM+TI baked in for the adventurous: it's well tested, it's safe and it does well on our older hardware. I am waiting for beta 2 to be certified on mozilla-beta and once it is, we will begin the port. On deck for this release are some fixes from Ben and getting Tobias' fixed VMX text conversion routines back in action.

If Firefox 10 is not the ESR, then Fx10 will be the last tracejit release, and people will be forced to use JM+TI starting with Fx11. Sorry, this is Mozilla's call, not mine.

Either way, when we scramble to the ESR, the ESR will become our stable branch. Mozilla will backport security and stability fixes to the ESR which we will pick up, and unlike Mozilla, we will do other kinds of bugfixes to TenFourFox "ESR" that we deem appropriate. There will be no new features added, however. Most users will stay on this release.

For you, the rowdy, wild beta testers, there will still be beta releases, but not formal releases (because there won't be formal releases for a while -- users will be kept on the ESR-based release). The situation we are trying to avoid is getting stuck on a release of Firefox that isn't getting security updates because we are unable to get the next one working. If, say, we get to Firefox 14 but can't make it to 15 -- and this is a real risk because Mozilla is already making noise about canning 10.5 support somewhere around Fx13, as we have previously discussed -- then we are marooning our users on a branch that will never get any fixes. By keeping later updates merely betas, most users are on a maintained branch and those of us out on the beta channel can just downgrade to the already existing release and live out our days in secure geriatric bliss in feature parity.

Assuming the ESR is 10, the next ESR would be Fx17. If we make it to Fx16, then we will do a regular beta release for Fx16, release Fx16, and go for 17. We can certainly backport security fixes from Fx17 to Fx16 if we fail at Fx17 and stay safe; the code is similar enough. If we don't make it to Fx16, then we can go on to feature parity against the ESR using the patches we've already done for later releases. So I feel pretty good about the future.

JM+TI is only transitional, however, because the future is Mozilla's new IonMonkey engine. This is months away from being functional, but I already looked at some of the code for the existing partially-functional backends and while much of our JM+TI work will transfer, there is a lot of work to do to get it working for PowerPC. Let's try to get as much wear as we can out of the work we've already done.

Now, let's test AWOAFY. Download 9.0.1pre and go to about:config. Turn javascript.options.methodjit.chrome and .content to true, and turn javascript.options.tracejit.chrome and .content to false (yes, you are disabling tracejit; it is not compatible with type inference). Finally, set javascript.options.typeinference to true and restart the browser. Report your results! Remember to turn type inference off if you need to downgrade to 9.0, and turn methodjit off entirely if you downgrade to 9.0b1 or earlier.
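If you would rather not click through about:config, the same switches can be written into a user.js file in your profile folder. Treat this as a sketch: the pref names are exactly the ones listed above, and user.js with user_pref() is the stock Firefox mechanism, which this assumes 9.0.1pre honours unchanged.

```javascript
// user.js -- goes in the profile folder and is read at every startup.
// Assumption: TenFourFox reads user.js the same way stock Firefox does.

// Enable methodjit (JaegerMonkey) for chrome and content.
user_pref("javascript.options.methodjit.chrome", true);
user_pref("javascript.options.methodjit.content", true);

// Disable tracejit; it is not compatible with type inference.
user_pref("javascript.options.tracejit.chrome", false);
user_pref("javascript.options.tracejit.content", false);

// Enable type inference.
user_pref("javascript.options.typeinference", true);
```

Values in user.js are reapplied on every launch, so before downgrading, remove these lines and reset the prefs in about:config as described above.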

30 comments:

  1. Which Sunspider version do you use? 0.9 or 0.9.1? I'm getting slightly different results with each.

  2. SunSpider results for a 1.0 GHz G4 with TenFourFox 9.0.1pre 7450 build:

    3192.7ms +/- 0.8% for Tracejit
    2794.3ms +/- 1.0% for Methodjit + Tracejit
    2617.2ms +/- 0.9% for Methodjit + Type Inference

    Here are the detailed results:

    http://pastehtml.com/view/bitt2jifm.txt

  3. Kraken drops from 121,179.2ms (MJ only) to 43,632.9ms (MJ+type inference) (G4 1.33 GHz). Christmas miracle? Or does this just reflect the fact that Methodjit is "incomplete" without type inference?

    More numbers to come.

  4. Probably the latter. Kraken has a number of benchmarks that come originally from statically typed languages and were either machine-adapted for or rewritten in JavaScript. Since their algorithms inherently have a limited number of types in play, TI probably disproportionately helps them. The only one we take a beating on is gaussian-blur; tracejit ran that about twice as fast.

  5. Playing my role as naive beta tester, it passes the Twitter and Facebook tests. Also noticeable improvement in launch time. What is going on with the blizzard of tabs asking to enable each and every f-ing addon that I'd already carefully selected? I only upgraded from 9.0 to 9.0.1pre.

    But you have done a good job, from what I see. Thank you.

  6. MASSIVE improvement over previous v9 with methodjit. Actually better than v8 final for many complex css blogs (like Huffingtonpost). No bugs so far! Also, addon support is comparable to 7 now.

    Artphotodude
    Testing on Powermac G4, Dual-533 (7400), OSX 10.4.11.

  7. >What is going on with the blizzard of tabs asking to enable each and every f-ing addon that I'd already carefully selected?

    This seems to be a FF bug. I got it twice during the last months when I updated more than one add-on at the same time and didn't restart in-between. I can't reproduce it, though.

  8. Also just got to check on my sister's G3 400 iMac and the new version runs well (not faster really, but with fewer Javascript errors than version 8 final).

  9. @chtrusch, seen it for a while, never on the FF-intel boxes though. I close all tabs & go to add ons mgr & re-enable them, then restart.

  10. TFF 9.0.1pre, PowerBook G4 1.33 GHz, 2 GB RAM, 10.5.8

    Peacekeeper (new version)
    TJ only: 267, 266, 268
    MJ only: 254, 254, 257
    MJ+ti: 254, 252, 260
    TJ+MJ: 267, 272, 268

    SunSpider (0.9.1)
    TJ only: 2432.3ms
    MJ only: 2430.2ms
    MJ+ti: 2153.1ms
    TJ+MJ: 2149.3ms

    Kraken 1.1
    TJ only: 61,794.0ms
    MJ only: 121,179.2ms
    MJ+ti: 43,632.9ms
    TJ+MJ: 62,252.7ms

    I'll keep MJ+ti enabled for now.

  11. I'm getting slower numbers with methodjit/type inference in SunSpider.

    TFF 9.0.1pre, PowerMac G5 Dual 2.0 GHz, 4 GB RAM, 10.5.8
    SunSpider (0.9.1)
    TJ only: 1333ms (was 1339ms in TFF 9.0)
    MJ+ti: 1395ms

  12. The Pismo (400 MHz G3) has a full GB of memory now. And a new web browser that drops Sunspider numbers from 6526.0ms (Tracejit) to 5581.5ms (Tracejit+Methodjit) to 5416.7ms (Methodjit+type inference).

  13. Some more settings for +power -eye candy users to make TFF even faster:

    browser.fullscreen.animateUp = 0 (prevents the stupid and sometimes very slow animation when switching to full screen)

    [alternatively: browser.fullscreen.autohide=false (leaves the navigation bar visible always)]

    browser.preferences.animateFadeIn = false (re-sizing is ok, but fade-in of content is non-standard on OS X anyway)

    browser.tabs.animate = false (no more "Assertion Failed. Giving up waiting for the tab closing animation to finish")

    browser.panorama.animate_zoom = false (instant switch from tab groups to normal window)

    Animations may look cool, but on slow computers, they're just choppy if the machine has some actual work to do. Maybe someone can calculate how much lifetime we waste waiting for animations. It accumulates…

  14. Glad to see a workable 9-7450 beta in action. No major improvements and a noticeable performance hit mean I will of course be sticking with 8, but I'm very curious to test the 10 beta and determine if it's unwittingly become G5Fox.

    In any case TFF 8 is a world class browser for G3/G4

  15. The chart on TenFourFox page needs to be updated for JM+TI performance. ;)

  16. A suggestion for the QTE add-on:

    If I want to play Youtube videos on my Pismo 400 MHz G3, I have a problem. HTML 5 video in the browser is definitely too slow. Using QTE and opening the standard definition version in QuickTime Player is a little better, but QT Player is still not efficient enough to play the 480x360 resolution without stuttering, or the computer is just plain not fast enough.

    Now, when I use the flash plugin (duck and cover), there's a 240p option on Youtube that doesn't seem to be available as html5/h.264 (which is 360p and up). The 240p flash version plays not quite fluently in the browser on the 400 MHz G3, but it's the best option so far. Using MacTubes (240p with the flash player option), it's *almost* fluent (if you close all other applications and reserve all resources for MacTubes). When I download this .flv 240p version manually, it plays perfectly in QT player (Perian) even with TFF, Photoshop, iTunes etc. open in the background.

    So: Might there be a chance to have an option to just "Open Youtube Movie in QuickTime (flv240p)"? Or is this impossible because it isn't part of Youtube's html5 'department'?

  17. @zubr, yes :P I'm stalling until I'm done with some initial optimizations.

    @chtrusch, I'll have to see if the 240p version is part of the metadata. I don't know if it is, but it's easy enough to add it if it's there.

  18. Or, install flashvideoreplacer, control click on the icon at the far right of the address bar. Select copy url to clipboard, select 240 (flv), open quicktime (or VLC, or niceplayer, or....) command u, command v, hit return, and it should stream nicely.

    Also, Wegener Media is still selling 550 MHz G4 upgrades for Pismos, and with a little AltiVec you should be able to play 240 flv in the browser, 360p using the above method. In non-TenFourFox news, I recently installed greasekit for Safari (and omniweb). Viewtube also gives you a 240 flv option, using quicktime inside the browser. Works well, but CPU use is near 100%.

  19. To me it seems that with methodjit and type inference instead of tracejit the memory footprint is much smaller, which is particularly useful for machines that cannot be upgraded to at least 2 GB. Even the maximum of 1.25 GB RAM in this PB G4 12" isn't really enough for running TFF with tracejit when using multiple tabs or windows.

  20. I made some bookmarklets to help with youtube vids. This one swaps out flash for your video plugin. It's for 360p:

    javascript:void(_gel('watch-player').innerHTML='')

    For 240p, the quicktime plugin might not behave properly, but you can try:

    javascript:void(_gel('watch-player').innerHTML='')

    To download 360p:

    javascript:window.location=unescape(yt.getConfig('PLAYER_CONFIG')['args']['url_encoded_fmt_stream_map'].match(/.*g=(34|43),url=(.+?)&q/)[2])

    To download 240p:

    javascript:window.location=unescape(yt.getConfig('PLAYER_CONFIG')['args']['url_encoded_fmt_stream_map'].match(/.*,url=(.+?)&q/)[1])

  21. Blogspot.com "sanitized" my post. The first two bookmarklets were cut off. Here they are in full:

    http://pastehtml.com/view/bjfpphucy.txt

    Sorry about that.

  22. @Johnson, Pardon my profound ignorance, but exactly how do you install these bookmarklets?

  23. Add them as the URLs for bookmarks.

    Browse to a youtube vid. Once there, click on the bookmark to run the javascript.

    If anyone is interested, I can try to explain the code piece by piece and how to customize it. You can make the video size larger or smaller for example.

  24. @Tobias, JM+TI is definitely more memory efficient because an entire trace has to be created for each combination of types, whereas JM+TI can distill down to a much smaller set.

    I'm wondering if the 240p version is VP6 only. Johnson's bookmarklet implies there is no special code for it, just the default resolution.

  25. Thanks for all your help and suggestions with YouTube. A G4 upgrade for the Pismo would be nice, but is a bit expensive. I'm a big fan of bookmarklets. The 240p swap doesn't work with the QT plugin, but the other ones work just great. The one-click-solution for 240p download is a big time-saver on the Pismo. Downloading files with add-ons or MacTubes is a bit cumbersome, and the convenient downloading sites (like keepvid.com) all require the Java plugin, which is now largely dysfunctional with TFF 9 on 10.4 (but still works on 10.5 and in Safari on both 10.4 and 10.5).

    With respect to the memory footprint: I haven't seen 9.0.1pre with type inference go higher than 460 MB of real memory yet (on the G4 PowerBook with 2 GB installed), which is high enough but no problem after several days with no restart of the browser. I will keep an eye on that on the G3s (the Pismo has 1 GB, and the iBook is also maxed out at a measly 640 MB).

  26. >JM+TI is definitely more memory efficient because an entire trace has to be created for each combination of types, whereas JM+TI can distill down to a much smaller set.

    With all due respect, I think the actual reason is eager jitcode discarding when TI is enabled (after 8 GCs IIRC).

  27. I'm sure that helps, but in general, a given block of code may generate multiple traces but will only (after TI has whittled down choices) generate one single method compile.

  28. Youtube's 240p files are in .flv format. The Quicktime plugin is not assigned to play .flv files by default. To change this, open ~/Library/Application\ Support/Firefox/Profiles/~/pluginreg.dat in a text editor, and find the Quicktime heading, followed by a numbered list.

    Add ":video/x-flv:Flash Video:flv:$" to the end of the list, preceding it with the appropriate number. Then, look to the line before the start of the list. It's the total number of entries in the list. Since we added one, we need to increase this number by one.

    I have found that this change does not "stick" unless you edit the file while TenFourFox is running. Then it takes effect when you restart TenFourFox.

    Results may vary. This only works if you have Perian installed, and maybe not even then, due to configuration errors. On my system, the Quicktime plugin seems to be unable to scrub the timeline from .flv files. What this means is that the Quicktime plugin will not appear until the entire .flv file has finished loading. That makes it just about worthless for youtube vids, certainly for the longer ones. There are alternatives to the Quicktime plugin such as the VLC plugin and the MPlayer plugin. I have been unable to test these.

  29. The method QTE uses to push video into QuickTime Player had issues with Perian-powered video formats, so this might not work.

    QT indeed cannot scrub the timeline on .flv video until the entire video is loaded (this is true of both the plugin and the player).

