Friday, July 4, 2014

Well, let's just wave our hands around a bit with 31.0b3 for you

Today in the USA, we celebrate July 4th where America freed itself from the little-endian tyranny of King Tim by throwing Mac Pros into the harbour. Or something like that, I was always daydreaming about Alyssa Milano in civics class. Anyway, Beta 3 is now available (downloads, release notes) corresponding to the Firefox 31 beta 6 code drop. I am not able to distinguish much performance difference from 29 now, and the initial beachball with Adblock is gone on the quad G5 and lasts only a few seconds on the iBook and iMac G4 systems. On battery in Reduced mode, I consider my iBook G4/1.33's performance acceptable -- I can get into Gmail and my office mail much more quickly, and scrolling is significantly less sluggish. This also fixes the problem with spurious extra Window menu entries. Make sure you update your Adblock Edge and Plus; there were some recent updates for ABE, at least.

There were two problems with JavaScript, and I appreciate Chris Trusch for giving me reliable figures that I could replicate (performance issues cannot be fixed without them). For this purpose I compared performance in debug builds, which is not something I usually do for obvious reasons and is what led to me finding the second problem. Indeed SunSpider and microbenchmarks were performing worse in 31 and it turned out to be another side effect of the added Ion optimization analysis I had to work around once already. I added another check to not run it except in the situations that 29 runs it, and while the analysis is more heavyweight than 29's, the added improvements to inline caches in 31 generally even out the impact. I think I can port this change forward; it's not very complex. Interestingly, V8 was at most fractionally perturbed by this, at least on the quad, so we're going to have to watch both benchmarks again in the future.

However, when I built the optimized version, it was still the same speed as beta 2 despite the debug build now being substantially faster. That didn't make any sense at all. Why would a debug build now beat the optimized build by almost a factor of two? It's not only bigger, it's deliberately more inefficient!

After about a week comparing source code, it became obvious that indeed there was nothing in Mozilla's debug code that was doing an operation with side effects or something getting cached that wasn't in the opt build. To prove that, I turned Mozilla's -DDEBUG on for the G5 opt build and the numbers indeed sank further. Incredibly, Shark would run the JS shell at least (even if it won't run the browser, so now I'm reexamining my theory about pilotfish -- perhaps it's just objecting to the size of XUL, which would not be surprising), and comparing the performance traces in Shark showed that the G5 opt build was slow across the board even in things that had no debug code at all, like C++ object constructors. WTF?

The other difference in a debug build is the compiler settings. So, over the second week, I did like we used to do to find the bad extension and ran and re-ran builds with different permutations of compiler and build settings to find the setting in the debugging configuration file that made the difference. And the winner was, incredibly, --enable-debug-symbols=-gdwarf-2 -- adding that to all the optimization configs greatly improved performance. For JavaScript, and now the entire browser, the opt build is once again substantially faster than the debug build just as you would expect. (One other note is that the G5-specific branch stanzas we relied on for 11-29 look like they are not working well in PPCBC, so the G5 build now uses the same shortened branch stanzas Ben innovated for JaegerMonkey on G3 and G4. This further improves V8 by about 8%.)

I don't have a good explanation for why DWARF-2 symbols weren't needed in 29 but are needed now; all I can think of is that somehow we crossed some internal maximum and we need the more efficient modern representation, even though the opt build does strip out most of the symbol table. This may have a consequence for debugging and crash reports, but I don't think we can go back to the old STABS symbols for 31. We'll just have to deal with the problems as they come up.

Your remaining homework:

  • I'm toying with bumping the GC timeslice back to 100ms as it was in 29 (it remains at 30ms right now, same as 24). You can try this by setting javascript.options.mem.gc_incremental_slice_ms to 100 and restarting the browser. This should not make much immediate difference; this is for later on when there are more compartments for the garbage collector to search. I just want to make sure that it doesn't slow the browser initially.

  • Now that we have substantially changed the browser foundation, we should reexamine some earlier performance conclusions. If you are on a multi-core Power Mac, such as a dual G4/G5 or the quad, try turning OMTC back on and restarting the browser. Does it help? Does it help on a uniprocessor Mac? It's hard for me to judge on my test systems. To try this, toggle layers.offmainthreadcomposition.enabled to true and restart the browser. The G5 seems a bit better, but it's a wash on the iMac, and the iBook is still questionably worse.

  • The most important question: release 31 on time on July 22, or wait for 31.1? It's important to us to prove to the Mozilla community we're still a viable and well-maintained port, and I'd like to release on time to make that clear. However, if you have a good reason to wait, I'm willing to consider it; either way, though, we will release no later than 31.1. We need to stay current and ESR24 is going away.
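
For anyone who would rather not click through about:config, here is a minimal sketch of the two experimental settings above as a user.js fragment. The pref names and values are the ones given in the list; putting them in a user.js file in the profile folder is the usual Firefox mechanism, and I'm assuming it behaves the same way in this beta.

```javascript
// user.js fragment: save as user.js in the profile folder; it is read at each startup.

// Experiment 1: GC timeslice of 100ms, as it was in 29 (31.0b3 uses 30ms).
user_pref("javascript.options.mem.gc_incremental_slice_ms", 100);

// Experiment 2: turn off-main-thread compositing (OMTC) back on.
user_pref("layers.offmainthreadcomposition.enabled", true);
```

Restart the browser after adding the file. To go back, remove the lines and reset both prefs in about:config, since values set this way persist in the profile once applied.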

I have not come to a firm decision about dropping source parity, though some code changes in 33 are making me unhappy (they're not portkillers, but they're a lot of work and will invalidate substantial portions of our changesets). I might jump straight to 33 from 31 to deal with those and the irregexp migration at the same time. If that goes awry, that would make my decision for me.

One outstanding bug is a printing crash on Tiger Server only. I have never supported the Server versions of 10.4/10.5, mostly because I have no way to test them and I don't have a need to run OS X Server personally. However, if you're using it, please look at issue 279. While I will not hold the release to fix this and I don't have the ability right now to fix it myself, if you do, I will accept your patch as long as it doesn't regress the regular OS X client versions.

In July we will have our annual state of the Power Mac userbase, and I also have some fun future blog posts in the hopper on my expensive but incredibly fun Amiga 4000T (with Picasso IV, 68060 CPU card and Hydra NIC), and my old Wallstreet G3 now dualbooting Rhapsody 5.6 (Mac OS X Server 1.2) and OS 9.2.2. Plus, I picked up a Cube G4 when I was in Berkeley last week that's waiting for a power supply. The shell is in excellent shape with no major nicks or scuffs and I think it'll be a great test system for playing around with MorphOS in my copious spare time. If you're in the Bay Area and looking to pick up some more Power Macs, they've got plenty at M.A.C. on University and Shattuck and they're willing to deal. Check it out.

16 comments:

  1. Excellent.

    Yes, I saw it. I salute the guy's ingenuity, but the amount of water involved would make a leak catastrophic not only for the machine itself but also anything near it, and that radiator is freaking huge. Plus, distilled water would need to get changed on a fairly regular basis, I should think. It's just a bit too much for me, but it *is* cool looking.

  2. The JS performance regressions are definitely fixed. Page load times and dynamic content loading are now comparable to 29 or even better. SunSpider supports this. Almost all its benchmarks are better than 29 and 24.

    SunSpider G4 1.33 GHz
    29 3951.8
    31b2 6800.5
    31b3 3801.3

    Cameron, sorry I caused you so much work with this. I felt that I needed to speak up because the performance had decreased so dramatically on low end machines that it was really bordering on unusable. Now it's ready for prime time, and it's now a usable browser on the G3 400 MHz Pismo as well.

    And a pleasant surprise: I have no idea what happened to WebM playback but it's greatly improved in b3. YouTube videos suddenly play pretty much as smoothly as they did in 24. I checked back and forth between b2 and b3 to see whether it's the recent YouTube change that did it, but it's not. The change is in 31b3.

    Replies
    1. No, I appreciate you trying to get solid, reproducible numbers. I know you know better than most folks on this blog how useless vague reports are, and on the quad it's hard for me to judge sometimes because it can compensate by just brute forcing its way through.

      The 29 WebM difference is interesting, because this means it may have been crossing that "internal maximum" already and it's just involving more of the browser now.

      I did discover a showstopper early this morning (issue 280) but hopefully I can get it fixed in time. It seems to be a compiler problem also, but only in the G5 build.

  3. Things are looking up again on the G3 iMac:

    TFF 24:
    Time to window opening: 41 seconds, 13 seconds when cached
    Time to TFF default window fully drawn: 47 secs, 16 secs cached
    AdBlock makes little difference

    TFF 31b3:
    Time to window opening: 29 seconds, 13 seconds when cached
    Time to TFF default window fully drawn: 38 secs, 20 secs cached

    TFF 31b3 + AdBlock:
    Time to window opening: 13 secs when cached
    Time to TFF default window fully drawn: 20 secs cached
    Then after about 30 seconds I get a consistent 20 second beachball.


    Replies
    1. I see the beachball with Adblock as well after half a minute on the 400 MHz G3 Pismo, but it's only for 5-10 seconds. (This may depend on how many filter subscriptions and custom filters you have. I only use EasyList plus a rather short social block list and a handful of custom filters.) After that, everything is okay in b3, even Facebook is easily usable, vs about 50% beachball time during the whole browsing experience with b2.

    2. I have EasyList and, I believe, the options you're offered after installing Adblock to remove social media buttons. If I de-activate the EasyList subscription the beachball time drops to 13 seconds.

      I haven't tried my custom filters, as set up on my normal login user running TFF 24 at the moment. Those custom filters are the reason I haven't switched to something lighter like Bluhell - its fixed filter list is missing too many ads and I need the custom filters to block stupid huge background images that some people seem to think are great to put on their blogs!

  4. Am I the only one using AdBlockEdge (or any kind of AdBlock) that is NOT experiencing beach balls? Seriously, I don't have this problem at all.

    I had to revert back to 24 until b3 came out. This version is much faster and has now enabled me to stay on 31. The only issues I'm having are addon related and I'm waiting for updates.

    Replies
    1. It depends on your Mac. What's a short 100% processor spike right after startup on my G4 1.33 GHz PowerBook (with no beachball) translates to a 20 sec beachball on the Pismo. Adblock is definitely doing *something* on startup. I can only speculate, but if e.g. the SunSpider String benchmark (which improved dramatically in b3 compared to b2) does what its name suggests then the Adblock slowness in b2 sort of makes sense.

    2. No beachballs here using AdBlockEdge. Everything is as stable as a rock on my Power Mac G4 Mystic and G5 iMac ALS.

  5. I can't tell a difference on the G4 PowerBook (uniprocessor, obviously) between OMTC enabled or disabled. I did direct comparisons, then I used the browser for one day with, the next day without, and there's just no perceivable difference. WebM may be a little better with OMTC disabled, but not consistently or in any way verifiable with numbers. I don't even know if there's a technical connection. To put it positively, my PowerBook G4 is quite happy with both settings. The same goes for javascript.options.mem.gc_incremental_slice_ms = 100.

    I don't use the G3s with enough heavy lifting over prolonged time periods (let alone WebM) to make a statement. Direct comparison, again, showed no perceivable difference.

  6. This is a great recommendation. I've been trying Bluhell instead of AdBlock Plus. I was confused at first because it says it's a firewall, but technically it's not, it's just an ad blocker. It is very lightweight and seems to be well-maintained. The ad block rules are based on EasyList and therefore should work well for anglophone web surfing, but it does miss some ads that are covered by localized filter lists. Personally I need some power user features in AdBlock Plus (Edge) like custom filters, Google infiltration control and blocking of the omnipresent social media buttons. But there's a good chance Bluhell will replace AdBlock on my G3s, which use a reduced set of extensions anyway.

  7. Localization progress for 31:

    German: done
    Finnish: (me, this week)
    French: grafiko…?
    Italian: done
    Polish: aquariu…?
    Russian: done
    Spanish: done
    For 17 we used to have Swedish and Asturian also, but we still need some custom strings to be translated for those (Knezze…? mikelg…?).

  8. Please provide a Brazilian Portuguese translation too! Tell me which custom terms, in English, must be translated, and I will send you the translations.

    Regards,
    Igor Isaias Banlian

    Replies
    1. Hello, please see https://code.google.com/p/tenfourfox/issues/detail?id=42#c155 for strings that need to be translated. You can work inside the rtf file and then re-upload it to Google Code. Thanks! Please note, though, that I probably won't have time to make the installer during the next two weeks, so this won't be ready in time for release.

  9. One lightweight alternative is to use a HOSTS file.

    I would recommend dropping source parity and focusing on performance improvements.

  10. G5 Dual 2,3

    24.6.0 fine!
    31.0 1/2 minute beachball = currently unusable

    Adblock Edge 2.1.3
    Awesome screenshots
    DNT
    DownloadHelper 4.9.22
    Ghostery 5.3.2
    Greasemonkey 1.15
    Https-Everywhere 3.5.3
    QTEnabler 115
    Suspend Tab
    UTubeUnblocker 0.5.4

