There were two problems with JavaScript, and I appreciate Chris Trusch for giving me reliable figures that I could replicate (performance issues cannot be fixed without them). For this purpose I compared performance in debug builds, which is not something I usually do for obvious reasons and is what led me to find the second problem. Indeed, SunSpider and microbenchmarks were performing worse in 31 and it turned out to be another side effect of the added Ion optimization analysis I had to work around once already. I added another check to not run it except in the situations where 29 runs it, and while the analysis is more heavyweight than 29's, the added improvements to inline caches in 31 generally even out the impact. I think I can port this change forward; it's not very complex. Interestingly, V8 was at most fractionally perturbed by this, at least on the quad, so we're going to have to watch both benchmarks again in the future.
However, when I built the optimized version, it was still the same speed as beta 2 despite the debug build now being substantially faster. That didn't make any sense at all. Why would a debug build now beat the optimized build by almost a factor of two? It's not only bigger, it's deliberately more inefficient!
After about a week comparing source code, it became obvious that indeed there was nothing in Mozilla's debug code that was doing an operation with side effects or something getting cached that wasn't in the opt build. To prove that, I turned Mozilla's -DDEBUG on for the G5 opt build and the numbers indeed sank further. Incredibly, Shark would run the JS shell at least (even if it won't run the browser, so now I'm reexamining my theory about pilotfish -- perhaps it's just objecting to the size of XUL, which would not be surprising), and comparing the performance traces in Shark showed that the G5 opt build was slow across the board even in things that had no debug code at all, like C++ object constructors. WTF?
The other difference in a debug build is the compiler settings. So, over the second week, I did like we used to do to find the bad extension and ran and re-ran builds with different permutations of compiler and build settings to find the setting in the debugging configuration file that made the difference. And the winner was, incredibly, --enable-debug-symbols=-gdwarf-2 -- adding that to all the optimization configs greatly improved performance. For JavaScript, and now the entire browser, the opt build is once again substantially faster than the debug build just as you would expect. (One other note is that the G5-specific branch stanzas we relied on for 11-29 look like they are not working well in PPCBC, so the G5 build now uses the same shortened branch stanzas Ben innovated for JaegerMonkey on G3 and G4. This further improves V8 by about 8%.)
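The search described above boils down to starting from the opt configuration, adding back settings that appear only in the debug configuration, and timing each result. Here is a minimal Python sketch of that loop. Only `--enable-debug-symbols=-gdwarf-2` comes from the actual investigation; the other setting names, the `measure` callback and the timings are placeholders made up for illustration, and a real `measure` would have to rebuild the JS shell and run a benchmark.

```python
from itertools import combinations


def candidate_configs(opt_settings, debug_only_settings, max_added=1):
    """Yield (added, config): the opt settings plus up to max_added debug-only settings."""
    for count in range(1, max_added + 1):
        for added in combinations(debug_only_settings, count):
            yield added, list(opt_settings) + list(added)


def find_helpful_settings(opt_settings, debug_only_settings, measure, max_added=1):
    """Return (added, time) pairs that beat the plain opt build, fastest first."""
    baseline = measure(list(opt_settings))
    results = []
    for added, config in candidate_configs(opt_settings, debug_only_settings, max_added):
        elapsed = measure(config)
        if elapsed < baseline:
            results.append((added, elapsed))
    return sorted(results, key=lambda pair: pair[1])


# Placeholder settings; only the DWARF-2 flag is taken from the post.
OPT = ["--hypothetical-opt-setting"]
DEBUG_ONLY = [
    "--hypothetical-debug-setting-a",
    "--enable-debug-symbols=-gdwarf-2",
    "--hypothetical-debug-setting-b",
]


def fake_measure(config):
    """Stand-in for 'rebuild and run the benchmark'; returns made-up times."""
    return 50.0 if "--enable-debug-symbols=-gdwarf-2" in config else 100.0


if __name__ == "__main__":
    for added, elapsed in find_helpful_settings(OPT, DEBUG_ONLY, fake_measure):
        print(added, elapsed)
```

Trying one added setting at a time keeps the number of rebuilds linear in the number of candidates; raising `max_added` covers interactions between settings at the cost of many more builds.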
I don't have a good explanation for why DWARF-2 symbols weren't needed in 29 but are needed now; all I can think of is that somehow we crossed some internal maximum and we need the more efficient modern representation, even though the opt build does strip out most of the symbol table. This may have a consequence for debugging and crash reports, but I don't think we can go back to the old STABS symbols for 31. We'll just have to deal with the problems as they come up.
Your remaining homework:
- I'm toying with bumping the GC timeslice back to 100ms as it was in 29 (it remains at 30ms right now, same as 24). You can try this by setting javascript.options.mem.gc_incremental_slice_ms to 100 and restarting the browser. This should not make much immediate difference; this is for later on when there are more compartments for the garbage collector to search. I just want to make sure that it doesn't slow the browser initially.
- Now that we have substantially changed the browser foundation, we should reexamine some earlier performance conclusions. If you are on a multi-core Power Mac, such as a dual G4/G5 or the quad, try turning OMTC back on and restarting the browser. Does it help? Does it help on a uniprocessor Mac? It's hard for me to judge on my test systems. To try this, toggle layers.offmainthreadcomposition.enabled to true and restart the browser. The G5 seems a bit better, but it's a wash on the iMac, and the iBook is still questionably worse.
- The most important question: release 31 on time on July 22, or wait for 31.1? It's important to us to prove to the Mozilla community we're still a viable and well-maintained port, and I'd like to release on time to make that clear. However, if you have a good reason to wait, I'm willing to consider it; either way, though, we will release no later than 31.1. We need to stay current and ESR24 is going away.
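If you would rather not flip the two preferences from the list above by hand in about:config, they can also be written as `user_pref(...)` lines in a `user.js` file in your profile folder. The Python sketch below only generates those lines; it assumes TenFourFox reads `user.js` the same way Firefox does, and the helper names are invented for this example.

```python
def format_pref_value(value):
    """Render a Python value as a JavaScript literal for a prefs file."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def user_js_lines(prefs):
    """Return one user_pref(...) line per preference, in the given order."""
    return [
        'user_pref("{}", {});'.format(name, format_pref_value(value))
        for name, value in prefs.items()
    ]


# Values are the ones suggested in the list above.
TEST_PREFS = {
    "javascript.options.mem.gc_incremental_slice_ms": 100,
    "layers.offmainthreadcomposition.enabled": True,
}

if __name__ == "__main__":
    print("\n".join(user_js_lines(TEST_PREFS)))
```

Keep in mind that `user.js` is reapplied at every startup, so remove the file (or the lines) when you want to go back to the defaults.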
One outstanding bug is a printing crash on Tiger Server only. I have never supported the Server versions of 10.4/10.5, mostly because I have no way to test them and I don't have a need to run OS X Server personally. However, if you're using it, please look at issue 279. While I will not hold the release to fix this and I don't have the ability right now to fix it myself, if you do, I will accept your patch as long as it doesn't regress the regular OS X client versions.
In July we will have our annual state of the Power Mac userbase, and I also have some fun future blog posts in the hopper on my expensive but incredibly fun Amiga 4000T (with Picasso IV, 68060 CPU card and Hydra NIC), and my old Wallstreet G3 now dualbooting Rhapsody 5.6 (Mac OS X Server 1.2) and OS 9.2.2. Plus, I picked up a Cube G4 when I was in Berkeley last week that's waiting for a power supply. The shell is in excellent shape with no major nicks or scuffs and I think it'll be a great test system for playing around with MorphOS in my copious spare time. If you're in the Bay Area and looking to pick up some more Power Macs, they've got plenty at M.A.C. on University and Shattuck and they're willing to deal. Check it out.
Excellent.
Yes, I saw it. I salute the guy's ingenuity, but the amount of water involved would make a leak catastrophic not only for the machine itself but also anything near it, and that radiator is freaking huge. Plus, distilled water would need to get changed on a fairly regular basis, I should think. It's just a bit too much for me, but it *is* cool looking.
The JS performance regressions are definitely fixed. Page load times and dynamic content loading are now comparable to 29 or even better. SunSpider supports this: almost all of its benchmarks are better than in 29 and 24.
SunSpider, G4 1.33 GHz:
29 3951.8
31b2 6800.5
31b3 3801.3
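To put the three totals above in proportion, here is a small Python sketch that works out the relative changes. It assumes, as is usual for SunSpider, that these are total run times where lower is better; the function names are just for this example.

```python
# Totals as posted above (G4 1.33 GHz); lower is assumed to be better.
SUNSPIDER_TOTALS = {"29": 3951.8, "31b2": 6800.5, "31b3": 3801.3}


def ratio(slower, faster, totals=SUNSPIDER_TOTALS):
    """How many times longer `slower` took than `faster`."""
    return totals[slower] / totals[faster]


def percent_change(old, new, totals=SUNSPIDER_TOTALS):
    """Change in total going from `old` to `new`; negative means faster."""
    return (totals[new] - totals[old]) / totals[old] * 100.0


if __name__ == "__main__":
    print("31b2 vs 29:   {:+.1f}%".format(percent_change("29", "31b2")))
    print("31b3 vs 29:   {:+.1f}%".format(percent_change("29", "31b3")))
    print("31b2 / 31b3:  {:.2f}x".format(ratio("31b2", "31b3")))
```

By this arithmetic b2 took well over one and a half times as long as either 29 or b3, while b3 comes out a few percent ahead of 29.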
Cameron, sorry I caused you so much work with this. I felt that I needed to speak up because the performance had decreased so dramatically on low-end machines that it was really bordering on unusable. Now it's ready for prime time, and it's a usable browser even on the G3 400 MHz Pismo.
And a pleasant surprise: I have no idea what happened to WebM playback, but it's greatly improved in b3. YouTube videos suddenly play pretty much as smoothly as they did in 24. I checked back and forth between b2 and b3 to see whether it's the recent YouTube change that did it, but it's not. The change is in 31b3.
No, I appreciate you trying to get solid, reproducible numbers. I know you know better than most folks on this blog how useless vague reports are, and on the quad it's hard for me to judge sometimes because it can compensate by just brute forcing its way through.
The 29 WebM difference is interesting, because this means it may have been crossing that "internal maximum" already and it's just involving more of the browser now.
I did discover a showstopper early this morning (issue 280) but hopefully I can get it fixed in time. It seems to be a compiler problem also, but only in the G5 build.
Things are looking up again on the G3 iMac:
TFF 24:
Time to window opening: 41 seconds, 13 seconds when cached
Time to TFF default window fully drawn: 47 secs, 16 secs cached
AdBlock makes little difference
TFF 31b3:
Time to window opening: 29 seconds, 13 seconds when cached
Time to TFF default window fully drawn: 38 secs, 20 secs cached
TFF 31b3 + AdBlock:
Time to window opening: 13 secs when cached
Time to TFF default window fully drawn: 20 secs cached
Then after about 30 seconds I get a consistent 20 second beachball.
I see the beachball with Adblock as well after half a minute on the 400 MHz G3 Pismo, but it's only for 5-10 seconds. (This may depend on how many filter subscriptions and custom filters you have. I only use EasyList plus a rather short social block list and a handful of custom filters.) After that, everything is okay in b3; even Facebook is easily usable, versus about 50% beachball time during the whole browsing experience with b2.
I have EasyList and I believe those options you're given after installing Adblock to remove social media buttons. If I de-activate the EasyList subscription the beachball time drops to 13 seconds.
I haven't tried my custom filters, as set up on my normal login user running TFF 24 at the moment. Those custom filters are the reason I haven't switched to something lighter like Bluhell - its fixed filter list is missing too many ads and I need the custom filters to block stupid huge background images that some people seem to think are great to put on their blogs!
Am I the only one using AdBlockEdge (or any kind of AdBlock) that is NOT experiencing beach balls? Seriously, I don't have this problem at all.
I had to revert back to 24 until b3 came out. This version is much faster and has now enabled me to stay on 31. The only issues I'm having are addon related and I'm waiting for updates.
It depends on your Mac. What's a short 100% processor spike right after startup on my G4 1.33 GHz PowerBook (with no beachball) translates to a 20 sec beachball on the Pismo. Adblock is definitely doing *something* on startup. I can only speculate, but if e.g. the SunSpider String benchmark (which improved dramatically in b3 compared to b2) does what its name suggests then the Adblock slowness in b2 sort of makes sense.
No beachballs here using AdBlockEdge. Everything is as stable as a rock on my Power Mac G4 Mystic and G5 iMac ALS.
I can't tell a difference on the G4 PowerBook (uniprocessor, obviously) between OMTC enabled or disabled. I did direct comparisons, then I used the browser for one day with, the next day without, and there's just no perceivable difference. WebM may be a little better with OMTC disabled, but not consistently or in any way verifiable with numbers. I don't even know if there's a technical connection. To put it positively, my PowerBook G4 is quite happy with both settings. The same goes for javascript.options.mem.gc_incremental_slice_ms = 100.
I don't use the G3s with enough heavy lifting over prolonged time periods (let alone WebM) to make a statement. Direct comparison, again, showed no perceivable difference.
This is a great recommendation. I've been trying Bluhell instead of AdBlock Plus. I was confused at first because it says it's a firewall, but technically it's not, it's just an ad blocker. It is very lightweight and seems to be well-maintained. The ad block rules are based on EasyList and therefore should work well for anglophone web surfing, but it does miss some ads that are covered by localized filter lists. Personally I need some power user features in AdBlock Plus (Edge) like custom filters, Google infiltration control and blocking of the omnipresent social media buttons. But there's a good chance Bluhell will replace AdBlock on my G3s, which use a reduced set of extensions anyway.
Localization progress for 31:
German: done
Finnish: (me, this week)
French: grafiko…?
Italian: done
Polish: aquariu…?
Russian: done
Spanish: done
For 17 we used to have Swedish and Asturian also, but we still need some custom strings to be translated for those (Knezze…? mikelg…?).
Please provide a Brazilian Portuguese translation too! Tell me which custom terms, in English, need to be translated and I will send you the translations.
Regards,
Igor Isaias Banlian
Hello, please see https://code.google.com/p/tenfourfox/issues/detail?id=42#c155 for strings that need to be translated. You can work inside the RTF file and then re-upload it to Google Code. Thanks! Please note, though, that I probably won't have time to make the installer during the next two weeks, so this won't be ready in time for release.
One lightweight alternative is to use a HOSTS file.
I would recommend dropping source parity and focusing on performance improvements.
G5 Dual 2,3
24.6.0 fine!
31.0 1/2 minute beachball = currently unusable
Adblock Edge 2.1.3
Awesome screenshots
DNT
DownloadHelper 4.9.22
Ghostery 5.3.2
Greasemonkey 1.15
Https-Everywhere 3.5.3
QTEnabler 115
Suspend Tab
UTubeUnblocker 0.5.4