Friday, June 20, 2014

Well, let's see how less bad 31 beta 2 is for you

OK, beta 2. This is the same Mozilla code drop as beta 1, with the following changes: IME keyboard composition is fixed (accented characters, Japanese, Chinese, Korean, etc.); navigating to an MP3 audio file no longer crashes the browser; off-main-thread compositing (OMTC) is turned off; generational garbage collection (GGC) is turned off; and I've tuned the GC timeslice a little more based on some heavy testing this afternoon, done while hacking up phlegmy yuck from my inflamed respiratory passages.

As for the remaining problems, performance seems a bit better overall. The G5 doesn't care about OMTC, but it does benefit from having GGC off. The Sawtooth, iBook and iMac G4 systems did much better with both of them off, so I turned both off. The stall with Adblock is still there, but it is literally about half as long (I measured it with a stopwatch), so this is an improvement, and it still happens only once on all of my test systems.

This is, at least in my current virus-addled state, as much as I can do to crank up the browser until I can get proper profiling support again. It gets things back to a state approximating Fx29 (minus the bugs). I am going to ship 31, probably when it reaches release -- the work is done, and staying on 24 is not a viable option. Download it and get used to it. But while you use it, you have a small question and a big question for your homework assignment:

1. It looks like Safari bookmarks import is broken again. It is possible, nay, probable, that Mozilla doesn't test against Safari 4 anymore, and possibly not even Safari 5.0. There is no Mozilla bug for this, but that might simply be because Mozilla doesn't care about older versions of Safari. I don't know how the bookmarks format changed between versions, but it's probably time to consider just ripping this code out, since we advise people to use HTML export from Safari anyway. Opinions?

2. Is 31 where we should drop source parity, i.e., fork and add things to 31 rather than trying to keep up? It's far easier to do this at the beginning of an ESR cycle, because we can start adding the things we want (HTTP/2, SPDY updates, NSS updates, root certificate updates, ECMA6/HTML5/CSS3 features) as they roll out while still having security support from Mozilla, rather than trying to catch up on a huge backlog when support runs out (as happened with Classilla). I had long thought that what would doom us was some 10.6+-specific feature critical to the browser, but so far we've been able to hack around all of that. Instead, what's more likely to doom us now is that our machines are simply getting too slow to handle what Firefox expects to throw at them. The average Power Mac is at least a decade old; even the quad G5 is celebrating its eighth birthday. We are already having to cut marquee features to keep browser performance acceptable, and soon we will not be able to cut them: while we might get away without generational GC for a while (especially since Mozilla doesn't use it on FxOS devices yet), we already know OMTC will be mandatory soon, and those are just the land mines we know about right now. If we fork and go to feature parity, at least we can keep the browser core reasonably up to date without having to contend with these issues.

The situation I worry about is that we will struggle up to 38 and have a browser that is crushingly slow on all but the highest-spec G5 systems with no solution for laptops or other G4 computers (let alone the G3), and it will be much harder to backport the things we want by that time. 31 is already a substantial compromise between performance and future compatibility. Where should that dividing line be? And don't forget that 10.6 support will be fading away too; when 10.6 goes, Mozilla can make even more assumptions about the hardware that won't be true for us. (Right now 10.6 is still holding on to about 20% of the Mac user base, which is an incredibly stable figure, but it's not growing and it's likely to slip as hardware ages, computers are upgraded and Apple stops supporting building against the 10.6 SDK.)

While you think about that, here are the downloads and the updated release notes. There will be one more scheduled beta a couple of weeks before final release, just to put any lingering issues to bed, and in July we'll also do our annual update on the state of the Power Mac user base. Now I have to go cough up some more nasty mucus, so excuse me.

Monday, June 16, 2014

Well, let's see how bad 31 is for you

I'm going to release what I have so far and see what you think. I'm typing this blog entry in 31.0b1, so I guess you can consider that an improvement over last time. Localizers, strings are frozen, so you can start your engines (see the announcement in issue 42).

I still don't have profiling working in any meaningful sense, even though I can compile and run a gprof build successfully. It looks like gprof has a bug on OS X that prevents it from generating timing information, so it might be worth a dive into gcrt0.o some weekend when Scarlett Johansson is not visiting. It does generate function call counts, though, so the instrumented build does run. I still need to do more experimentation on how to get some sort of runtime instrumentation working again.

While comparing what else had changed between 29 and 31, prompted by your glowing reviews of 29, I found that there is a "bug" in 29 that I "fixed" in 31. In Fx26, I tried using our custom CoreGraphics backend for web page rendering, found it too slow and occasionally crashy, and set content back to Cairo. In 29, Mozilla changed the backend selection code so that content could no longer use anything but CoreGraphics. In 31, I discovered this and "fixed" it to use Cairo again. As it turns out, though, not only does CoreGraphics now work, it has become dramatically faster by comparison. In fact, if you're on a very fast G5 like a quad, try this trick: go to a standard-definition YouTube WebM video, start it playing (wait for the comments to load), make sure your processor is set to Highest performance, and click Full screen. This quad G5 effortlessly scales a 16:9 SD WebM video to fill an entire 1920x1080 display. Only hardware can do that. That's some hot stuff. I like.

So that was most of the problem. The rest was tuning the GC a bit more, plus something that cropped up on the G4 iBook/1.33 which may be related to the WebM regression people are reporting on G4s. In 29, Mozilla introduced off-main-thread compositing, which, as we discussed, uses a background thread to receive screen updates. These updates are coalesced: OMTC won't bother displaying half-frames, even though it will show full frames as fast as it receives them. On the quad (and probably any dual-CPU Power Mac), this is no problem, because the thread runs on another CPU and the G5 generates plenty of frames plenty fast. On G3s and G4s with no power management, this is also no problem, because they don't power throttle and always run at their maximum clock speed, and frankly any hardware assistance at all will be much smoother than the old method. The 1GHz iMac G4 and the 450MHz Sawtooth, for example, really like 31.

On laptops or uniprocessor desktops set for power management, however, the processor may be throttled or forced to clock slew, and OMTC will be starved for frames. We shouldn't be angry with Mozilla for doing this; it's unfortunate but logical given that virtually every Intel consumer chip in the last few years has been a multicore device.

One solution is to crank CPU performance up to Highest, or at least Automatic, but this isn't much of a solution for a laptop running on battery. The other solution is to turn OMTC off (set layers.offmainthreadcomposition.enabled to false and restart the browser). I don't like this solution much because it's totally unsustainable: Mozilla has made no secret that main-thread compositing will be removed as soon as the last holdout platform (Linux) supports OMTC, and cool things like asynchronous pan-and-zoom require OMTC. Disabling OMTC will be supported on 31 for as long as the ESR lasts, but I predict main-thread compositing will be removed somewhere around Fx33 or Fx34, and then you'll have to deal with the additional CPU requirements; OMTC is such a fundamental change that I cannot realistically maintain the old rendering system.

If disabling OMTC makes a substantial difference to WebM playback or general responsiveness on the systems in this group (it doesn't really affect the outlier systems), I'm willing to ship 31 with OMTC disabled. I'll need lots of good evidence that it helps, though -- something reproducible, not anecdotal reports. However, you can be sure that main-thread compositing will disappear in the very near future, and this will be the last stable branch we ship this way unless Mozilla finds a critical problem with OMTC.

The only other remaining issue is that shortly after startup the app beachballs for around 30 seconds. I can't make this happen on a clean profile, so I suspect an add-on is causing it, but I can't figure out which add-on is common to the affected systems; it's possible several of them behave this way. The browser only does it once, so it's more of a nuisance, but it would be nice to fix. If it happens to you as well, I'd appreciate your help isolating what seems to set it off (make sure that it doesn't occur with a clean profile first, though).

Remember to make sure the MTE and QTE are up to date, and to undo any custom GC settings changes before you upgrade (download, release notes). I have not decided if we will release on 31.0 or 31.1, but we'll see what you think. I have the flu, so I need to go to bed. See you in the morning.

Friday, June 13, 2014

You may have noticed there are no 31 beta builds yet

That's because you don't want to use them. :(

During the Aurora phase I run the build in a separate sandbox profile so it doesn't taint my main one, and mild-to-moderate performance issues are expected at that level because it's still relatively early in development. Plus, during Aurora I'm generally running a debug build anyway, so performance isn't a primary concern. But by beta, it should be good. And I'm really upset, because 31 is not. After just a few minutes of light usage, scrolling stutters on the quad G5 even on very simple Netscape 3.0-compatible sites, and the browser's responsiveness is unpredictable. OMTC doesn't make a difference. I can't even imagine how bad it would be on the iMac G4. About the only good thing I can say about it is that it uses almost 33% less memory than 24 does, but as far as speed is concerned it makes 24 look like a cheetah on Dexedrine.

My rule of thumb is whether I'd use the browser in the current state. And the answer is, I'm typing this blog entry in 24.6.0. :(

The next step, normally, would be to throw it into Shark and do some profiling to see where it's spending all its time. Except ... Shark doesn't understand the DWARF-2 debugging symbols we've used since Fx19 with gcc 4.6. In fact, not only does it not understand them, it causes pilotfish to lock up in an endless loop trying to process them. (Perhaps Shark with Xcode 3 does understand them correctly(?), but even if that were true it doesn't do me a lot of good on 10.4.) Shark does work with the stripped optimized builds, but then the profiling data just becomes a clot of PowerPC assembly language and hexadecimal addresses with no correlation to source code.

Instead of releasing builds, then, I'm trying to get TenFourFox to work with gprof, since gcc 4.6 will properly emit profiling information with -pg. However, this won't be anywhere near as good as Shark, since Shark would have been able to get profiling data right at the time it started going off the rails. Furthermore, it may well be that the problem is in something essential I can't revert. If I can't figure it out, the next step is to disable generational GC; we would then revert to the exact-rooting GC in 29, which would still get some memory savings, but that's a one-trick pony -- we would not make Fx38 without GGC. If that doesn't work either, then we will have to drop source parity, because I can't see myself using the browser in its current state and I'm certainly not going to subject anyone else to it.

This also means that IonMonkey is going to get backburnered again. I can't get it meaningfully working anyhow: it will run simple scripts, but anything that triggers a bailout eventually crashes, which is the same state it's been in for the last few months. I need a second set of eyes on the code because I'm spinning my wheels, and I'm struggling just to keep the browser afloat, never mind enhancing it. Perhaps you're looking for a project?

I'm really tired.

Saturday, June 7, 2014

24.6.0 available

24.6.0 is available (release notes, files). Other than the usual ESR fixes, there is nothing special in this release, just a regular maintenance update. It will be slotted in Monday evening Pacific as usual.

I have not made much headway on getting past this critical problem with IonMonkey, but I'm still planning for the formal 31.0 beta to be released later next week regardless.