During the Aurora phase I run the build in a separate sandbox profile so it doesn't taint my main one, and mild to moderate performance issues are expected at that level because it's still relatively early in development. Plus, during Aurora I'm generally running a debugging build anyway, so performance isn't a primary concern. But by beta, it should be good. And I'm really upset: 31 is not. After just a few minutes of minor usage, scrolling on even very simple Netscape 3.0-compatible sites stutters on the quad G5 and the browser's responsiveness is unpredictable. OMTC doesn't make a difference. I can't even imagine how bad it would be on the iMac G4. About the only good thing I can say about it is that it uses almost 33% less memory than 24 does, but as far as speed is concerned it makes 24 look like a cheetah on Dexedrine.
My rule of thumb is whether I'd use the browser in the current state. And the answer is, I'm typing this blog entry in 24.6.0. :(
The next step, normally, would be to throw it into Shark and do some profiling to see where it's spending all its time. Except ... Shark doesn't understand the DWARF-2 debugging symbols we've used since Fx19 with gcc 4.6. In fact, not only does it not understand them, it causes pilotfish to lock up in an endless loop trying to process them. (Perhaps Shark with Xcode 3 does understand them correctly(?), but even if that were true it doesn't do me a lot of good on 10.4.) Shark does work with the stripped optimized builds, but then the profiling data just becomes a clot of PowerPC assembly language and hexadecimal addresses with no correlation to source code.
Instead of releasing builds, then, I'm trying to get TenFourFox to work with gprof, since gcc 4.6 will properly emit profiling information with -pg. However, this won't be anywhere near as good as Shark, since Shark would have been able to get profiling data right at the time it started going off the rails. Furthermore, it may well be that the problem is in something essential I can't revert. If I can't figure it out, the next step is to disable generational GC; we would then revert to the exact-rooting GC in 29, which would still get some memory savings, but that's a one-trick pony -- we would not make Fx38 without GGC. If that doesn't work either, then we will have to drop source parity, because I can't see myself using the browser in its current state and I'm certainly not going to subject anyone else to it.
This also means that IonMonkey is going to get backburnered again. I can't get it meaningfully working anyhow; it will run simple scripts, but anything that triggers a bailout eventually crashes, which is the same state it's been in for the last few months. I need a second set of eyes to look at the code because I'm spinning my wheels, and I'm trying to keep the browser afloat, never mind enhancing it. Perhaps you're looking for a project?
I'm really tired.
 
 
You can always build with debug information, copy the resulting binary, strip the copy with -S (this should strip only the debug information and leave the symbol information in), profile it, and afterwards map addresses to source lines, manually or with a script, using the unstripped copy with debug information (with gdb or similar). With the symbol information left in, Shark should still be able to group the addresses per function.
That's a thought. I don't really need the debug information for this purpose, though it would be nice.
Actually, I take that back. The optimized nostrip build isn't built with -g. So it's something about the symbol table also.
Even without -g, the compiler may be inserting DWARF eh_frame information. Trying a strip -S shouldn't hurt at least.
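For the "scripted" half of that suggestion, here is a minimal sketch of grouping raw sample addresses per function using only a symbol table. It assumes a listing in the usual three-column `nm -n` shape (address, type, name); the addresses, the symbol names, and the helper functions `load_symbols`, `resolve`, and `group_samples` are all invented for illustration and are not taken from TenFourFox or from Shark's output.

```python
import bisect

# Hypothetical `nm -n`-style listing: address, type, name.
# These addresses and symbol names are made up for the example.
SAMPLE_NM = """\
00001000 T _main
00001040 T _js_Interpret
00001400 t _gc_mark
00001800 T _paint_layer
"""

def load_symbols(nm_text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address column)
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address, else None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[i]
    return name, address - start

def group_samples(symbols, sample_addresses):
    """Count how many sampled addresses fall inside each function."""
    counts = {}
    for addr in sample_addresses:
        hit = resolve(symbols, addr)
        key = hit[0] if hit else "<unknown>"
        counts[key] = counts.get(key, 0) + 1
    return counts

if __name__ == "__main__":
    syms = load_symbols(SAMPLE_NM)
    for name, n in sorted(group_samples(syms, [0x1000, 0x1044, 0x1050, 0x1900]).items()):
        print(name, n)
```

Note that the symbol table carries no end addresses, so anything past the last symbol is attributed to that symbol; mapping down to source lines would still need the copy built with debug information.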
May a few hundred milligrams of CoQ10, a grep for the fourth series of vector integer optimizations for IA64, and some kale juice lighten your burden. PopSugar had a recipe for the last one; not sure I'd want to go from store to store testing smoked red peppers to get the salt rim for that cocktail going, though.