There is now apparently a potential workaround for those of you still having trouble getting the default search engine to stick. I still don't have a good theory for what's going on, however, so if you want to try the workaround please read my information request and post the requested information about your profile before and after to see if the suggested workaround affects that.
I will be in Australia for Christmas and New Years' visiting my wife's family, so additional development is likely to slow over the holidays. Higher priority items coming up will be implementing user agent support in the TenFourFox prefpane, adding some additional HTML5 features and possibly excising telemetry from garbage and cycle collection, but probably for 45.8 instead of 45.7. I'm also looking at adding some PowerPC-specialized code sections to the platform-independent Ion code generator to see if I can crank up JavaScript performance some more, and possibly some additional work to the AltiVec VP9 codec for VMX-accelerated intraframe prediction. I'm also considering adding AltiVec support to the Theora (VP3) decoder; even though its much lighter processing requirements yield adequate performance on most supported systems it could be a way to get higher resolution video workable on lower-spec G4s.
One of the problems with our use of a substantially later toolchain is that (in particular) debugging symbols from later compilers are often gibberish to older profiling and analysis tools. This is why, for example, we have a customized gdb, or debugging at even a basic level wouldn't be possible. If you're really a masochist, go ahead and compile TenFourFox with the debug profile and then try to use a tool like sample or vmmap, or even Shark, to analyze it. If you're lucky, the tool will just freeze. If you're unlucky, your entire computer will freeze or go haywire. I can do performance analysis on a stripped release build, but this yields sample backtraces which are too general to be of any use. We need some way of getting samples off a debug build but not converting the addresses in the backtrace to function names until we can transfer the samples to our own tools that do understand these later debugging symbols.
Apple's open source policy is problematic -- they'll open source the stuff they have to, and you can get at some components like the kernel this way, but many deep dark corners are not documented and one of those is how tools like /usr/bin/sample and Shark get backtraces from other processes. I suspect this is so that they can keep the interfaces unstable and avoid abetting the development of applications that depend on any one particular implementation. But no one said I couldn't disassemble the damn thing. So let's go.
(NB: the below analysis is based on Tiger 10.4.11. It is possible, and even likely, the interface changed in Leopard 10.5.)
With Depeche Mode blaring on the G5, because Dave Gahan is good for debugging, let's look at /usr/bin/sample since it's a much smaller nut to crack than Shark.
% otool -L /usr/bin/sample
/usr/bin/sample:
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 567.29.0)
/System/Library/PrivateFrameworks/vmutils.framework/Versions/A/vmutils (compatibility version 1.0.0, current version 93.1.0)
/usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.3.4)
Interesting! A private framework! Let's see what Objective-C calls we might get (which are conveniently text strings).
% strings /usr/bin/sample |& more
__dyld_make_delayed_module_initializer_calls
__dyld_image_count
__dyld_get_image_name
__dyld_get_image_header
__dyld_NSLookupSymbolInImage
__dyld_NSAddressOfSymbol
libobjc
__objcInit
__dyld_mod_term_funcs
release
printStatistics
writeOutput:append:
stopSampling
sampleForDuration:interval:
preloadSymbols
initWithPid:symbolRichBinaries:
alloc
intValue
UTF8String
removeObjectAtIndex:
objectAtIndex:
count
indexOfObject:
arrayWithArray:
arguments
processInfo
forceStop
NSSampler
NSMutableArray
NSProcessInfo
NSAutoreleasePool
Interrupted
Not currently sampling -- exiting immediately.
-wait
-mayDie
-file
Waiting for '%s' to appear...
%s appeared.
%s cannot find a process you have access to which has a name like '%s'
sample
Sampling process %d each %u msecs %u times
syntax: sample <pid/partial name> <duration (secs)> { <msecs between samples> } <options>
options: {-mayDie} {-wait} {-subst <old> <new>}*
-file filename specifies where results should be written
-mayDie reads symbol information right away
-wait wait until the process named (usually by partial name) exists, then start sampling
-subst can be used to replace a stripped executable by another
Note that the program must have been started using a full path, rather than a relative path, for analysis to work, or that the -subst option must be specified
setObject:forKey:
dictionary
autorelease
mutableCopy
NSMutableDictionary
%s cannot examine process %d for unknown reasons, even though it appears to exist.
%s cannot examine process %d because the process does not exist.
%s cannot examine process %d (with name like %s) because it no longer appears to be running.
%s cannot examine process %d because you do not have appropriate privileges to examine it.
%s cannot examine process %d for unknown reasons.
-subst
Most of that looks fairly straightforward Objective-C stuff, but what's NSSampler? That's not documented anywhere. Magic Hat can't find it either with the default libraries, but it does if we add those private frameworks. If I use class-dump (3.1.2 works with 10.4), I can get a header file with its methods and object layout. (The header file it generates is usually better than Magic Hat's since Magic Hat sorts things in alphabetical rather than memory order, which will be problematic shortly.) Edited down, it looks like this. (I added the byte offsets, which are only valid for the 32-bit PowerPC OS X ABI.)
@interface NSSampler : NSObject
/*
{
00 BOOL _stop;
04 BOOL _stopped;
08 unsigned int _task;
12 int _pid;
16 double _duration;
24 double _interval;
32 NSMutableArray *_sampleData;
36 NSMutableArray *_sampleTimes;
40 double _previousTime;
48 unsigned int _numberOfDataPoints;
52 double _sigma;
60 double _max;
68 unsigned int _sampleNumberForMax;
72 ImageSymbols *_imageSymbols;
76 NSDictionary *_symbolRichBinaryMappings;
80 BOOL _writeBadAddresses;
84 TaskMemoryCache *_tmc;
88 BOOL _stacksFixed;
92 BOOL _sampleSelf;
96 struct backtraceMagicNumbers _magicNumbers;
}
*/
- (void) _cleanupStacks;
- (void) _initStatistics;
- (void) _makeHighPriority;
- (void) _makeTimeshare;
- (void) _runSampleThread: (id) parameter1;
- (void) dealloc;
- (void) finalize;
- (void) forceStop;
- (void) getStatistics: (void*) parameter1;
- (id) imageSymbols;
- (id) initWithPid: (int) parameter1;
- (id) initWithPid: (int) parameter1 symbolRichBinaries: (id) parameter2;
- (id) initWithSelf;
- (void) preloadSymbols;
- (void) printStatistics;
- (id) rawBacktraces;
- (void) sampleForDuration2: (double) parameter1 interval: (double) parameter2;
- (void) sampleForDuration: (unsigned int) parameter1 interval: (unsigned int) parameter2;
- (int) sampleTask;
- (void) setImageSymbols: (id) parameter1;
- (void) startSamplingWithInterval: (unsigned int) parameter1;
- (void) stopSampling;
- (id) stopSamplingAndReturnCallNode;
- (void) writeBozo;
- (void) writeOutput: (id) parameter1 append: (char) parameter2;
@end
Okay, so now we know what methods are there. How does one call this thing? Let's move to the disassembler. I'll save you my initial trudging through the machine code and get right to the good stuff. I've annotated critical parts below from stepping through the code in the debugger.
% otool -tV /usr/bin/sample /usr/bin/sample: (__TEXT,__text) section 00002aa4 or r26,r1,r1 << enter 00002aa8 addi r1,r1,0xfffc 00002aac rlwinm r1,r1,0,0,26 00002ab0 li r0,0x0 00002ab4 stw r0,0x0(r1) : : : 00003260 b 0x3310 00003264 bl 0x3840 ; symbol stub for: _getgid 00003268 bl 0x37d0 ; symbol stub for: _setgid
This looks like something that's trying to get at a process. Let's see what's here.
0000326c lis r3,0x0 00003270 or r4,r30,r30 00003274 addi r3,r3,0x3b9c 00003278 or r5,r29,r29 0000327c or r6,r26,r26 00003280 bl 0x37c0 ; symbol stub for: _printf$LDBL128 // "Sampling process ..." 00003284 lbz r0,0x39(r1) 00003288 cmpwi cr7,r0,0x1 0000328c bne+ cr7,0x32a0 // jumps to 32a0 : : : 000032a0 lis r4,0x0 000032a4 lwz r3,0x0(r31) 000032a8 or r5,r25,r25 000032ac lwz r4,0x5010(r4) // 0x399c "sampleForDuration:..." 000032b0 or r6,r23,r23 000032b4 bl 0x3800 ; symbol stub for: _objc_msgSend 000032b8 lis r4,0x0 000032bc lwz r3,0x0(r31) 000032c0 lwz r4,0x500c(r4) // 0x946ba288 "stopSampling" 000032c4 bl 0x3800 ; symbol stub for: _objc_msgSend 000032c8 lis r4,0x0 000032cc lwz r3,0x0(r31) 000032d0 lwz r4,0x5008(r4) // 0x3978 "writeOutput:..." 000032d4 or r5,r22,r22 000032d8 li r6,0x0 000032dc bl 0x3800 ; symbol stub for: _objc_msgSend
That seems simple enough. It seems to allocate and initialize an NSSampler object, (we assume) sets it up with [sampler initWithPid], calls [sampler sampleForDuration], calls [sampler stopSampling] and then calls [sampler writeOutput] to write out the result.
This is not what we want to do, however. What I didn't see in either the disassembly or the class description was an explicit step to convert addresses to symbols, which is what we want to avoid. We might well suspect -(void) writeOutput is doing that, and if we put together a simple-minded program to make these calls as sample does, we indeed get a freeze when we try to write the output. We want to get to the raw addresses instead, but Apple doesn't provide any getter for those tantalizing NSMutableArrays containing the sample data.
Unfortunately for Apple, class-dump gave us the structure of the NSSampler object (recall that Objective-C objects are really just structs with delusions of grandeur), and conveniently those object pointers are right there, so we can pull them out directly! Since they're just NSArrays, hopefully they're smart enough to display themselves. Let's see. (In the below, replace XXX with the process you wish to spy on.)
/* gcc -g -o samplemini samplemini.m \ -F/System/Library/PrivateFrameworks \ -framework Cocoa -framework CHUD \ -framework vmutils -lobjc */ #include <Cocoa/Cocoa.h> #include "NSSampler.h" int main(int argc, char **argv) { NSSampler *sampler; NSMutableArray *sampleData; NSMutableArray *sampleTimes; uint32_t count, sampleAddr; NSAutoreleasePool *shutup = [[NSAutoreleasePool alloc] init]; sampler = [[NSSampler alloc] init]; [sampler initWithPid:XXX]; // you provide [sampler sampleForDuration:10 interval:10]; // 10 seconds, 10 msec [sampler stopSampling]; // break into the NSSampler struct sampleAddr = (uint32_t)sampler; count = *(uint32_t *)(sampleAddr + 48); fprintf(stdout, "count = %i\n", count); sampleData = (NSMutableArray *)*(uint32_t *)(sampleAddr + 32); sampleTimes = (NSMutableArray *)*(uint32_t *)(sampleAddr + 36); fprintf(stdout, "%s", [[sampleData description] cString]); fprintf(stdout, "%s", [[sampleTimes description] cString]); [sampler dealloc]; return 0; }Drumroll please.
count = 519
(
<NSStackBacktrace: Thread 1503: 0x9000af48 0xefffdfd0 0x907de9ac 0x907de2b0
0x932bcb20 0x932bc1b4 0x932bc020 0x937a1734 0x937a13f8 0x06d53d3c 0x9379d93c 0x0
6d57bc8 0x07800f48 0x0785f004 0x0785f9cc 0x0785fd20 0x00004ed4 0x00001d5c 0x0000
1a60 0x9000ae9c 0xffffffe1 > ,
<NSStackBacktrace: Thread 1603: 0x9002ec8c 0x00424b10 0x05069cb4 0x0504638c
0x050490e0 0x05056600 0x050532cc 0x9002b908 0x0506717c 0x0000016b > ,
<NSStackBacktrace: Thread 1703: 0x9002bfc8 0x90030a7c 0x015a0b84 0x04d4d40c
0x015a1f18 0x9002b908 0x90030aac 0xffffffdb > ,
:
:
:
)(
0.01796096563339233,
0.01785099506378174,
0.01814299821853638,
0.01780200004577637,
:
:
:
)
We now have the raw backtraces and the timings, in fractions of a second. There is obviously much more we can do with this, and subsequent to my first experiment I improved the process further, but this suffices for explaning the basic notion. In a future post we'll look at how we can turn those addresses into actual useful function names, mostly because I have a very hacky setup to do so right now and I want to refine it a bit more. :) The basic notion is to get the map of where dyld loaded each library in memory and then compute which function is running based on that offset from the sampled address. /usr/bin/vmmap would normally be the tool we'd employ to do this, but it barfs on TenFourFox too. Fortunately our custom gdb7 can get such a map, at least on a running process. More on that later.
One limitation is that NSSampler doesn't seem able to get samples more frequently than every 15ms or so from a running TenFourFox process even if you ask. I'm not sure yet why this is because other processes have substantially less overhead, though it could be thread-related. Also, even though NSSampler accepts an interval argument, it will grab samples as fast as it can no matter what that interval is. When run against Magic Hat as a test it grabbed them as fast as 0.1ms, so stand by for lots of data!
Incidentally, this process is not apparently what Shark does; Shark uses the later PerfTool framework and an object called PTSampler to do its work instead of vmutils. Although it has analogous methods, the structure of PTSampler is rather more complex than NSSampler and I haven't fully explored its depths. Nevertheless, when it works, Shark can get much more granular samples of processor activity than NSSampler, so it might be worth looking into for a future iteration of this tool. For now, I can finally get backtraces I can work with, and as a result, hopefully some very tricky problems in TenFourFox might get actually solved in the near future.
Runs Well.
ReplyDeleteBut still Crashes on Exit. Seems that TFF since about 38 has a bad habbit of doing that (nearly 2/3 of the time).
A crash report always reads something about "XRE_TelemetryAccumulate" like this:
Thread 43 Crashed:
0 XUL 0x078ae4f4 XRE_TelemetryAccumulate + 250356
1 libnss3.dylib 0x015a5e28 PRP_NakedBroadcast + 1336
2 libSystem.B.dylib 0x9002b908 _pthread_body + 96
Any ideas?
I haven't seen any crashes on quit on any of the test systems, and we haven't received widespread reports of the same.
DeleteAn offset of 250356 from that symbol likely indicates the crash is not actually there and that was the closest symbol the OS could find to mark in the crash report.
I can only suggest the usual things to investigate or test, such as disabling add-ons, considering a clean test profile to see if it happens there too, or looking at system extensions and kexts that can interfere (Soundflower and Audio Hijack have been implicated in previous unexplained repeatable crashes).
On further research, it appears this is a fairly common problem on all 32-versions of Firefox, as the browser panics if it runs out of virtual memory while shutting-down (why does it leak so much just to quit??).
ReplyDeleteThe big question is how much damage can these daily-crashes do to the tissue-paper fragile 'places.sqlite' file that the browser overly leans on. Would sure be nice if there was more controlled way of shutting down the app.
I'm not sure if you saw my reply above, but either way, let me be a bit more explicit. Theoretically speaking, yes, 32-bit versions of Firefox are more likely to run out of memory. TenFourFox will probably hit the limit at just under 2GB due to its stack requirements.
DeleteHowever, I have 46 tabs open and about half of them loaded on this G5, and a similar number on my iMac G4. Quitting and restarting the browser doesn't cause any crashes on the test systems, and as I say we haven't received widespread reports of the same, so I consider your situation to be abnormal. It's possible free memory exhaustion is the problem but it's not the first thing I'd think of.
Again, I'd look at the usual suspects first to begin with. Feel free to open a Tenderapp ticket if you want to go over this in more detail.