Wednesday, February 25, 2015

IonPower passes V8!

At least in Baseline-only mode, but check it out!

Starting program: /Volumes/BruceDeuce/src/mozilla-36t/obj-ff-dbg/dist/bin/js --no-ion --baseline-eager -f run.js
warning: Could not find malloc init callback function.
Make sure malloc is initialized before calling functions.
Reading symbols for shared libraries ....................................................................+++......... done
Richards: 144
DeltaBlue: 137
Crypto: 215
RayTrace: 230
EarleyBoyer: 193
RegExp: 157
Splay: 140
NavierStokes: 268
----
Score (version 7): 180

Program exited normally.

Please keep in mind this is a debugging version and performance is impaired relative to PPCBC (and if I had to ship a Baseline-only compiler in TenFourFox 38, it would still be PPCBC, because it has the best track record). However, all of the code cleanup for IonPower and its enhanced debugging capabilities paid off: with one exception, all of the bugs I had to fix to get it passing V8 were immediately flagged by sanity checks during code generation, saving much laborious single-stepping through generated assembly to find problems.

I have a Master's program final I have to study for, so I'll be putting this aside for a few days, but after I thoroughly bomb it the next step is to mount phase 4, where IonPower can pass the test suite in Baseline mode. Then the real fun will begin -- true Ion-level compilation on big-endian PowerPC. We are definitely on target for 38, assuming all goes well.
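
(For the curious, a Baseline-only run of the test suite uses the same shell flags as the V8 transcript above. This is just a sketch of the invocation, with paths adjusted to your own tree -- the object directory here is the one from the transcript, and --args passes those flags through to the shell under test:)

# run from js/src in the source tree; shell path is illustrative
python jit-test/jit_test.py --args="--no-ion --baseline-eager" \
        ../../obj-ff-dbg/dist/bin/js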

I forgot to mention one other advance in IonPower, which Ben will particularly appreciate if he still follows this blog: full support for all eight bitfields of the condition register. Unfortunately, it's mostly irrelevant to generated code because Ion assumes, much to my disappointment, that the processor possesses only a single set of flags. However, some sections of code that we fully control can now do multiple comparisons in parallel over several condition registers, reducing our heavy dependence upon (and usually hopeless serialization of) cr0, and certain FPU operations that emit to cr1 (or require the FPSCR to dump to it) can now branch directly upon that bitfield instead of having to copy it. Also, emulation of mcrxr on G5/POWER4+ no longer has a hard-coded dependency upon cr7, simplifying much conditional branching code. It's a seemingly minor change that nevertheless greatly helps to further unlock the untapped Power in PowerPC.

Tuesday, February 24, 2015

Two victories

Busted through my bug with stubs late last night (now that I've found the bug I am chagrined at how I could have been so dense) and today IonPower's Baseline implementation successfully computed π to an arbitrary number of iterations using the nice algorithm by Daniel Pepin:

% ../../../obj-ff-dbg/dist/bin/js --no-ion --baseline-eager -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,30);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = \!minus;bot+=2;}print(pi);}'
3.1738423371907505
% ../../../obj-ff-dbg/dist/bin/js --no-ion --baseline-eager -e 'var pi=4,top=4,bot=3,minus = true;next(pi,top,bot,minus,30000);function next(pi,top,bot,minus,num){for(var i=0;i<num;i++){pi += (minus == true)?-(top/bot):(top/bot);minus = \!minus;bot+=2;}print(pi);}'
3.141625985812036

Still work to be done on the rudiments before attacking the test suite, but code of this complexity running correctly so far is a victory. And, in a metaphysical sense, speaking from my perspective as a Christian (and a physician aware of the nature of his illness), here is another victory: a Mozillian's last post from the end stages of his affliction. Even for those who do not share that religious perspective, it is a truly brave final statement and one I have not seen promoted enough.

Saturday, February 21, 2015

Biggus diskus (plus: 31.5.0 and how to superphish in your copious spare time)

One of the many great advances that Mac OS 9 had over the later operating system was the extremely flexible (and persistent!) RAM disk feature, which I use on almost all of my OS 9 systems to this day as a cache store for Classilla and temporary work area. It's not just for laptops!

While OS X can configure and use RAM disks, of course, it's not as nicely integrated as the RAM Disk in Classic is and it isn't natively persistent, though the very nice Esperance DV prefpane comes pretty close to duplicating the earlier functionality. Esperance will let you create a RAM disk up to 2GB in size, which for most typical uses of a transient RAM disk (cache, scratch volume) would seem to be more than enough, and can back it up to disk when you exit. But there are some heavy duty tasks that 2GB just isn't enough for -- what if you, say, wanted to compile a PowerPC fork of Firefox in one, he asked nonchalantly, picking a purpose at random not at all intended to further this blog post?

The 2GB cap actually originates from two specific technical limitations. The first applies to G3 and G4 systems: they can't have more than 2GB total physical RAM anyway. Although OS X RAM disks are "sparse" and only actually occupy the amount of RAM needed to store their contents, if you filled up a RAM disk with 2GB of data even on a 2GB-equipped MDD G4 you'd start spilling memory pages to the real hard disk and thrashing so badly you'd be worse off than if you had just used the hard disk in the first place. The second limit applies to G5 systems too, even in Leopard -- the RAM disk is served by /System/Library/PrivateFrameworks/DiskImages.framework/Resources/diskimages-helper, a 32-bit process limited to a 4GB address space minus executable code and mapped-in libraries (it didn't become 64-bit until Snow Leopard). In practice this leaves exactly 4629672 512-byte disk blocks, or approximately 2.26GB, as the largest possible standalone RAM disk image on PowerPC. A full single-architecture build of TenFourFox takes about 6.5GB. Poop.
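
(If you want to see that ceiling for yourself, a single maximum-size RAM disk distills to one line; BigOne is an arbitrary volume name, and these are the same hdiutil/diskutil commands the scripts below are built from:)

# one standalone RAM disk of 4629672 512-byte blocks, formatted HFS+
diskutil erasevolume HFS+ BigOne `hdiutil attach -nomount ram://4629672`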

It dawned on me during one of my careful toilet thinking sessions that the way awound, er, around this pwoblem, er, problem was a speech pathology wefewwal to RAID volumes together. I am chagrined that others had independently come up with this idea before, but let's press on anyway. At this point I'm assuming you're going to do this on a G5, because doing this on a G4 (or, egad, G3) would be absolutely nuts, and that your G5 has at least 8GB of RAM. The performance improvement we can expect depends on how the RAM disk is constructed (10.4 gives me the choice of concatenated, i.e., you move from component volume process to component volume process as they fill up, or striped, i.e., the component volume processes are interleaved [RAID 0]), and how much the tasks being performed on it are limited by disk access time. Building TenFourFox is admittedly a rather CPU-bound task, but there is a non-trivial amount of disk access, so let's see how we go.

Since I need at least 6.5GB, I decided the easiest way to handle this was 4 2+GB images (roughly 8.3GB all told). Obviously, the 8GB of RAM I had in my Quad G5 wasn't going to be enough, so (an order to MemoryX and) a couple days later I had a 16GB memory kit (8 x 2GB) at my doorstep for installation. (As an aside, this means my quad is now pretty much maxed out: between the 16GB of RAM and the Quadro FX 4500, it's now the most powerful configuration of the most powerful Power Mac Apple ever made. That's the same kind of sheer bloodymindedness that puts 256MB of RAM into a Quadra 950.)

Now to configure the RAM disk array. I ripped off a script from someone on Mac OS X Hints and modified it to be somewhat more performant. Here it is (it's a shell script you run in the Terminal, or you could use Platypus or something to make it an app; works on 10.4 and 10.5):

% cat ~/bin/ramdisk
#!/bin/sh

# don't build the array twice
/bin/test -e /Volumes/BigRAM && exit

# create, format (HFS+) and mount the four component RAM volumes in
# parallel; each hdiutil attach prints the /dev node of a new RAM device
diskutil erasevolume HFS+ r1 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r2 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r3 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r4 \
        `hdiutil attach -nomount ram://4629672` &
wait

# aggregate the four component volumes into one striped array
diskutil createRAID stripe BigRAM HFS+ \
        /Volumes/r1 /Volumes/r2 /Volumes/r3 /Volumes/r4

Notice that I'm using stripe here -- you would substitute concat for stripe above if you wanted that mode, but read on first before you do that. Open Disk Utility prior to starting the script and watch the side pane as it runs if you want to understand what it's doing. You'll see the component volume processes start, reconfigure themselves, get aggregated, and then the main array come up. It's sort of a nerdily beautiful disk image ballet.

One complication, however, is you can't simply unmount the array and expect the component RAM volumes to go away by themselves; instead, you have to go seek and kill the component volumes first and then the array will go away by itself. If you fail to do that, you'll run out of memory verrrrry quickly because the RAM will not be reclaimed! Here's a script for that too. I haven't tested it on 10.5, but I don't see why it wouldn't work there either.

% cat ~/bin/noramdisk
#!/bin/sh

# nothing to do if the array isn't mounted
/bin/test -e /Volumes/BigRAM || exit

# unmount the array, then parse checkRAID's member list down to the
# bare diskN devices backing each slice and eject them one by one
diskutil unmountDisk /Volumes/BigRAM
diskutil checkRAID BigRAM | tail -5 | head -4 | \
        cut -c 3-10 | grep -v 'Unknown' | \
        sed 's/s3//' | xargs -n 1 diskutil eject

This script needs a little explanation. What it does is unmount the RAM disk array so it can be modified, then goes through the list of its component processes, isolates the diskn that backs each one and ejects those. When all the disk array's components are gone, OS X removes the array, and that's it. Naturally, shutting down or restarting will wipe the array away as well.

(If you want to use these scripts for a different sized array, adjust the number of diskutil erasevolume lines in the mounter script, and make sure the last line has the right number of images [like /Volumes/r1 /Volumes/r2 by themselves for a 2-image array]. In the unmounter script, change the tail and head parameters to 1+images and images respectively [e.g., tail -3 | head -2 for a 2-image array].)
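
For example, a two-image mounter shrinks to this sketch (same commands as above, just fewer slices; pair it with tail -3 | head -2 in the unmounter as described):

#!/bin/sh

/bin/test -e /Volumes/BigRAM && exit

diskutil erasevolume HFS+ r1 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r2 \
        `hdiutil attach -nomount ram://4629672` &
wait
diskutil createRAID stripe BigRAM HFS+ \
        /Volumes/r1 /Volumes/r2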

Since downloading the source code from Mozilla is network-bound (especially on my network), I just dumped it to the hard disk, and patched it on the disk as well so a problem with the array would not require downloading and patching everything again. Once that was done, I made a copy on the RAM disk with hg clone esr31g /Volumes/BigRAM/esr31g and started the build. My hard disk, for comparison, is a 7200rpm 64MB buffer Western Digital SATA drive; remember that all PowerPC OS X-compatible controllers only support SATA I. Here are the timings, with the Quad G5 in Highest performance mode:

hard disk: 2 hours 46 minutes
concatenated: 2 hours 15 minutes (18.7% improvement)
striped: 2 hours 8 minutes (22.9% improvement)

Considering how much of this is limited by the speed of the processors, this is a rather nice boost, and I bet it will be even faster with unified builds in 38ESR (these are somewhat more disk-bound, particularly during linking). Since I've just saved almost two hours of build time over all four CPU builds, this is the way I intend to build TenFourFox in the future.

The 5.2% delta observed here between striping and concatenation doesn't look very large, but it is statistically significant, and actually the difference is larger than this test would indicate -- if our task were primarily disk-bound, the gulf would be quite wide. The reason striping is faster here is because each 2GB slice of the RAM disk array is an independent instance of diskimages-helper, and since we have four slices, each slice can run on one of the Quad's cores. By spreading disk access equally among all the processes, we share it equally over all the processors and achieve lower latency and higher efficiencies. This would probably not be true if we had fewer cores, and indeed for dual G5s two slices (or concatenating four) may be better; the earliest single processor G5s should almost certainly use concatenation only.

Some of you will ask how this compares to an SSD, and frankly I don't know. Although I've done some test builds in an SSD, I've been using a Patriot Blaze SATA III drive connected to my FW800 drive toaster to avoid problems with interfacing, so I doubt any numbers I'd get off that setup would be particularly generalizable and I'd rather use the RAM disk anyhow because I don't have to worry about TRIM, write cycles or cleaning up. However, I would be very surprised if an SSD in a G5 achieved speeds faster than RAM, especially given the (comparatively, mind you) lower SATA bandwidth.

And, with that, 31.5.0 is released for testing (release notes, hashes, downloads). This only contains ESR security/stability fixes; you'll notice the changesets hash the same as 31.4.0 because they are, in fact, the same. The build finalizes Monday PM Pacific as usual.

31.5.0 would have been out earlier (experiments with RAM disks notwithstanding) except that I was waiting to see what Mozilla would do about the Superfish/Komodia debacle. The fact that Lenovo was loading adware that MITM-ed HTTPS connections on their PCs ("Superfish") was bad enough, but the secret root certificate it possessed had an easily crackable private key password, allowing a bad actor to create phony certificates. And now it looks like Komodia, the company that developed the technology behind Superfish, has by their willful bad faith actions caused the same problem to exist hidden in other kinds of adware they power.

Assuming you were not tricked into accepting their root certificate in some other fashion (their nastyware doesn't run on OS X and near as I can tell never has), your Power Mac is not at risk, but these kinds of malicious, malfeasant and incredibly ill-constructed root certificates need to be nuked from orbit (as well as the companies that try to sneak them onto users' machines; I suggest napalm, castration and feathers), and they will be marked as untrusted in future versions of TenFourFox and Classilla so that false certificates signed with them will not be honoured under any circumstances, even by mistake. Unfortunately, it's also yet another example of how the roots are the most vulnerable part of secure connections (previously, previously).

Development on IonPower continues. Right now I'm trying to work out a serious bug with Baseline stubs and not having a lot of luck; if I can't get this working by 38.0, we'll ship 38 with PPCBC (targeting a general release by 38.0.2 in that case). But I'm trying as hard as I can!

Saturday, February 14, 2015

IonPower now entering low Earth orbit

Happy Valentine's Day. I'm spending mine in front of the computer with a temporary filling on my back top left molar. But I'm happy, because IonPower, the new PowerPC JIT I plan to release with TenFourFox 38, is now to the point where it can compile and execute simple scripts in Baseline mode (i.e., phase 3, where 1 was authoring, 2 was compiling, 3 is basic operations, 4 is pass tests in Baseline, 5 is basic operations in full Ion and 6 is pass tests in Ion). For example, tonight's test was debugging and fixing this fun script:

function ok(){print("ok");}function ok3(){ok(ok(ok()));}var i=0;for(i=0;i<12;i++){ok3();}

That's an awful lot of affirmation right there. Once I'm confident I can scale it up a bit more, we'll try to get the test suite to pass.

What does IonPower bring to TenFourFox, besides of course full Ion JIT support for the first time? Well, the next main thing it does is to pay back our substantial accrued technical debt. PPCBC, the current Baseline-only implementation, is still at its core using the same code generator and assumptions about macroassembler structure from JaegerMonkey and, ultimately, TenFourFox 10.x; I got PPCBC working by gluing the new instruction generator to the old one and writing just enough of the new macroassembler to enable it to build and generate code.

JaegerMonkey was much friendlier to us because our implementation was always generating (from the browser's view) ABI-compliant code; JavaScript frames existed on a separate interpreter stack, so we could be assured that the C stack was sane. Both major PowerPC ABIs (PowerOpen, which OS X and AIX use, and SysV, which Linux, *BSD, et al. use) have very specific requirements for how stack frames are formatted, but by TenFourFox 17 I had all the edge cases worked out and we knew that the stack would always be in a compliant state at any crossing from generated code back to the browser or OS. Ben Stuhl had done some hard work on branch and call stanza optimization and this mostly "just worked" because we had full control to that level.

Ion (and Baseline, its simpleton sibling) destroyed all that. Now, script frames are built on the C stack instead, and Ion has its own stack frame format which it always expects to be on top, meaning when you enter Ion generated code you can't directly return to ABI-compliant code without some sort of thunk, and vice versa. This frame has a descriptor and a return address, which is tricky on PowerPC: we don't store return addresses on the stack like x86; the link register we do have (the register storing where a subroutine call should return to) is not a general purpose register, and has specialized means for controlling it as part of the general class of PowerPC SPRs, or special purpose registers; and the program counter (the internal register storing the location of the current instruction) is not directly accessible in any fashion. Compare this with MIPS, where the link register is a GPR ($ra), and ARM, where both the LR and PC are. Those CPUs can just sling them to and from the stack at will.

That return address is problematic in another respect. We use constructs called branch stanzas in the PowerPC port, because the PPC branch instructions have a limited signed displacement of (at most, for b) 26 bits, and branches may exceed that with the size of scripts these days; these stanzas are padded with nop instructions so that we have something to patch with long calls (usually lis ori mtctr bctr) later. In JaegerMonkey (and PPCBC), if we were making a subroutine call and the call was "short," the bl instruction that performed it was at the top of the branch stanza with some trailing nops; JaegerMonkey didn't care how we managed the return as long as we did. Because we had multiple branch variants for G3/G4 and G5, the LR which bl set could be any number of instructions after the branch within the stanza, but it didn't matter because the code would continue regardless. To Ion, though, it matters: it expects that return address on the stack to always correlate to where a translated JSOP (JavaScript interpreter opcode) begins -- no falling between the cracks.

IonPower deals with the problem in two ways. First, we now use a unified stanza for all architectures, so it always looks the same and is predictable, and short branches are now at the end of the stanza so that we always return to the next JSOP. Second, we don't actually use the link register for calling generated code in most cases. In both PowerPC ABIs, because LR is only set when control gets to the callee (the routine being called), the callee is expected to save it in its function prologue, but we don't have a prologue for Baseline code. PPCBC gets around this by using the link register within Baseline code only; for calls out to other generated code it would clobber the link register with a bl .+4 to get the PC, compute an offset and store that as a return address. However, that offset was sometimes wrong (see above), requiring hacks in the Ion core to do secondary verification. Instead, since we know the address of the instruction for the trailing JSOP at link time, we just generate code to push a 32-bit literal, and we patch those push instructions with the right location when we link the code to memory. This executes faster, too, because we don't have all that mucking about with SPR operations. We only use the link register for toggled calls (because we control the code it calls, so we can capture LR there) and for bailout tables, and of course for the thunk code that calls ABI-compliant library routines.

This means that a whole class of bugs, like branching, repatching and stack frame glitches, just disappear in IonPower, whereas with PPCBC there seemed to be no good way to get it to work with the much more sophisticated Ion JIT and its expectations on branching and patching. For that reason, I simply concentrated on optimizing PPCBC as much as possible with high-performance PowerPC-specific parallel type guards and arithmetic routines in Baseline inline caches, and that's what you're using now.

IonPower's other big step forward is improved debugging abilities, and chief amongst them is no compiler warnings -- at all. Yep. No warnings, at least not from our headers, so that subtle bugs that the compiler might warn about can now be detected. We also have a better code generation display that's easier to visually parse, and annoying pitfalls in PowerPC instructions, where our typical temporary register r0 behaves completely differently from other registers, now assert at code generation time, making it easier to find these bugs during codegen instead of laboriously going instruction by instruction through the debugger at runtime.

Incidentally, this brought up an interesting discussion -- how does, say, MIPS deal with the return address problem, since it has an LR but no GPR PC? The MIPS Ion backend, for lack of a better word, cheats. MIPS has a single instruction branch delay slot, i.e., the instruction following any branch is (typically) executed before the branch actually takes place. If this sounds screwy, you're right; ARM and PowerPC don't have them, but some other RISC designs like MIPS, SuperH and SPARC still do as a historical holdover (in older RISC implementations, the delay slot was present to allow the processor enough time to fetch the branch target, and it remains for compatibility). With that in mind, and the knowledge that $ra is the MIPS link register, what do you think this does?

addiu $sp,$sp,-8 ; subtract 8 from the current stack pointer,
                 ; reserving two words on the stack
jalr $a0         ; jump to address in register $a0 and set $ra
sw $ra, 0($sp)   ; (DELAY SLOT) store $ra to the top of the stack

What's the value of $ra in the delay slot? This isn't well-documented apparently, but it's actually already set to the correct return address before the branch actually takes place, so MIPS can save the right address to the stack right here. (Thanks to Kyle Isom, who confirmed this on his own MIPS system.) It must be relatively standard between implementations, but I'd never seen a trick like that before. I guess there has to be at least some advantage to still having branch delay slots in this day and age.

One final note: even though I necessarily had to write some of it for purposes of compilation, please note that there will be no asm.js for PowerPC. Almost all the existing asm.js code out there assumes little endian byte ordering, meaning it will either not run or, worse, run incorrectly on our big endian machines. It's just not worth the headache, so such scripts will run in the usual JIT.

On the stable branch side, 31.5 will be going to build next week. Dan DeVoto commented that TenFourFox does not colour manage untagged images (unless you force it to). I'd argue this is correct behaviour, but I can see where it would be unexpected, and while I don't mind changing the default I really need to do more testing on it. This might be the standard setting for 31.6, however. Other than the usual ESR fixes there are no major changes in 31.5 because I'm spending most of my time (that my Master's degree isn't soaking up) on IonPower -- I'm still hoping we can get it in for 38 because I really do believe it will be a major leap forward for our vintage Power Macs.

Tuesday, January 27, 2015

And now for something completely different: the Pono Player review and Power Macs (plus: who's really to blame for Dropbox?)

Regular business first: this is now a syndicated blog on Planet Mozilla. I consider this an honour that should also go a long way toward reminding folks that not only are there well-supported community tier-3 ports, but lots of people still use them. In return I promise not to bore the punters too much with vintage technology.

IonPower crossed phase 2 (compilation) yesterday -- it builds and links, and nearly immediately asserts after some brief codegen, but at this phase that's entirely expected. Next, phase 3 is to get it to build a trivial script in Baseline mode ("var i=0") and run to completion without crashing or assertions, and phase 4 is to get it to pass the test suite in Baseline-only mode, which will make it as functional as PPCBC. Phase 5 and 6 are the same, but this time for Ion. IonPower really repays most of our technical debt -- no more fragile glue code trying to keep the JaegerMonkey code generator working, substantially fewer compiler warnings, and a lot less hacks to the JIT to work around oddities of branching and branch optimization. Plus, many of the optimizations I wrote for PPCBC will transfer to IonPower, so it should still be nearly as fast in Baseline-only mode. We'll talk more about the changes required in a future blog post.

Now to the Power Mac scene. I haven't commented on Dropbox dropping PowerPC support (and 10.4/10.5) because that's been repeatedly reported by others in the blogscene and personally I rarely use Dropbox at all, having my own server infrastructure for file exchange. That said, there are many people who rely on it heavily, even a petition (which you can sign) to bring support back. But let's be clear here: do you really want to blame someone? Do you really want to blame the right someone? Then blame Apple. Apple dropped PowerPC compilation from Xcode 4; Apple dropped Rosetta. Unless you keep a 10.6 machine around running Xcode 3, you can't build (true) Universal binaries anymore -- let alone one that compiles against the 10.4 SDK -- and it's doubtful Apple would let such an app (even if you did build it) into the App Store because it's predicated on deprecated technology. Except for wackos like me who spend time building PowerPC-specific applications and/or don't give a flying cancerous pancreas whether Apple finds such work acceptable, this approach already isn't viable for a commercial business and it's becoming even less viable as Apple actively retires 10.6-capable models. So, sure, make your voices heard. But don't forget who screwed us first, and keep your vintage hardware running.

That said, I am personally aware of someone™ who is working on getting the supported Python interconnect running on OS X Power Macs, and it might be possible to rebuild Finder integration on top of that. (It's not me. Don't ask.) I'll let this individual comment if he or she wants to.

Onto the main article. As many of you may or may not know, my undergraduate degree was actually in general linguistics, and all linguists must have (obviously) some working knowledge of acoustics. I've also been a bit of a poseur audiophile, and while I enjoy good music I especially enjoy good music that's well engineered (Alan Parsons is a demi-god).

The Por Pono Player, thus, gives me pause. In acoustics I lived and died by the Nyquist-Shannon sampling theorem, and my day job today is so heavily science and research-oriented that I really need to deal with claims in a scientific, reproducible manner. That doesn't mean I don't have an open mind or won't make unusual decisions on a music format for non-auditory reasons. For example, I prefer to keep my tracks uncompressed, even though I freely admit that I'm hard pressed to find any difference in a 256kbit/s MP3 (let alone 320), because I'd like to keep a bitwise exact copy for archival purposes and playback; in fact, I use AIFF as my preferred format simply because OS X rips directly to it, and everything plays it with minimum CPU overhead, despite FLAC being lossless and smaller. And hard disks are cheap, and I can convert it to FLAC for my Sansa Fuze if I need to.
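
(That last conversion is a one-liner if you have the flac command-line tool installed, say from MacPorts; the filename here is hypothetical, and flac takes AIFF input directly:)

# losslessly re-encode an AIFF rip for the Fuze; -8 is maximum compression
flac -8 -o track.flac track.aiff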

So it is with the Por Pono Player. For $400, you can get a player that directly pumps uncompressed, high-quality remastered 24-bit audio at up to 192kHz into your ears with no downsampling and allegedly no funny business. Immediately my acoustics professor cries foul. "Cameron," she says as she writes a big fat F on this blog post, "you know perfectly well that a CD using 44.1kHz as its sampling rate will accurately reproduce sounds up to 22.05kHz without aliasing, and 16-bit audio has indistinguishable quantization error in multiple blinded studies." Yes, I know, I say sheepishly, having tried to create high bit-rate digital playback algorithms on the Commodore 64 and failed because the 6510's clock speed isn't fast enough to pump samples through the SID chip at anything much above telephone call frequencies. But I figured that if there was a chance, if there was anything, that could demonstrate a difference in audio quality that I could uncover it with a Pono Player and a set of good headphones (I own a set of Grado SR125e cans, which are outstanding for the price). So I preordered one and yesterday it arrived, in a fun wooden box:

It includes a MicroUSB charger (and cable), an SDXC MicroSD card (64GB, plus the 64GB internal storage), a fawning missive from Neil Young, the instigator of the original Kickstarter, the yellow triangular unit itself (available now in other colours), and no headphones (it's BYO headset):

My original plan was to do an A-B comparison with Pink Floyd's Dark Side of the Moon because it was originally mastered by the godlike Alan Parsons, I have the SACD 30th Anniversary master, and the album is generally considered high quality in all its forms. When I tried to do that, though, several problems rapidly became apparent:

First, the included card is SDXC, and SDXC support (and exFAT) wasn't added to OS X until 10.6.4. Although you can get exFAT support on 10.5 with OSXFUSE, I don't know how good their support is on PowerPC and it definitely doesn't work on Tiger (and I'm not aware of a module for the older MacFUSE that does run on Tiger). That limits you to SDHC cards up to 32GB at least on 10.4, which really hurts on FLAC or ALAC and especially on AIFF.

Second, the internal storage is not accessible directly to the OS. I plugged in the Pono Player to my iMac G4 and it showed up in System Profiler, but I couldn't do anything with it. The 64GB of internal storage is only accessible to the music store app, which brings us to the third problem:

Third, the Pono Music World app (a skinned version of JRiver Media Center) is Intel-only, 10.6+. You can't download tracks any other way right now, which also means you're currently screwed if you use Linux, even on an Intel Mac. And all they had was Dark Side in 44.1kHz/16 bit ... exactly the same as CD!

So I looked around for other options. HDTracks didn't have Dark Side, though they did have The (weaksauce) Endless River and The Division Bell in 96kHz/24 bit. I own both of these, but 96kHz wasn't really what I had in mind, and when I signed up to try a track it turned out they need a downloader too -- which is also a reskinned JRiver! And their reasoning for this in the FAQ is total crap.

Eventually I was able to find two sites that offer sample tracks I could download in TenFourFox (I had to downsample one for comparison). The first offers multiple formats in WAV, which your Power Mac actually can play, even in 24-bit (but it may be downsampled for your audio chip; if you go to /Applications/Utilities/Audio MIDI Setup.app you can see the sample rate and quantization for your audio output -- my quad G5 offers up to 24/96kHz but my iMac only has 16/44.1). The second was in FLAC, which Audacity crashed trying to convert, MacAmp Lite X wouldn't even recognize, and XiphQT (via QuickTime) played like it was being held underwater by a chainsaw (sample size mismatch, no doubt); I had to convert this by hand. I then put them onto a SDHC card and installed it in the Pono.

Yuck. I was very disappointed in the interface and LCD. I know that display quality wasn't a major concern, but it looks clunky and ugly and has terrible angles (see for yourself!) and on a $400 device that's not acceptable. The UI is very slow sometimes, even with the hardware buttons (just volume and power, no track controls), and the touch screen is very low quality. But I duly tried the built-in Neil Young track, which being an official Por Pono track turns on a special blue light to tell you it's special, and on my Grados it sounded pretty good, actually. That was encouraging. So I turned off the display and went through a few cycles of A-B testing with a random playlist between the two sets of tracks.

And ... well ... my identification abilities were almost completely statistical chance. In fact, I was slightly worse than chance would predict on the second set of tracks. I can only conclude that Harry Nyquist triumphs. With high quality headphones, presumably high quality DSPs and presumably high quality recordings, there is absolutely bupkis difference for me between CD-quality and Pono-quality.

Don't get me wrong: I am happy to hear that other people are concerned about the deficiencies in modern audio engineering -- and making it a marketable feature. We've all heard the "loudness war," for example, which dramatically compresses the dynamic range of previously luxurious tracks into a bafflingly small amplitude range which the uncultured ear, used only to quantity over quality, apparently prefers. Furthermore, early CD masters used RIAA equalization, which overdrove the treble and was completely unnecessary with digital audio, though that grave error hasn't been repeated since at least 1990 or earlier. Fortunately, assuming you get audio engineers who know what they're doing, a modern CD is every bit as a good to the human ear as a DVD-Audio disc or an SACD. And if modern music makes a return to quality engineering with high quality intermediates (where 24-bit really does make a difference) and appropriate dynamic range, we'll all be better off.

But the Pono Player doesn't live up to the hype in pretty much any respect. It has line out (which does double as a headphone port to share) and it's high quality for what it does play, so it'll be nice for my hi-fi system if I can get anything on it, but the Sansa Fuze is smaller and more convenient as a portable player and the Pono's going back in the wooden box. Frankly, it feels like it was pushed out half-baked, it's problematic if you don't own a modern Mac, and the imperceptible improvements in audio mean it's definitely not worth the money over what you already own. But that's why you read this blog: I just spent $400 so you don't have to.

Monday, January 19, 2015

Upgrading the unupgradeable: video card options for the Quad G5

Now that the 2015 honeymoon and hangovers are over, it's back to business, including the annual retro-room photo spread (check out the new pictures of the iMac G3, the TAM and the PDP-11/44). And, as previously mentioned in my ripping yarn about long-life computing -- by the way, this winter the Quad G5's cores got all the way down to 30 C on the new CPU assembly, which is positively arctic -- 2015 is my year for a hard disk swap. I was toying with getting an apparently Power Mac compatible Seagate hybrid SSHD that Martin Kukač was purchasing (perhaps he'll give his capsule review in the comments or on his blog?), but I couldn't find out whether it fails gracefully to the HD when the flash eventually dies, and since I do large amounts of disk writes for video and development I decided to stick with a spinning disk. The Quad now has two 64MB-buffer 7200rpm SATA II Western Digital drives and the old ones went into storage as desperation backups; while 10K or 15Krpm was a brief consideration, their additional heat may be problematic for the Quad (especially with summers around here) and I think I'll go with what I know works. Since I'm down to only one swap left I think I might stretch the swap interval out to six years, and that will get me through 2027.

At the same time I was thinking of what more I could do to pump the Quad up. Obviously the CPU is a dead-end, and I already have 8GB of RAM in it, which Tiger right now indicates I am only using 1.5GB of (with TenFourFox, Photoshop, Terminal, Texapp, BBEdit and a music player open) -- I'd have to replace all the 1GB sticks with 2GB sticks to max it out, and I'd probably see little if any benefit except maybe as file cache. So I left the memory alone; maybe I'll do it for giggles if G5 RAM gets really cheap.

However, I'd consolidated the USB and FireWire PCIe cards into a Sonnet combo card, so that freed up a slot and meant I could think about the video card. When I bought my Quad G5 new I dithered over the options: the 6600LE, 7800GT and 2-slot Quadro FX 4500, all NVIDIA. I prefer(red) ATI/AMD in general because of their long previous solid support for the classic Mac OS, but Apple only offered NVIDIA cards as BTO options at the time. The 6600LE's relatively anaemic throughput wasn't ever in the running, and the Quadro was incredibly expensive (like, 4x the cost!) for a marginal increase in performance in typical workloads, so I bought the 7800GT. Overall, it's been a good card; other than the fan failing on me once, it's been solid, and prices on G5-compatible 7800GTs are now dropping through the floor, making it a reasonably inexpensive upgrade for people still stuck on a 6600. (Another consideration is the aftermarket ATI X1900 GT, which is nearly as fast as the 7800GT.)

However, that also means that prices on other G5-compatible video cards are also dropping through the floor. Above the 7800GT are two options: the Quadro FX 4500, and various third-party hacked video cards, most notably the 2-slot 7800GTX. The GTX is flashed with a hacked Mac 7800GT ROM but keeps the core and memory clocks at the same high speed, yielding a chimera card that's anywhere between 15-30% faster than the Quadro. I bought one of these about a year and a half ago as a test, and while it was noticeably faster in certain tasks and mostly compatible, it had some severe glitchiness with older games and that was unacceptable to me (for example, No One Lives Forever had lots of flashing polygons and bad distortion). I also didn't like that it didn't come with a support extension to safely anchor it in the G5's card guide, leaving it to dangerously flex out of the card slot, so I pulled it and it's sitting in my junk box while I figure out what to do with it. Note that it uses a different power adapter cable than the 7800 or Quadro, so you'll need to make sure it's included if you want to try this card out, and if you dislike the lack of a card guide extension as much as I do you'll need a sacrificial card to steal one from.

Since then Quadro prices plummeted as well, so I picked up a working-pull used Apple OEM FX 4500 on eBay for about $130. The Quadro has 512MB of GDDR3 VRAM (same as the 7800GTX and double the 7800GT), two dual-link DVI ports and a faster core clock; although it also supports 3D glasses, something I found fascinating, it doesn't seem to work with LCD panels, so I can't evaluate that. Many things are not faster, but some things are: 1080p video playback is now much smoother because the Quadro can push more pixels, and high end games now run more reliably at higher resolutions as you would expect, without the glitchiness I got in older titles with the 7800GTX. Indeed, returning to the Bare Feats graph, the marginal performance improvement and the additional hardware rendering support are now, at least for me, worth $130 (I just picked up a spare for $80); it's a fully kitted and certified OEM card (no hacks!), and it uses the same power adapter cable as the 7800GT. One other side benefit is that, counterintuitively, the GPU runs several degrees cooler (despite being bigger and beefier) and the fan is nearly inaudible, no doubt due to that huge honking heatsink.

It's not a big bump, but it's a step up, and I'm happy. I guess all that leaves is the RAM ...

In TenFourFox news, I'm done writing IonPower (phase 1). Phase 2 is compilation. That'll be some drudgery, but I think we're on target for release with 38ESR.

Saturday, January 10, 2015

31.4.0 available

31.4.0 is available, with many fixes and the speculative changes from 31.3.1pre. Before any of you wags point out that the copyright date is wrong: I didn't notice it until the middle of the G3 build, and since I'd already wasted a day because I missed a patch (build day for TenFourFox is about eight hours, even with the G5 quad running full blast on the SSD I use for build acceleration, so scotching a build really burns up a lot of time), I really don't care enough. :P It's in the changesets for next time.

For 31.5 I'm looking at tweaking the WebM AltiVec code some more, possibly introducing some early speculative fetching. But in the meantime, it's IonPower for the rest of the weekend until my Master's classes start again on Monday (when this build will become final).

Downloads from the usual place.