Saturday, February 21, 2015

Biggus diskus (plus: 31.5.0 and how to superphish in your copious spare time)

One of the many great advances that Mac OS 9 had over the later operating system was the extremely flexible (and persistent!) RAM disk feature, which I use on almost all of my OS 9 systems to this day as a cache store for Classilla and temporary work area. It's not just for laptops!

While OS X can configure and use RAM disks, of course, it's not as nicely integrated as the RAM Disk in Classic is and it isn't natively persistent, though the very nice Esperance DV prefpane comes pretty close to duplicating the earlier functionality. Esperance will let you create a RAM disk up to 2GB in size, which for most typical uses of a transient RAM disk (cache, scratch volume) would seem to be more than enough, and can back it up to disk when you exit. But there are some heavy duty tasks that 2GB just isn't enough for -- what if you, say, wanted to compile a PowerPC fork of Firefox in one, he asked nonchalantly, picking a purpose at random not at all intended to further this blog post?

The 2GB cap actually originates from two specific technical limitations. The first applies to G3 and G4 systems: they can't have more than 2GB total physical RAM anyway. Although OS X RAM disks are "sparse" and only actually occupy the amount of RAM needed to store their contents, if you filled up a RAM disk with 2GB of data even on a 2GB-equipped MDD G4 you'd start spilling memory pages to the real hard disk and thrashing so badly you'd be worse off than if you had just used the hard disk in the first place. The second limit applies to G5 systems too, even in Leopard -- the RAM disk is served by /System/Library/PrivateFrameworks/DiskImages.framework/Resources/diskimages-helper, a 32-bit process limited to a 4GB address space minus executable code and mapped-in libraries (it didn't become 64-bit until Snow Leopard). In practice this leaves exactly 4629672 512-byte disk blocks, or approximately 2.26GB, as the largest possible standalone RAM disk image on PowerPC. A full single-architecture build of TenFourFox takes about 6.5GB. Poop.

It dawned on me during one of my careful toilet thinking sessions that the way awound, er, around this pwobproblem was a speech pathology wefewwal to RAID volumes together. I am chagrined that others had independently came up with this idea before, but let's press on anyway. At this point I'm assuming you're going to do this on a G5, because doing this on a G4 (or, egad, G3) would be absolutely nuts, and that your G5 has at least 8GB of RAM. The performance improvement we can expect depends on how the RAM disk is constructed (10.4 gives me the choices of concatenated, i.e., you move from component volume process to component volume process as they fill up, or striped, i.e., the component volume processes are interleaved [RAID 0]), and how much the tasks being performed on it are limited by disk access time. Building TenFourFox is admittedly a rather CPU-bound task, but there is a non-trivial amount of disk access, so let's see how we go.

Since I need at least 6.5GB, I decided the easiest way to handle this was 4 2+GB images (roughly 8.3GB all told). Obviously, the 8GB of RAM I had in my Quad G5 wasn't going to be enough, so (an order to MemoryX and) a couple days later I had a 16GB memory kit (8 x 2GB) at my doorstep for installation. (As an aside, this means my quad is now pretty much maxed out: between the 16GB of RAM and the Quadro FX 4500, it's now the most powerful configuration of the most powerful Power Mac Apple ever made. That's the same kind of sheer bloodymindedness that puts 256MB of RAM into a Quadra 950.)

Now to configure the RAM disk array. I ripped off a script from someone on Mac OS X Hints and modified it to be somewhat more performant. Here it is (it's a shell script you run in the Terminal, or you could use Platypus or something to make it an app; works on 10.4 and 10.5):

% cat ~/bin/ramdisk
#!/bin/sh

/bin/test -e /Volumes/BigRAM && exit

diskutil erasevolume HFS+ r1 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r2 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r3 \
        `hdiutil attach -nomount ram://4629672` &
diskutil erasevolume HFS+ r4 \
        `hdiutil attach -nomount ram://4629672` &
wait
diskutil createRAID stripe BigRAM HFS+ \
        /Volumes/r1 /Volumes/r2 /Volumes/r3 /Volumes/r4

Notice that I'm using stripe here -- you would substitute concat for stripe above if you wanted that mode, but read on first before you do that. Open Disk Utility prior to starting the script and watch the side pane as it runs if you want to understand what it's doing. You'll see the component volume processes start, reconfigure themselves, get aggregated, and then the main array come up. It's sort of a nerdily beautiful disk image ballet.

One complication, however, is you can't simply unmount the array and expect the component RAM volumes to go away by themselves; instead, you have to go seek and kill the component volumes first and then the array will go away by itself. If you fail to do that, you'll run out of memory verrrrry quickly because the RAM will not be reclaimed! Here's a script for that too. I haven't tested it on 10.5, but I don't see why it wouldn't work there either.

% cat ~/bin/noramdisk
#!/bin/sh

/bin/test -e /Volumes/BigRAM || exit

diskutil unmountDisk /Volumes/BigRAM
diskutil checkRAID BigRAM | tail -5 | head -4 | \
        cut -c 3-10 | grep -v 'Unknown' | \
        sed 's/s3//' | xargs -n 1 diskutil eject

This script needs a little explanation. What it does is unmount the RAM disk array so it can be modified, then goes through the list of its component processes, isolates the diskn that backs them and ejects those. When all the disk array's components are gone, OS X removes the array, and that's it. Naturally shutting down or restarting will also wipe the array away too.

(If you want to use these scripts for a different sized array, adjust the number of diskutil erasevolume lines in the mounter script, and make sure the last line has the right number of images [like /Volumes/r1 /Volumes/r2 by themselves for a 2-image array]. In the unmounter script, change the tail and head parameters to 1+images and images respectively [e.g., tail -3 | head -2 for a 2-image array].)

Since downloading the source code from Mozilla is network-bound (especially on my network), I just dumped it to the hard disk, and patched it on the disk as well so a problem with the array would not require downloading and patching everything again. Once that was done, I made a copy on the RAM disk with hg clone esr31g /Volumes/BigRAM/esr31g and started the build. My hard disk, for comparison, is a 7200rpm 64MB buffer Western Digital SATA drive; remember that all PowerPC OS X-compatible controllers only support SATA I. Here's the timings, with the Quad G5 in Highest performance mode:

hard disk: 2 hours 46 minutes
concatenated: 2 hours 15 minutes (18.7% improvement)
striped: 2 hours 8 minutes (22.9% improvement)

Considering how much of this is limited by the speed of the processors, this is a rather nice boost, and I bet it will be even faster with unified builds in 38ESR (these are somewhat more disk-bound, particularly during linking). Since I've just saved almost two hours of build time over all four CPU builds, this is the way I intend to build TenFourFox in the future.

The 5.2% delta observed here between striping and concatenation doesn't look very large, but it is statistically significant, and actually the difference is larger than this test would indicate -- if our task were primarily disk-bound, the gulf would be quite wide. The reason striping is faster here is because each 2GB slice of the RAM disk array is an independent instance of diskimages-helper, and since we have four slices, each slice can run on one of the Quad's cores. By spreading disk access equally among all the processes, we share it equally over all the processors and achieve lower latency and higher efficiencies. This would probably not be true if we had fewer cores, and indeed for dual G5s two slices (or concatenating four) may be better; the earliest single processor G5s should almost certainly use concatenation only.

Some of you will ask how this compares to an SSD, and frankly I don't know. Although I've done some test builds in an SSD, I've been using a Patriot Blaze SATA III drive connected to my FW800 drive toaster to avoid problems with interfacing, so I doubt any numbers I'd get off that setup would be particularly generalizable and I'd rather use the RAM disk anyhow because I don't have to worry about TRIM, write cycles or cleaning up. However, I would be very surprised if an SSD in a G5 achieved speeds faster than RAM, especially given the (comparatively, mind you) lower SATA bandwidth.

And, with that, 31.5.0 is released for testing (release notes, hashes, downloads). This only contains ESR security/stability fixes; you'll notice the changesets hash the same as 31.4.0 because they are, in fact, the same. The build finalizes Monday PM Pacific as usual.

31.5.0 would have been out earlier (experiments with RAM disks notwithstanding) except that I was waiting to see what Mozilla would do about the Superfish/Komodia debacle: the fact that Lenovo was loading adware that MITM-ed HTTPS connections on their PCs ("Superfish") was bad enough, but the secret root certificate it possessed had an easily crackable private key password allowing a bad actor to create phony certificates, and now it looks like the company that developed the technology behind Superfish, Komodia, has by their willful bad faith actions caused the same problem to exist hidden in other kinds of adware they power.

Assuming you were not tricked into accepting their root certificate in some other fashion (their nastyware doesn't run on OS X and near as I can tell never has), your Power Mac is not at risk, but these kinds of malicious, malfeasant and incredibly ill-constructed root certificates need to be nuked from orbit (as well as the companies that try to sneak them on user's machines; I suggest napalm, castration and feathers), and they will be marked as untrusted in future versions of TenFourFox and Classilla so that false certificates signed with them will not be honoured under any circumstances, even by mistake. Unfortunately, it's also yet another example of how the roots are the most vulnerable part of secure connections (previously, previously).

Development on IonPower continues. Right now I'm trying to work out a serious bug with Baseline stubs and not having a lot of luck; if I can't get this working by 38.0, we'll ship 38 with PPCBC (targeting a general release by 38.0.2 in that case). But I'm trying as hard as I can!

3 comments:

  1. I've had temporary crowns, but never a temporary filling. Are you going for a gold filling?
    I enjoy your blog and all have to admit most of it goes over my head.
    Also, i've enjoyed your road sign pictures. My son did a western Pennsylvania sign project you might find interesting: http://studioforcreativeinquiry.org/projects/the-pittsburgh-signs-project

    I was surprised that the G5 can handle 16 gig ram. Had i known, I would have done the same as yo and install 16 instead of eight.
    Thanks again for all your work on Tenfourfox and Classilla.

    ReplyDelete
    Replies
    1. The temporary filling is because I had a root canal done and the crown had to be fitted afterwards.

      Nice work on your son's part. Road signs are a unique historical record.

      I'd argue that 16GB in the G5 is probably incredible overkill in most cases. I would still be using 8GB if I didn't want to try this RAM disk idea.

      Delete
  2. First of all thanks for the link to Esperance DV! Up until now I did not know about this prefpane. Secondly I have to confess that I also used my Dual-Core G5 with 16 GB or memory. I had a tmpfs of about 10 GB set up that I used to compile packages on Gentoo Linux. Yes, with Gentoo Linux you really compile a lot. I figured that it was a good idea to compile in RAM and have the compiled binaries copied to the file system in order to minimize fragmentation and, off course, to speed up the build process itself. With Linux' tmpfs this works great because it only ever occupies actually used data space in RAM.

    On Mac OS X I have the feeling that it really doesn't know what to do with this amount of memory. It simply wasn't built for 16 GB, and on top of that all applications are 32-bit anyway. My favorite OS is still Mac OS X 10.4 Tiger, but I also use 10.5 and regularly play around with my copies of 10.2 and 10.3 on my various Macs.

    I wander if it may be possible to set up a bigger volume with a different (64-bit) utility and format it manually. Or if it may be possible to use fuse and tmpfs from Linux. I coundn't find a tmpfs fuse-port, but there seems to be something similar called memfs. Since fuse is available also for Mac OS X/PowerPC this could be another working solution, I couldn't find a PPC port of memfs though.

    Anyway, thanks for the idea to use RAID volumes to make a bigger RAM Disk, although I doubt I will use this on Mac OS X. I will continue to use the RAM in Linux.

    ReplyDelete

Due to an increased frequency of spam, comments are now subject to moderation.