Saturday, June 16, 2018

TenFourFox FPR8b1 available

TenFourFox Feature Parity Release 8 beta 1 is now available (downloads, release notes, hashes). There is much less in this release than I wanted because of a family member in the hospital and several technical roadblocks. Of note, I've officially abandoned CSS grid again after an extensive testing period due to the fact that we would need substantial work to get a functional implementation, and a partially functional implementation is worse than none at all (in the latter case, we simply gracefully degrade into block-level <div>s). I also was not able to finish the HTML <input> date picker implementation, though I've managed to still get a fair amount completed of it, and I'll keep working on that for FPR9. The good news is, once the date picker is done, the time picker will use nearly exactly the same internal plumbing and can just be patterned off it in the same way. Unlike Firefox's implementation, as I've previously mentioned our version uses native OS X controls instead of XUL, which also makes it faster. That said, it is a ghastly hack on the Cocoa widget side and required some tricky programming on 10.4 which will be the subject of a later blog post.

That's not to say this is strictly a security patch release (though most of the patches for the final Firefox 52, 52.9, are in this beta). The big feature I did want to get in FPR8 did land and seems to work properly, which is same-site cookie support. Same-site cookie support helps to reduce cross-site request forgeries by advising the browser the cookie in question should only be sent if a request originates from the site that set it. If the host that triggered the request is different than the one appearing in the address bar, the request won't include any of the cookies that are tagged as same-site. For example, say you're logged into your bank, and somehow you end up opening another tab with a malicious site that knows how to manipulate your bank's transfer money function by automatically submitting a hidden POST form. Since you're logged into your bank, unless your bank has CSRF mitigations (and it had better!), the malicious site could impersonate you since the browser will faithfully send your login cookie along with the form. The credential never leaked, so the browser technically didn't malfunction, but the malicious site was still able to impersonate you and steal your money. With same-site cookies, there is a list of declared "safe" operations; POST forms and certain other functions are not on that list and are considered "unsafe." Since the unsafe action didn't originate from the site that set the cookie, the cookie isn't transmitted to your bank, authentication fails and the attack is foiled. If the mode is set to "strict" (as opposed to "lax"), even a "safe" action like clicking a link from an outside site won't send the cookie.

Same-site cookie support was implemented for Firefox 60; our implementation is based on it and should support all the same features. When you start FPR8b1, your cookies database will be transparently upgraded to the new database schema. If you are currently logged into a site that supports same-site cookies, or you are using a foxbox that preserves cookie data, you will need to log out and log back in to ensure your login cookie is upgraded (I just deleted all my cookies and started fresh, which is good to give the web trackers a heart attack anyway). Github and Bugzilla already have support, and I expect to see banks and other high-security sites follow suit. To see if a cookie on a site is same-site, make sure the Storage Inspector is enabled in Developer tools, then go to the Storage tab in the Developer tools on the site of interest and look at the Cookies database. The same-site mode (unset, lax or strict) should be shown as the final column.

FPR8 goes live on June 25th.

Saturday, June 9, 2018

Let's kill kittens with native messaging (or, introducing OverbiteNX: if WebExtensions can't do it, we will)

WebExtensions (there is no XUL) took over with a thud seven months ago, which was felt as a great disturbance in the Force by most of us who wrote Firefox add-ons that, you know, actually did stuff. Many promises were made for APIs to allow us to do the stuff we did before. Some of these promises were kept and these APIs have actually been implemented, and credit where credit is due. But there are many that have not (that metabug is not exhaustive). More to the point, there are many for which people have offered to write code and are motivated to write code, but we have no parameters for what would be acceptable, possibly because any spec would end up stuck in a "boil the ocean" problem, possibly because it's low priority, or possibly because someone gave other someones the impression such an API would be acceptable and hasn't actually told them it isn't. The best way to get contribution is to allow people to scratch their own itches, but the urgency to overcome the (largely unintentional) institutional roadblocks has faded now that there is somewhat less outrage, and we are still left with a disordered collection of APIs that extends Firefox relatively little and a very slow road to do otherwise.

Or perhaps we don't have to actually rely on what's in Firefox to scratch our itch, at least in many cases. In a potentially strategically unwise decision, WebExtensions allows native code execution in the form of "native messaging" -- that is, you can write a native component, tell Firefox about it and who can talk to it, and then have that native component do what Firefox don't. At that point, the problem then becomes more one of packaging. If the functionality you require isn't primarily limited by the browser UI, then this might be a way around the La Brea triage tarpit.

Does this sound suspiciously familiar to anyone like some other historical browser-manipulated blobs of native code? Hang on, it's coming back to me. I remember something like this. I remember now. I remember them!


If you've been under a rock until Firefox 52, let me remind you that plugins were globs of native code that gave the browser wonderful additional capabilities such as playing different types of video, Flash and Shockwave games and DRM management, as well as other incredibly useful features such as potential exploitation, instability and sometimes outright crashes. Here in TenFourFox, for a variety of reasons I officially removed support for plugins in version 6 and completely removed the code with TenFourFox 19. Plugins served a historic purpose which are now better met by any number of appropriate browser and HTML5 APIs, and their disadvantages now in general outweigh their advantages.

Mozilla agrees with this, sort of. Starting with 52 many, though not all, plugins won't run. The remaining lucky few are Flash, Widevine and OpenH264. If you type about:plugins into Firefox, you'll still see them unless you're like me and you're running it on a POWER9 Talos II or some other hyper-free OS. There's a little bit of hypocrisy here in the name of utilitarianism, but I think it's pretty clear there would be a pretty high bar to adding a new plugin to this whitelist. Pithily, Mozilla has concluded regardless of any residual utility that plugins kill kittens.

It wasn't just plugins, either. Mozilla really doesn't like native code at all when it can be avoided; for a number of years in the Mozilla developer community I remember a lot of interest in making JavaScript so fast it could be used for all the media decoding and manipulation that native plugins then did. It's certainly gotten a lot faster since then, but now I guess the interest is having WASM do that instead (I'll wait). A certain subset of old-school extensions used to have binary components too, though largely to do funky system-level things that regular XPCOM in JavaScript and/or js-ctypes couldn't handle. This practice was strongly discouraged by Mozilla but many antiviruses implemented browser scanners in that fashion. This may have been a useful means to an end but binary components were a constant source of bugs when Mozilla altered something that the add-on authors didn't expect to change, and thus they kill kittens as well.

Nevertheless, we're back in a situation where we actually need certain APIs to be implemented to make certain types of functionality -- functionality which was perfectly acceptable in the XUL addons era, I might add -- actually still possible. These APIs are not a priority to Mozilla right now, but a certain subset of them can be provided by a (formerly discouraged) native component.

It's time to kill some kittens.

The particular kitten I need to kill is TCP sockets. I need to be able to talk over TCP to Gopher servers directly on a selection of port numbers. This was not easy to implement with XPCOM, but you could implement your own nsIChannel as a first-class citizen that looked to the browser like any other channel in pure JavaScript back in the day, and I did. That was OverbiteFF. I'm not the only one who asked for this feature, and I was willing to write it, but I'm not going to spend a lot of time writing code Mozilla won't accept and that means Mozilla needs to come up with an acceptable spec. This has gradually drowned under the ocean they're trying to boil by coming up with everyone's use cases whether they have anything in common or not and then somehow addressing the security concerns of all these disparate connection models and eventually forging the one ring to bind them all sometime around the natural heatdeath of the observable universe. I'm tired of waiting.

So here is OverbiteNX. OverbiteNX comes in two parts: the actual WebExtensions addon ("OverbiteNX" proper), and Onyx, a native component that the browser add-on asks to connect to a Gopher server and then receives data from it. Onyx is supported on Microsoft Windows, Linux and macOS (I've included binaries for Win32 and macOS 10.12+), and works probably anywhere else you can compile and run Firefox and Onyx as long as it supports Berkeley sockets. Onyx has no dependencies, is written in cross-platform C with a couple Windows-specific bits, and is currently presented in a single source file in an uncomplicated manner to not only serve the purpose of the extension but also to serve as an educational example of how to kill your own particular kitten. All of the pieces needed to hook it up (the Windows NSI and example JSON) are included and you can see how it connects with the front-end and the life cycle of a request in the source code. I discuss the architecture in more detail and how you can play with it. (If you are a current user of OverbiteWX, please remove it first.)

To be sure, I'm not the first to have this idea, and it's hardly an optimal solution. Besides the fact I have to coax someone to install a binary on their system to use the add-on, if I change the communication protocol (which is highly likely given that I will need to add future support for multithreading and possibly session IDs when multiple tabs are in use) then I need to get them to upgrade both the add-on and the native component at the same time. Users on weird platforms, and as a user of a weird platform I certainly empathize, have to do more work to get it to run on their system by compiling and installing it manually. The dat extension I linked to at the beginning of this paragraph gets around the binary code limitation by running the "native" component as JavaScript under Node.js, but that requires you to run Node, which is an extra step that seems unrealistic for many end-users and actually adds more work to people on weird platforms.

On the other hand, this is the only way right now it's going to work at all. I don't think Mozilla is going to move on these pain points unless they get into the situation where half the add-ons on AMO depend on external components they don't manage. So let the kitten killing begin. If you've got an idea and the current APIs don't let you implement it, see if a native component will scratch your itch and increase the pressure on Mountain View. After all, if Mozilla doesn't add useful WebExtensions APIs, there's no alternative but feline mass murder.

Provocative hyperbole aside, bug reports and pull requests appreciated for OverbiteNX. At some point in the near future the add-on piece will be submitted to AMO, which I expect to pass the automatic scanner since it doesn't do any currently proscribed operations, and then it's just a matter of keeping the native component in sync for future releases. Watch the Github project for more. And enjoy Gopherspace. Just don't tell the EU about it.

(Note for the humourless: The cat pictured is my cat. She is purring on her ottoman as this was written. No cat was harmed during the making of this blog post, though she does get an occasional talking-to. The picture was taken when she was penned up in the laundry while I was out and she was not threatened with any projectile weapon.)

Monday, June 4, 2018

Just call it macOS Death Valley and get it over with

What is Apple trying to say with macOS Mojave? That it's dusty, dry, substantially devoid of life in many areas, and prone to severe temperature extremes? Oh, it has dark mode. Ohhh, okay. That totally makes up for everything and all the bugs and all the recurrent lack of technical investment. It's like the anti-shiny.

In other news, besides the 32-bit apocalypse, they just deprecated OpenGL and OpenCL in order to make way for all the Metal apps that people have just been lining up to write. Not that this is any surprise, mind you, given how long Apple's implementation of OpenGL has rotted on the vine. It's a good thing they're talking about allowing iOS apps to run, because there may not be any legacy Mac apps compatible when macOS 10.15 "Zzyzx" rolls around.

Yes, looking forward to that Linux ARM laptop when the MacBook Air "Sierra Forever" wears out. I remember when I was excited about Apple's new stuff. Now it's just wondering what they're going to break next.

Sunday, June 3, 2018

Another weekend on the new computer (or, making the Talos II into the world's biggest Power Mac)

Your eyes do not deceive you -- this is QEMU running Tiger with full virtualization on the Talos II. For proof, look at my QEMU command line in the Terminal window. I've just turned my POWER9 into a G4.

Recall last entry that there was a problem using virtualization to run Power Mac operating systems on the T2 because the necessary KVM module, KVM-PR, doesn't load on bare-metal POWER9 systems (the T2 is PowerNV, so it's bare-metal). That means you'd have to run your Mac operating systems under pure emulation, which eked out something equivalent to a 1GHz G4 in System Profiler but was still a drag. With some minimal tweaks to KVM-PR, I was able to coax Tiger to start up under virtualization, increasing the apparent CPU speed to over 2GHz. Hardly a Quad G5, but that's the fastest Power Mac G4 you'll ever see with the fastest front-side bus on a G4 you'll ever see. Ever. Maximum effort.

The issue on POWER9 is actually a little more complex than I described it (thanks to Paul Mackerras at IBM OzLabs for pointing me in the right direction), so let me give you a little background first. To turn a virtual address into an actual real address, PowerPC and POWER processors prior to POWER9 exclusively used a hash table of page table entries (PTEs or HPTEs, depending on who's writing) to find the correct location in memory. The process in a simplified fashion is thus: given a virtual address, the processor translates it into a key for that block of memory using the segment lookaside buffer (SLB), and then hashes that key and part of the address to narrow it down to two page table entry groups (PTEGs), each containing eight PTEs. The processor then checks those 16 entries for a match. If it's there, it continues, or else it sends a page fault to the operating system to map the memory.

The first problem is that the format of HPTEs changed slightly in POWER9, so this needs to be accommodated if the host CPU does lookups of its own (it does in KVM-HV, but this was already converted for the new POWER9 and thus works already).

The bigger problem, though, is that hash tables can be complex to manage and in the worst case could require a lot of searches to map a page. POWER8 and earlier reduce this cost with the translation lookaside buffer (TLB), used to cache a PTE once it's found. However, the POWER9 has another option called the radix MMU. In this scheme (read the patent if you're bored), the SLB entry for that block of memory now has a radix page table pointer, or RPTP. The RPTP in turn points to a chain of hierarchical translation tables ("radix tree") that through a series of cascading lookups build the real address for that page of RAM. This is fast and flexible, and particularly well-suited to discontinuous tracts of addressing space. However, as an implementational detail, a guest operating system running in user mode (i.e., KVM-PR) on a radix host has limitations on so-called quadrant 3 (memory in the 0xc... range). This isn't a problem for a VM that can execute supervisor instructions (i.e., KVM-HV) because it can just remap as necessary, but KVM-HV can't emulate a G3 or G4 on a POWER9; only KVM-PR can do that.

Fortunately, the POWER9 still can support the HPT and turn the radix MMU off by booting the kernel with disable_radix. That gets around the second problem. As it turns out, the first problem actually isn't a problem for booting OS X on KVM once radix mode is off, assuming you hack the KVM-PR kernel module to handle a couple extra interrupt types and remove the lockout on POWER9. And here we are.(*)

Anyway, you lot will be wanting the Geekbench numbers, won't you? Such a competitive bunch, always demanding to know the score. Let's set two baselines. First, my trusty backup workstation, the 1GHz iMac G4: It's not very fast and it has no L3 cache, which makes it worse, but the arm is great, the form-factor has never been equaled, I love the screen and it fits very well on a desk. That gets a fairly weak 580 Geekbench (Geekbench 2.2 on 10.4, integer 693, floating point 581, memory 500, stream 347). For the second baseline, I'll use my trusty Quad G5, but I left it in Reduced power mode since that's how I normally run it. In Reduced, it gets a decent 1700 Geekbench (1907/2040/1002/1190).

First up, Geekbench with pure emulation (using the TCG JIT):

... aah, forget it. I wasn't going to wait all night for that. How about hacked KVM-PR?

Well, damn, son: 1733 (1849/2343/976/536). That's into the G5's range, at least with math performance, and the G5 did it with four threads while this poor thing only has one (QEMU's Power Mac emulation does not yet support SMP, even with KVM). Again, do remember that the G5 was intentionally being run gimped here: if it were going full blast, it would have blown the World's Baddest Power Mac G4 out of the water. But still, this is a decent showing for the T2 in "Mac mode" given all the other overhead that's going on, and the T2 is doing that while running Firefox with a buttload of tabs and lots of Terminal sessions and I think I was playing a movie or something in the background. I will note for the record that some of the numbers seem a bit suspect; although there may well be a performance delta between image compression and decompression, it shouldn't be this different and it would more likely be in the other direction. Likewise, the poor showing for the standard library memory work might be syscall overhead, which is plausible, but that doesn't explain why a copy is faster than a simple write. Regardless, that's heaps better than the emulated CPU which wouldn't have finished even by the time I went to dinner.

The other nice thing is that KVM-PR-Hacky-McHackface doesn't require any changes to QEMU to work, though the hack is pretty hacky. It is not sufficient to boot Mac OS 9; that causes the kernel module to err out with a failure in memory mapped I/O, which is probably because it actually does need the first problem to be fixed, and similarly I would expect Linux and NetBSD won't be happy either for the same reason (let alone nesting KVM-PR within them, which is allowed and even supported). Also, I/O performance in QEMU regardless of KVM is dismal. Even with my hacked KVM-PR, a raw disk image and rebuilding a stripped down QEMU with -O3 -mcpu=power9, disk and network throughput are quite slow and it's even worse if there are lots of graphics updates occurring simultaneously, such as installing Mac OS X with the on-screen Aqua progress bar. Minimizing such windows helps, but only when you're able to do so, of course. More ominously I'll get occasional soft lockouts in the kernel (though everything keeps running), usually if it's doing heavy disk access, and it acts very strangely with stuff that messes with the hardware such as system updates. For that reason I let Software Update run in emulated mode so that if a bug occurred during the installation, it wouldn't completely hose everything and make the disk image unbootable (which did, in fact, happen the first time I tried to upgrade to 10.4.11). Another unrelated annoyance is that QEMU's emulated video card doesn't offer 16:9 resolutions, which is inconvenient on this 1920x1080 display. I could probably hack that in later.

QEMU also has its own bugs, of course; support for running OS 9/OS X is very much still a work in progress. For example, you'll notice there are no screenshots of the T2 running TenFourFox. That's because it can't. I installed the G3 version and tried running it in QEMU+KVM-PR, and TenFourFox crashed with an illegal instruction fault. So I tried starting it in safe mode on the assumption the JIT was making it unsteady, which seemed to work when it gave me the safe mode window, but then when I tried to start the full browser still crashed with an illegal instruction fault (in a different place). At that point I assumed it was a bug in KVM-PR and tried starting it in pure emulation. This time, TenFourFox crashed the entire emulator (which exited with an illegal instruction fault). I think we can safely conclude that this is a bug in QEMU. I haven't even tried running Classic on it yet; I'm almost afraid to.

Still, this means my T2 is a lot further along at being able to run my Power Mac software. It also means I need to go through and reprogram all my AutoKey remappings to not remap the Command-key combinations when I'm actually in QEMU. That's a pain, but worth it. If enough people are interested in playing with this, I'll go post the diff in a gist on new Microsoft Visual GitHub, but remember it will rock your socks, taint your kernel, (possibly) crash your computer and (definitely) slap yo mama. You'll also need to apply it as a patch to the source code for your current kernel, whatever it is, as I will not post binaries to make you do it your own bad and irresponsible self, but you won't have to wait long as the T2 will build the Linux kernel from scratch and all its relevant modules in about 20 minutes at -j24. Now we're playing with POWER!

What else did I learn this weekend?

  • If you want disable_radix to stick on bootup, just put it in a grub config and Petitboot will pick it up. This is particularly helpful because I still can't figure out what I should be putting in BOOTKERNFW to enable the Radeon card, so Petitboot still comes up with a blank screen unless I pull the VGA disable jumper.

  • amdgpu is still glitchy sometimes. Trying to view very large images in Eye of GNOME actually caused graphics corruption and garbled the mouse pointer. The session quickly became unusable and I had to bail out and restart X11. Viewnior substituted nicely and doesn't have this problem. (I also found it with Hugin when I was trying to find something to view the equirectangular panoramas taken with my Ricoh Theta cameras. Eventually I hacked FreePV into building and that substitutes nicely as well. I gave it an entry in /usr/share/applications so that I could directly view images from the GNOME file manager.)

  • Symlinking xdg-open to open means I can still open most things from the command line with the same OS X command.

  • I had a dickens of a time getting my Android Pixel XL to talk to the T2. GNOME kept throwing MTP errors when it tried to mount it (yes, the phone was in MTP mode). gphoto2 could list the directories in PTP mode, but couldn't actually transfer anything. Even adb shell would disconnect after just a few commands, and adb pull wouldn't even get past enumerating the file list. Eventually I found some apocryphal note that someone fixed their phone by connecting it over USB 2.0 instead of USB 3.0, so I found a USB 2.0 hub, plugged that in, and plugged the Pixel XL into that. Now it works. (The same cable works fine at USB 3.0 speeds on my MacBook Air with the Pixel, so I am assuming this is an issue with the chipset in the T2. But I also think that's Google's bug, not Raptor's or Fedora's.)

  • The freely distributable Linux fonts are improving but still suck, so I transferred my entire font folder over AFP from the G5. The OTF and TTF fonts worked immediately, and Fondu easily converted the DFONT fonts, but I also have a lot of old Mac fonts and even some font suitcases that are pure resource forks. GNOME doesn't know how to transfer those. I could have tried copying them to a FAT volume and extracting the resource forks from the hidden directory, but that seemed icky. After some thought, I went to the Terminal on the G5 and did this instead:

    ls | fgrep -vi '.ttf' | fgrep -vi '.otf' | fgrep -vi '.dfont' | fgrep -vi '.bmap' | perl -ne 'chomp;print"\"$_\"\n"' | xargs -n 1 -I '{}' macbinconv -mac '{}' -mb '/home/spectre/rfont/{}.bin'

    The little snippet of Perl there preserves embedded spaces in the filenames. When I ran it, it turned all the font resources into MacBinary, I transferred the resulting files to the T2, and Fondu converted them as well. Now I have my fonts.

  • Speaking of fonts, I wanted to play with font hinting on an individual font basis and installed Fonts Tweaks for GNOME to do this. I started with my converted Lucida Grande that I use for most of the display theme, and it eliminated the anti-aliasing and made much of the display painfully jaggy. I couldn't for the life of me figure out how to undo it, even after uninstalling Fonts Tweaks, until I found .config/fontconfig/conf.d/ and removed the offending entry.

  • I was able to get my RadioSHARK to play audio, but it required a little setup first. I first had to compile the shark tool for Linux, which fortunately I had preserved as part of radioSH. That worked with little adjustment, but the new tool kept throwing permissions errors and I didn't really want to run it as root. The problem is that it uses libhid to talk to the HID portion of the RadioSHARK and libhid expects to be able to enumerate any USB device it encounters to see if it's really a HID. Eventually I hit on this udev recipe and added myself to a new usb group:

    SUBSYSTEM=="usb", GROUP="usb", MODE="0660"

    Now shark can control it, and I can listen in VLC (using a URL like pulse://alsa_input.usb-Griffin_Technology__Inc._RadioSHARK-00.analog-stereo).

  • Deadpool 2 is pretty funny. Not as good as the first, because there was no way it was going to out-outrageous it, and the gags were a little forced sometimes, but Brolin made a solid Cable and Domino was outstanding. And hey, a Kiwi kid who can act and isn't stereotyped! I even recognized them driving along the Crowsnest Highway (BC 3) near Manning Park before the merge with Trans-Canada 1 in Hope, a beautiful road I have driven many times myself. Plus, the mid-credits scenes made up for all of the movie's holes (he said, carefully avoiding the joke), and be sure to stay put for a musical laugh-out-loud moment at the very end!
Back on the G5 tomorrow for more work on date-time pickers for TenFourFox FPR8. I've abandoned CSS grid again for the time being because my current implementation is actually worse than not trying to render it at all. That's discouraging, but at least it gracefully degrades.

(*) Given that no changes were made to the HPTE format to get KVM-PR to work for OS X, it may not even be necessary to run the kernel with radix mode off. I'll try that this coming weekend at some point.

Monday, May 28, 2018

A weekend on the new computer (or, introducing "TenNineFox")

This Memorial Day weekend I pulled the Sonnet FireWire card from my Quad G5 and put it to sleep. I mean, I put it in sleep mode, and sat down with the Talos II to see if I could get Firefox building and running, and then QEMU (and to see if the G5 would stay asleep for more than a few hours, since I don't need a hot and noisy 230+ watt computer running next to my less hot but noisier 180W one). One glitch with this was switching the KVM away from the G5 caused it to wake up again, so I wrote a little Perl script to fork and log me out, and in the child process run a trivial AppleScript to tell application "Finder" to sleep. Then I could just run that from a remote login session from the T2, and the G5 would peacefully rest at about 20 watts or so.

First was to grab all the updates. This fixed amdgpu for X11 and now I'm running a fully accelerated GNOME desktop on the AMD Radeon Pro WX 7100. I got Sabrent Bluetooth and USB audio dongles, which "just worked" with Linux, and even got VLC to play some Blu-ray movies (as well as VLC can play them, given that BD+ is still not a solved problem). The T2 firmware update to 1.04 also diminished some of the fan hunting I was hearing and while it's still louder than the G5, it's definitely getting better and better. I'm thinking of getting one of the Supermicro "superquiet" PSUs next, since I notice its higher-pitch fan sound more than the case fans. The only hardware glitch still left over is that I can't figure out why Linux won't recognize the Sonnet FireWire/USB PCIe combo card. It should work, the chipsets should be supported. More on that later.

Next was to get working on builds. After most of Saturday spent hacking on it, I'm pleased to note that Firefox 60 will compile on the T2 with a minimal .mozconfig if you apply this patch, this patch and this patch, and chmod -x /usr/bin/ because the Firefox build system insists, nay, demands to use the (useless on PowerPC) gold linker; I don't even know why Fedora bothers installing it. You also need to turn jemalloc off because it barfs on the default PPC64 page size of 64K. The official Fedora 28 build of Firefox 60, which actually does work, apparently cheats a little by disabling tests and WebRTC, part of what those patches address, though I'm uncertain how they got around the jemalloc or WASM signal handlers issue. It runs fully multiprocess and I'm looking at enabling WebGL next. Even though JavaScript in Firefox 60 on the T2 is about twice as slow as the G5 in TenFourFox FPR7 (remember, no JIT), everything else is tremendously faster due to the 32 threads (8 cores, SMT-4 each), the monstrous cache and the 3+ GHz clock speed, so you really only notice it's not quite as fast as it ought to be on pages with a lot of scripts. So imagine what it will be like when I get the POWER9 JIT, I mean, nothing! I said nothing! Pay no attention to the man behind the curtain! If you build Firefox with -O3 -mcpu=power9, you get about a 3-5% speed boost over the -O2 mcpu=power8 Fedora build, which is worth it because it only takes the Talos 20 minutes to build Firefox at -j24 (compared to 3.5 hours with the Quad G5 in Highest performance mode roaring away at -j3). For posterity, here is the .mozconfig I'm currently using, which I intend to refine further:

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9"
ac_add_options --disable-jemalloc

You may call this the first build of "TenNineFox" if you like. :) Some mochitests fail which I'm investigating, but the test suite can run. By the way, Firefox Containers is awesome. I like to segregate my higher-security items like billpay and banking from the browser, which I use a TenFourFoxBox for on the G5, but with a Container it's integrated into the same browser instance and still keeps the cookies and data separate. Cool stuff.

On to QEMU. QEMU will build relatively uneventfully from source, or you can pre-install the Fedora package if you're lazy. Using the generic Power Mac profile mac99 both MacOS 9.1 and 10.4 start up largely happily under qemu-system-ppc, though there is an odd glitch with 9.1 where I have to quadruple-click on anything to get it detected as a double-click. However, while it was certainly useable, it didn't feel very fast. The System Profiler within the emulated Tiger instance said it was a "1GHz G4" with a "400MHz FSB." This seemed low, and the reason it is was ... drumroll please ... it was running with CPU emulation.

After some checking, I confirmed KVM was indeed installed on this system, so I tried running a 64-bit guest with qemu-system-ppc64 emulating an IBM pSeries machine with KVM-HV. That started up and ran at a nice clip, noticeably faster when I turned on KVM, so I tried to run the 32-bit guest with KVM-PR (which ought to emulate the proper CPU) and got an error message. Even the 64-bit guest that ran just dandy with KVM-HV wouldn't run with KVM-PR. Some digging determined that the KVM-PR kernel module existed, but did not load. Some more digging turned up that KVM-PR wouldn't load with modprobe. Even more digging turned up that ... KVM-PR doesn't run on bare-metal POWER9 yet, and unfortunately all PowerNV machines like the T2 are bare-metal.

This is a bummer, but it sounds like an eventually solveable problem. In the meantime, QEMU's performance as a Power Mac emulator is currently acceptable on the T2, just unspectacular. I'll be setting up an install of OS 9 to start with and getting some of my old software loaded into a workspace, and possibly hacking QEMU to autorelease the mouse and switch workspaces with a key combination so I can just jump back and forth easily. When the issues with KVM-PR are ironed out, then everything should "just work," just faster.

For yuks, I tried installing a couple earlier emulation efforts. SheepShaver is the one most people know, and it will compile (if you update config.{guess,sub} and tell configure to use the PowerPC emulator; it will not run natively), but it will not start. Even with sudo sysctl vm.mmap_min_addr=0 and sudo setsebool -P mmap_low_allowed 1 to get the kernel and SELinux to allow its unusual memory mapping requirements, it threw an error message saying it could not allocate enough memory and unceremoniously aborted. On a 32GB system trying to emulate a 256MB Mac a low-memory state seemed unlikely, so I'm guessing this may be a 64-bit bug. I then tried the other well-known Power Mac emulator, PearPC. This also required a new autoconf and a number of hacks to get it to build with current releases of gcc, but it does work, and it does start, and it's even worse, about 20% the speed of QEMU. The reason for this is that QEMU actually has a trivial JIT (TCG), while PearPC is a strict interpreter on systems that don't have JIT support, so while you could do stuff it felt like a 601 was running it instead.

The other parts of the weekend was figuring out what I needed to port over, and how to make the Talos happy on my highly Mac-centric network. Installing gvfs-afp and hfsplus-tools was easy and got the T2 talking to the G4 file server running 10.4.11. I don't like the Linux font set much, so I'll be copying my font folder from the G5 over and converting things with Fondu as necessary. VLC will play CDs, but I will probably try to port my command-line player since it's easier for me to manipulate. I also need to move my Quake PAKs and Doom WADs over, because everyone needs a coffee break now and then, and finally get my Pixel XL to backup its photos to it. I also added even more Mac key combinations to AutoKey to maintain my Mac command-key muscle memory.

Anyway, after I've submitted this post I'll power down the Talos tonight and wake the G5 back up again tomorrow to continue work on TenFourFox FPR8, having slept peacefully and properly over the entire holiday weekend. Now that same-site cookies are working, it's time to get some sort of basic CSS grid support operational (or at least whitelisted for those sites that need it), and I still want to finish idle callback support and date-time picker support. After all, even though the T2 is getting closer and closer to being suitable as my main computer, there's still a lot I'll need to keep the G5 around for, so I'm certainly not planning to get rid of it. Or, you know, "put it to sleep" in the veterinary sense. Just because it's old doesn't mean it's useless.

Monday, May 21, 2018

Spectre Number 4, STEP RIGHT UP!

Updated based on IBM's documentation.

In the continuing saga of Meltdown and Spectre (tl;dr: G4/7400, G3 and likely earlier 60x PowerPCs don't seem vulnerable at all; G4/7450 and G5 are so far affected by Spectre while Meltdown has not been confirmed, but IBM documentation implies "big" POWER4 and up are vulnerable to both) is now Spectre variant 4. In this variant, the fundamental issue of getting the CPU to speculatively execute code it mistakenly predicts will be executed and observing the effects on cache timing is still present, but here the trick has to do with executing a downstream memory load operation speculatively before other store operations that the CPU (wrongly) believes the load does not depend on. The processor will faithfully revert the stores and the register load when the mispredict is discovered, but the loaded address will remain in the L1 cache and be observable through means similar to those in other Spectre-type attacks.

The G5, POWER4 and up are so aggressively out of order with memory accesses that they are almost certainly vulnerable. In an earlier version of this post, I didn't think the G3 and 7400 were vulnerable (as they don't appear to be to other Spectre variants), but after some poring over IBM's technical documentation I now believe with some careful coding it could be possible -- just not very probable. The details have to do with the G3 (and 7400)'s Load-Store Unit, or LSU, which is responsible for reading and writing memory. Unless a synchronizing instruction intervenes, up to one load instruction can execute ahead of a store, which makes the attack theoretically possible. However, the G3 and 7400 cannot reorder multiple stores in this fashion, and because only a maximum of two instructions may be dispatched to the LSU at any time (in practice less since those two instructions are spread across all of the processor's execution units), the victim load and the confounding store must be located immediately together or have no LSU-issued instructions between them. Even then, reliably ensuring that both instructions get dispatched in such a way that the CPU will reorder them in the (attacker-)desired order wouldn't be trivial.

The 7450, as with other Spectre variants, makes the attack a bit easier. It can dispatch up to four instructions to its execution units, which makes the attack more feasible because there is more theoretical flexibility on where the victim load can be located downstream (especially if all four instructions go to its LSU). However, it too can execute at most just one load instruction ahead of a store, and it cannot reorder stores either.

That said, as a practical matter, Spectre in any variant (including this one) is only a viable attack vector on Power Macs through native applications, which have far more effective methods of pwning your Power Mac at their disposal than an intermittently successful attempt to read memory. Although TenFourFox has a JavaScript JIT, no 7450 and probably not even the Quad is fast enough to obtain enough of a memory timing delta to make the attack functional (let alone reliable), and we disabled the high-resolution timers necessary for the exploit "way back" in FPR5 anyway. The new variant 4 is a bigger issue for Talos II owners like myself because such an attack is possible and feasible on the POWER9, but we can confidently expect that there will be patches from IBM and Raptor to address it soon.

Wednesday, May 16, 2018

A little Talos of your very own

I haven't had as much time to work on getting QEMU and Firefox functional/useable on the Talos II over the last few days because of work complications (I'll be reporting on that in a few weeks), but Raptor has heard those of you who are still suffering sticker PTSD from the Talos II and announced the Talos II Lite.

Yes, think of it as the Mac mini G4 to the Talos II's Quad G5. This comparison is not completely inappropriate because the T2L has only one CPU socket (the T2 has two) and thus only 24 PCIe lanes, split amongst an x16 and an x8 (the T2 fully loaded has two x8s and three x16s), and "only" 8 DDR4 slots (the T2 has 16). You can still cram one of the 22-core demons into one of those, though. Starting price is "just" $1399.99, though as with the Talos II the CPU is extra ($375 for 4-core to $2575 for 22-core), the RAM is extra ($255 for 16GB to $2950 for 128GB), and the storage is extra (Microsemi SAS starts at $300 plus drives, or a Samsung 960 EVO NVMe 500GB for $350, or a four-port SATA controller for $50 plus drives). You can also add the same Radeon WX 7100 workstation card that's in the big T2 ($800), too, or just use the same onboard VGA controller that comes with the T2 (built-in). It has USB 3.0 and dual Gig Ethernet, just like the big fella, though it doesn't seem to come with a BD-ROM.

However, the mini:Quad analogy falls down when you look at the actual size of the Lite. It, too, is an EATX behemoth, despite the leaner spec. Personally I would have hoped for something a little more manageably dimensioned. Raptor is taking about offering a smaller board but that would require a redesign and this was probably an artifact of getting it launched cheap(er)ly.

So would I have saved money with my T2 going Lite? Let's price it out: $1400 for the system (includes 500W PSU and EATX case), $595 for the octocore POWER9 (my T2 has two 4-core chips), $535 for 32GB ECC DDR4 RAM, $350 for the SAS card, $800 for the AMD Radeon WX 7100, $50 for the 4-port SATA card (this came installed "free" in my T2) and $350 for the 500GB Samsung NVMe SSD. Sticker price for that configuration is $4080 plus applicable tax and shipping; I repriced the same configuration for the Talos II and got a sticker cost of $7360, about $250 more than what I paid personally (the benefit of being an early adopter), so let's say a cost difference of $3300. That's substantial and a whole lot more palatable. $4080 is actually within Quad G5 range -- I paid not much less than that for my Quad G5 back in the day with the 7800GT and 8GB of RAM. A cheap SATA DVD-RW or something wouldn't add much more to the price if you want an optical drive.

There's a small problem here though: the Lite can't actually accommodate that loadout because there's not enough PCIe slots to get it all in there. In fact, I've got another 1GB NVMe drive to install in my T2, and I'm probably going to pull the now unused Sonnet FireWire/USB PCIe card (I prefer FireWire hubs) from the G5 to install in it too, which may mean temporarily pulling the SAS card until I'm ready to populate the front bays. Also, the Talos II out of the box doesn't support PCIe bifurcation, so I really do need both those slots for my SSDs. Per Raptor it can: with changes to the machine XML definition it could be made to "trifurcate" the x16 endpoint on slot 3 (CPU 2, PHB2) into an x8 and two x4, but that would mean that the available 4-way M.2 NVMe multicards would only have at most three slots available, and the system doesn't ship that way anyhow. Besides, even if you did get bifurcation working on the Lite, you'd only have the remaining x8 for anything else which couldn't be used for an x16 workstation video card. UPDATE: Per Raptor, the Lite's x16 can't be bifurcated due to a hardware limitation, so that is only an option for the big system.

But let's say you're not a maniac like me and you want a basic "budget" config. Let's drop the workstation card and the SAS card, and drop to a 4-core with 16GB, and we have a $2430 system. Wow! Not bad! You've still got the NVMe card and storage expansion over SATA, and you've still got USB ports for audio and the onboard VGA. But you've used up all your PCIe slots, so let's hope you don't need anything else to go internal (let alone 3D acceleration). If you really want that x16 slot back, drop the NVMe card and add some SATA drives ($2080 + devices), but now you're starting to strip this system down more than you might like to, and it doesn't get much cheaper that way.

Overall, that $3300 really does translate into greatly improved expandability in addition to the beefier power supplies, and thus it was never really an option for my needs personally. Maybe my mini:Quad analogy wasn't so off base. But if you want to join the POWER9 revolution on a budget and give Chipzilla the finger, as all right-thinking nerds should, you've now got an option that only requires passing a kidneystone of just half the size or less. It ships starting in July.

Another interesting thing Raptor pointed out: in the Phoronix performance tests, the Talos was running with full Spectre and Meltdown protections, but the x86 wasn't! Boooo! And if you really want to turn Spectre protections off on the Talos for even more grunty, you can do that. Meanwhile, as we speak, Intel is making people take down their firmware documentation and trying to stymie efforts to reverse engineer them. What system would you rather support?