Sunday, December 29, 2019

And now for something completely different: The dawning of the Age of Apple Aquarius

Updated with a few notes from Al Kossow, who was at Apple during this period

An interesting document has turned up at the Internet Archive: the specification to the Scorpius CPU, the originally intended RISC successor to the 68K Macintosh.

In 1986 the 68K processor line was still going strong but showing its age, and a contingent of Apple management (famously led by then-Mac division head Jean-Louis Gassée and engineer Sam Holland) successfully persuaded then-CEO John Sculley that Apple should be master of its own fate with its own CPU. RISC was just emerging at that time, with the original MIPS R2000 CPU appearing around 1985, and was clearly where the market was going (arguably it still is, since virtually all major desktop and mobile processors are load-store at the hardware level today, even Intel); thus was the Aquarius project born. Indeed, Sculley's faith in the initiative was so great that he allocated a staff of fifty and even authorized a $15 million Cray supercomputer, which was smoothed over with investors by claiming it was for modeling Apple hardware (which, in a roundabout and overly optimistic way, it was) and to see, in Al Kossow's words, "what could be done if you had a Macintosh with the power of a Cray."

Holland was placed in charge of the project and set about designing the CPU for Aquarius. The processor's proposed feature set was highly ambitious, including four cores and SIMD (vector) support with inter-processor communication features. Holland's specification was called Scorpius; the initial implementation of the Scorpius design was to be christened Antares. This initial specification is what was posted at the Internet Archive, dated around 1988.

Despite Sculley and Gassée's support, Aquarius was controversial at Apple from the very beginning: it required a substantial RandD investment, cash which Apple could ill afford to fritter away at the time, and even if the cash were there many within the company did not believe Apple had sufficient technical chops to get the CPU to silicon. Holland's complex specification worried senior management further as it required solving various technical problems that even large, highly experienced chip design companies at the time would have found difficult.

With only a proposal and no actual hardware by 1988, Sculley became impatient, and Holland was replaced by Al Alcorn. Alcorn was a legend in the industry by this time, best known for his work at Atari, where he designed Pong and was involved in the development of the Atari 400 and the ill-fated "holographic" Atari Cosmos. After leaving Atari in 1981, he consulted for various companies and was hired as an Apple Fellow in 1985. Alcorn was brought in to provide a new perspective on Aquarius and pitched the feasibility question to microprocessor expert Hugh Martin, who studied the specs and promptly pronounced them "ridiculous" to both Alcorn and Sculley. On this advice Sculley scuttled Aquarius in 1989 and hired Martin to design a computer instead using an existing CPU. Martin's assignment became the similarly ill-fated Jaguar project, which completed poorly with another simultaneous project led by veteran engineer Jack McHenry called Cognac. Cognac, unlike Jaguar and Aquarius, actually produced working hardware. The "RISC LC" that the Cognac team built, originally a heavily modified Macintosh LC with a Motorola 88100 CPU running Mac OS, became the direct ancestor of the Power Macintosh. The Cray supercomputer, now idle at Valley Green 3, only ever used its then-innovative framebuffer to generate the credits for the famous Pencil Test animation. It ultimately went to the industrial design group for case modeling until it was dismantled.

Now that we have an actual specification to read, how might this have compared to the PowerPC 601? Scorpius defined a big-endian 32-bit RISC chip addressing up to 4GB of RAM with four cores, which the technical specification refers to as processing units, or PUs. Each core shares instruction and data caches with the others and communicates over a 5x4 crossbar network, and because all cores on a CPU must execute within the same address space, are probably best considered most similar to modern hardware threads (such as the 32 threads on the SMT-4 eight core POWER9 I'm typing this on). An individual core has 16 32-bit general purpose registers (GPRs) and seven special purpose registers (SPRs), plus eight global SPRs common to the entire CPU, though there is no floating-point unit in the specification we see here. Like ARM, and unlike PowerPC and modern Power ISA, the link register (which saves return addresses) is a regular GPR and code can jump directly to an address in any register. However, despite having a 32-bit addressing space and 32-bit registers, Scorpius uses a fixed-size 16-bit instruction word. Typical of early RISC designs and still maintained in modern MIPS CPUs, it also has a branch delay slot, where the instruction following a branch (even if the branch is taken) is always executed. Besides the standard cache control instructions, there are also special instructions for a core to broadcast to other cores, and the four PUs could be directed to work on data in tandem to yield SIMD vector-like operations (such as what you would see with AltiVec and SSE). Holland's design even envisioned an "inter-processor bus" (IPB) connecting up to 16 CPUs, each with their own local memory, something not unlike what we would call a non-uniform memory access (NUMA) design today.

The 16-bit instruction size greatly limits the breadth of available instructions compared to PowerPC's 32-bit instructions, but that would certainly be within the "letter" spirit of RISC. It also makes the code possibly more dense than PowerPC, though the limited amount of bits available for displacements and immediate values requires the use of a second prefix register and potentially multiple instructions which dampens this advantage somewhat. The use of multiple PUs in tandem for SIMD-like operations is analogous to AltiVec and rather more flexible, though the use of bespoke hardware support in later SIMD designs like the G4 is probably higher performance. The lack of a floating-point unit was probably not a major issue in 1986 but wasn't very forward-looking as every 601 shipped with an FPU standard from the factory; on the other hand, the NUMA IPB was very adventurous and certainly more advanced than multiprocessor PowerPC designs, something that wasn't even really possible until the 604 (or not without a lot of hacks, as in the case of the 603-based BeBox).

It's ultimately an academic exercise, of course, because this specification was effectively just a wish list whereas the 601 actually existed, though not for several more years. Plus, the first Power Macs, being descendants of the compatibility-oriented RISC LC, could still run 68K Mac software; while the specification doesn't say, Aquarius' radical differences from its ancestor suggests a completely isolated architecture intended for a totally new computer. Were Antares-based systems to actually emerge, it is quite possible that they would have eclipsed the Mac as a new and different machine, and in that alternate future I'd probably be writing a droll and informative article about the lost RISC Mac prototype instead.

Monday, December 23, 2019

TenFourFox FPR18b1 available

TenFourFox Feature Parity Release 18 beta 1 is now available (downloads, hashes, release notes). As promised, the biggest change in this release is to TenFourFox's Reader mode. Reader mode uses Mozilla Readability to display a stripped-down version of the page with (hopefully) the salient content, just the salient content, and no crap or cruft. This has obvious advantages for old systems like ours because Reader mode pages are smaller and substantially simpler, don't run JavaScript, and help to wallpaper over various DOM and layout deficiencies our older patched-up Firefox 45 underpinnings are starting to show a bit more.

In FPR18, Reader mode has two main changes: first, it is updated to the same release used in current versions of Firefox (I rewrote the glue module in TenFourFox so that current releases could be used unmodified, which helps maintainability), and second, Reader mode is now allowed on most web pages instead of only on ones Readability thinks it can render. By avoiding a page scan this makes the browser a teensy bit faster, but it also means that edge-case web pages that could still usefully display in Reader mode now can do so. When Reader mode can be enabled, a little "open book" icon appears in the address bar. Click that and it will turn orange and the page will switch to Reader mode. Click it again to return to the prior version of the page. Certain sites don't work well with this approach and are automatically filtered; we use the same list as Firefox. If you want the old method where the browser would scan the page first before offering reader mode, switch tenfourfox.reader.force-enable to false and reload the tab, and please mention what it was doing inappropriately so it can be investigated.

Reader mode isn't seamless, and in fairness wasn't designed to be. The most noticeable discontinuity is if you click a link within a Reader mode page, it renders that link in the regular browser (requiring you to re-enter Reader mode if you want to stay there), which kind of sucks for multipage documents. I'm considering a tweak to it such that you stay in Reader mode in a tab until you exit it but I don't know how well this would work and it would certainly alter the functionality of many pages. Post your thoughts in the comments. I might consider something like this for FPR19.

Besides the usual security updates, FPR18 also makes a minor compatibility fix to the browser and improves the comprehensiveness of removing browser data for privacy reasons. More work needs to be done on this because of currently missing APIs, but this first pass in FPR18 is a safe and easy improvement. As this is the first "fast four week" release, it will become live January 6.

I also wrote up another quickie script for those of you exploring TenFourFox's AppleScript support. Although Old Reddit appears to work just dandy with TenFourFox, the current React-based New Reddit is a basket case: it's slow, it uses newer JavaScript support that TenFourFox only allows incompletely, and its DOM is hard for extensions to navigate. If you're stuck on New Reddit and you can't read the comments because the "VIEW ENTIRE CONVERSATION" button doesn't work because React and if you work on React I hate you, you can now download Reddit Moar Comments. When the script is run, if a Reddit comments thread is in the current tab, it acts as if the View Entire Conversation button had been clicked and expands the thread. If you're like me, put the Scripts menu in the menu bar (using the AppleScript Utility), have a TenFourFox folder in your Scripts, and put this script in it so it's just a menu drop-down or two away. Don't forget to try the other possibly useful scripts in that folder, or see if you can write your own.

Merry Christmas to those of you who celebrate it, and a happy holiday to all.

Saturday, December 21, 2019

RIP, Chuck Peddle

I never got the pleasure to have met him in person, but virtually any desktop computer owes a debt to him. Not only the computers using the the 6502 microprocessor he designed, but because the 6502 was so inexpensive (especially compared against the Intel and Motorola chips it competed with) that it made the possibility of a computer in everybody's home actually feasible. Here just in the very room I'm typing this, there is a Commodore 128D, several Commodore SX-64s (with the 8502 and 6510 respectively, variants of the 6502 with on-chip I/O ports), a Commodore KIM-1, a blue-label PET 2001, an Apple IIgs (technically with a 65816, the later WDC 16-bit variant), an Atari 2600 (6507, with a reduced address bus), an Atari Lynx (with the CMOS WDC WD65SC02), and an NEC TurboExpress (Hudson HuC6280, another modified WDC 65C02, with a primitive MMU). The 6502 appeared in fact in the Nintendo Famicom/NES (Ricoh 2A03 variant) and Super Nintendo (65816) and the vast majority of Commodore home computers before the Amiga, plus the Atari 8-bit and Apple II lines. For that matter, the Commodore 1541s and 1571s separate and built-into the 128D and SX-64s have 6502 CPUs too. Most impactful was probably its appearance in the BBC Micro series which was one of the influences on the now-ubiquitous ARM architecture.

I will not recapitulate his life or biography except to say that when I saw him a number of years ago in a Skype appearance at Vintage Computer Festival East (in a big cowboy hat) he was a humble, knowledgeable and brilliant man. Computing has lost one of its most enduring pioneers, and I think it can be said without exaggeration that the personal computing era probably would not have happened without him.

Wednesday, December 18, 2019

And now for something completely different: Way more FPS than Counterstrike on PowerPC DOSBox

If you're unfamiliar with it, DOSBox is an x86 emulator specializing in running PC DOS games (and, presumably, any PC DOS application). For many DOS-based titles, in fact, it may be the only way to run them on modern PCs, let alone Macs. Besides emulating old hardware like video cards and SoundBlasters sufficiently for games to run, one of its key features on supported platforms is dynamically recompiling x86 machine language for enhanced performance. Unfortunately, DOSBox on Power Macs, while it is still supported as of this writing (10.4 and up), runs x86 code strictly in an interpreter and as a result can be quite slow on low-spec systems. For certain games requiring 80486-level performance, only the G5 can realistically emulate those at any reasonable level, and even then uses a lot of CPU doing so or requires skipping some portion of frames to make the game at all playable.

Over on Vogons earlier this year one of the DOSBox contributors wrote up a PowerPC JIT for the dynrec core under Linux which was then ported to OS X. Like the TenFourFox JIT compiles JavaScript to PowerPC machine code, this patch compiles x86 machine code to PowerPC machine code and runs that instead of requiring labourious interpretation. There was briefly a build available on Dropbox with this JIT, which is currently not part of the DOSBox source tree, but that build can no longer be downloaded. So, I went ahead and reconstructed it against the current trunk (at the time I pulled it, r4301) along with a minor tweak and another of this developer's big-endian fixes, and have tidied it up for download.

To give you an idea of the improvement, I have a few DOS games I enjoy that were never ported to Mac (along with LucasArts' Dark Forces, shown here, which has an outstanding Mac port with up to double the resolution of the DOS version but only runs in Classic). On this Quad G5 set to Reduced performance, with interpreted mainline DOSBox I usually had to have it skip every other frame. This made games like Commander Keen in Goodbye, Galaxy!, Pinball Illusions and Extreme Pinball playable, but only just so, and also made the animation sadly a bit jerky. Even with this handicap some games were still unplayable unless I turned the machine to Maximum performance (and cooked the room), and even then Dark Forces and Death Rally usually ranged from stuttery to slideshow. With the JIT enabled, not only could I get rid of the frame skip entirely -- even in Reduced performance -- but all of the games became absolutely playable. In Maximum performance they run better than my real 486 DOS games box.

Explaining how DOSBox works is beyond the scope of this blog post, but fortunately there is excellent documentation. Once you've played with the official build, you can get a JIT-enabled build (there's an "all Power Macs" and a G5-specific version with different compiler settings) from the repository I set up at SourceForge. This uses a separate preferences file which is included with some default settings. Out of the box frameskipping is disabled, which is appropriate for pretty much any G5 system from 2GHz on up with most titles. If you're using a G4 or slower G5 system, edit this file and change frameskip=0 to frameskip=1 or frameskip=2 (my 1.25GHz iMac G4 needed frameskip=2 as a minimum, but an MDD or 7447 might be okay with a frameskip of 1). You should use OpenGL as your output if at all possible for maximal hardware assistance (which is the default in this file); rendering to a surface in software is rather slower. Copy it to ~/Library/Preferences and start playing your classic DOS games with Power.

No warranty for this build is expressed or implied and this may be the only build I ever offer. There are known bugs in this build which Mac builds from the SVN trunk have too and thus I have not attempted to fix them. The patch is included so you can roll your own; these preconfigured builds are merely offered as a convenience and to make more fun games practical on your already fun Power Mac. Eventually I'll rewrite this for ppc64le so my Raptor Talos II can play with Power too.

Sunday, December 1, 2019

TenFourFox FPR17 available

TenFourFox Feature Parity Release 17 final is now available for testing (downloads, hashes, release notes). Apologies for the delay, but I was visiting family and didn't return until a few hours ago so I could validate and perform the confidence testing on the builds. There are no other changes in this release other than a minor tweak to the ATSUI font blacklist and outstanding security patches. Assuming all is well, it will go live tomorrow evening Pacific time.

The FPR18 cycle is the first of the 4-week Mozilla development cycles. It isn't feasible for me to run multiple branches, so we'll see how much time this actually gives me for new work. As previously mentioned, FPR18 will be primarily about parity updates to Reader mode, which helps to shore up the browser's layout deficiencies and is faster to render as well. There will also be some other minor miscellaneous fixes.