Sunday, December 29, 2019

And now for something completely different: The dawning of the Age of Apple Aquarius

Updated with a few notes from Al Kossow, who was at Apple during this period

An interesting document has turned up at the Internet Archive: the specification to the Scorpius CPU, the originally intended RISC successor to the 68K Macintosh.

In 1986 the 68K processor line was still going strong but showing its age, and a contingent of Apple management (famously led by then-Mac division head Jean-Louis Gassée and engineer Sam Holland) successfully persuaded then-CEO John Sculley that Apple should be master of its own fate with its own CPU. RISC was just emerging at that time, with the original MIPS R2000 CPU appearing around 1985, and was clearly where the market was going (arguably it still is, since virtually all major desktop and mobile processors are load-store at the hardware level today, even Intel); thus was the Aquarius project born. Indeed, Sculley's faith in the initiative was so great that he allocated a staff of fifty and even authorized a $15 million Cray supercomputer, which was smoothed over with investors by claiming it was for modeling Apple hardware (which, in a roundabout and overly optimistic way, it was) and to see, in Al Kossow's words, "what could be done if you had a Macintosh with the power of a Cray."

Holland was placed in charge of the project and set about designing the CPU for Aquarius. The processor's proposed feature set was highly ambitious, including four cores and SIMD (vector) support with inter-processor communication features. Holland's specification was called Scorpius; the initial implementation of the Scorpius design was to be christened Antares. This initial specification is what was posted at the Internet Archive, dated around 1988.

Despite Sculley and Gassée's support, Aquarius was controversial at Apple from the very beginning: it required a substantial RandD investment, cash which Apple could ill afford to fritter away at the time, and even if the cash were there many within the company did not believe Apple had sufficient technical chops to get the CPU to silicon. Holland's complex specification worried senior management further as it required solving various technical problems that even large, highly experienced chip design companies at the time would have found difficult.

With only a proposal and no actual hardware by 1988, Sculley became impatient, and Holland was replaced by Al Alcorn. Alcorn was a legend in the industry by this time, best known for his work at Atari, where he designed Pong and was involved in the development of the Atari 400 and the ill-fated "holographic" Atari Cosmos. After leaving Atari in 1981, he consulted for various companies and was hired as an Apple Fellow in 1985. Alcorn was brought in to provide a new perspective on Aquarius and pitched the feasibility question to microprocessor expert Hugh Martin, who studied the specs and promptly pronounced them "ridiculous" to both Alcorn and Sculley. On this advice Sculley scuttled Aquarius in 1989 and hired Martin to design a computer instead using an existing CPU. Martin's assignment became the similarly ill-fated Jaguar project, which completed poorly with another simultaneous project led by veteran engineer Jack McHenry called Cognac. Cognac, unlike Jaguar and Aquarius, actually produced working hardware. The "RISC LC" that the Cognac team built, originally a heavily modified Macintosh LC with a Motorola 88100 CPU running Mac OS, became the direct ancestor of the Power Macintosh. The Cray supercomputer, now idle at Valley Green 3, only ever used its then-innovative framebuffer to generate the credits for the famous Pencil Test animation. It ultimately went to the industrial design group for case modeling until it was dismantled.

Now that we have an actual specification to read, how might this have compared to the PowerPC 601? Scorpius defined a big-endian 32-bit RISC chip addressing up to 4GB of RAM with four cores, which the technical specification refers to as processing units, or PUs. Each core shares instruction and data caches with the others and communicates over a 5x4 crossbar network, and because all cores on a CPU must execute within the same address space, are probably best considered most similar to modern hardware threads (such as the 32 threads on the SMT-4 eight core POWER9 I'm typing this on). An individual core has 16 32-bit general purpose registers (GPRs) and seven special purpose registers (SPRs), plus eight global SPRs common to the entire CPU, though there is no floating-point unit in the specification we see here. Like ARM, and unlike PowerPC and modern Power ISA, the link register (which saves return addresses) is a regular GPR and code can jump directly to an address in any register. However, despite having a 32-bit addressing space and 32-bit registers, Scorpius uses a fixed-size 16-bit instruction word. Typical of early RISC designs and still maintained in modern MIPS CPUs, it also has a branch delay slot, where the instruction following a branch (even if the branch is taken) is always executed. Besides the standard cache control instructions, there are also special instructions for a core to broadcast to other cores, and the four PUs could be directed to work on data in tandem to yield SIMD vector-like operations (such as what you would see with AltiVec and SSE). Holland's design even envisioned an "inter-processor bus" (IPB) connecting up to 16 CPUs, each with their own local memory, something not unlike what we would call a non-uniform memory access (NUMA) design today.

The 16-bit instruction size greatly limits the breadth of available instructions compared to PowerPC's 32-bit instructions, but that would certainly be within the "letter" spirit of RISC. It also makes the code possibly more dense than PowerPC, though the limited amount of bits available for displacements and immediate values requires the use of a second prefix register and potentially multiple instructions which dampens this advantage somewhat. The use of multiple PUs in tandem for SIMD-like operations is analogous to AltiVec and rather more flexible, though the use of bespoke hardware support in later SIMD designs like the G4 is probably higher performance. The lack of a floating-point unit was probably not a major issue in 1986 but wasn't very forward-looking as every 601 shipped with an FPU standard from the factory; on the other hand, the NUMA IPB was very adventurous and certainly more advanced than multiprocessor PowerPC designs, something that wasn't even really possible until the 604 (or not without a lot of hacks, as in the case of the 603-based BeBox).

It's ultimately an academic exercise, of course, because this specification was effectively just a wish list whereas the 601 actually existed, though not for several more years. Plus, the first Power Macs, being descendants of the compatibility-oriented RISC LC, could still run 68K Mac software; while the specification doesn't say, Aquarius' radical differences from its ancestor suggests a completely isolated architecture intended for a totally new computer. Were Antares-based systems to actually emerge, it is quite possible that they would have eclipsed the Mac as a new and different machine, and in that alternate future I'd probably be writing a droll and informative article about the lost RISC Mac prototype instead.

4 comments:

  1. This sounds a lot like an old Soviet MultiClet parallel computer used in Satellites that would throw several parallel processors at a task, but as the chips failed due to radiation, the others continued on, but proportionately slower. Pretty interesting as it was designed to run directly from higher-level coding, and also has some pretty amazing power-usage advantages. There used to be a lot of info about it, but recently it turned-up as a very cost-effective way to mine Bitcoin, and now it is next to IMPOSSIBLE to find any info online bout it!

    ReplyDelete
  2. It sounds both very interesting as well as quite complicated, though it seems that it would have been ahead of its time.

    ReplyDelete
  3. What isn’t mentioned is in addition to a shared cache and multiprocessor coordination instructions, the processor was also had a limited superscalar ability. Prefix ops could be combined with a following instruction to look like a single 32bit op, which got around the limited offsets and performance penalty of short immediates.
    Ditto for compares and branches. The result was small, compact code while avoiding performance penalties and complex decoders. But Apple had nowhere near the follow that would permit a custom processor to be profitable- then. And, in an odd twisted bit of history, it was indirectly responsible for the founding of ARM.

    ReplyDelete
  4. The cray was used by my group for modeling video compression algorithms. Eventually led to vector quantization based decoder for QuickTime 1.0, and tree structured encoder used in the virtual museum. Lots of the tricks learned here were used in subsequent generations of software video codecs.

    ReplyDelete

Due to an increased frequency of spam, comments are now subject to moderation.