Tip of the hat to miniupnp who ported the Spectre proof of concept to PowerPC intrinsics. I ported it to 10.2.8 so I could get a G3 test result, and then built generic PowerPC, G3, 7400, 7450 and G5 versions at -O0, -O1, -O2 and -O3 for a grand total of 20 variations.
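For anyone wanting to reproduce the build matrix, here is a minimal sketch that prints the 20 compiler invocations (five -arch values times four optimization levels). The source file name spectre.c and the output naming scheme are placeholders of mine, not the names actually used; the -arch values are the ones that appear in the result lists.

```c
#include <stdio.h>

/* Print one compiler command per architecture/optimization pair and
 * return how many were printed. "spectre.c" and the output names are
 * hypothetical placeholders. */
int print_build_matrix(FILE *out)
{
    static const char *const archs[] = {
        "ppc", "ppc750", "ppc7400", "ppc7450", "ppc970"
    };
    static const char *const opts[] = { "-O0", "-O1", "-O2", "-O3" };
    size_t a, o;
    int count = 0;

    for (a = 0; a < sizeof archs / sizeof archs[0]; a++) {
        for (o = 0; o < sizeof opts / sizeof opts[0]; o++) {
            fprintf(out, "gcc -arch %s %s -o spectre-%s%s spectre.c\n",
                    archs[a], opts[o], archs[a], opts[o]);
            count++;
        }
    }
    return count;
}
```

Calling print_build_matrix(stdout) and piping the output to a shell would build every variant in one go.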
Recall from our most recent foray into the Spectre attack that I believed the G3 and 7400 would be hard to successfully exploit because of their unusual limitations on speculative execution through indirect branches. Also, remember that this PoC assumes the most favourable conditions possible: that it already knows exactly what memory range it's looking for, that the memory range it's looking for is in the same process and there is no other privilege or partition protection, that it can run and access system registers at full speed (i.e., is native), and that we're going to let it run to completion.
miniupnp's implementation uses the mftb(u) instructions, so if you're porting this to the 601, you weirdo, you'll need to use the equivalent on that architecture. I used Xcode 2.5 and gcc 4.0.1.
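For readers who haven't met the timebase before: on 32-bit PowerPC the 64-bit counter is read as two halves with mftbu and mftb, and the usual idiom re-reads the upper half in case it ticked over between the two reads. The following is a sketch of that idiom, not miniupnp's actual code. The function name read_timebase is my own, and the non-PowerPC branch exists only so the sketch compiles on other hosts.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not taken from the ported PoC): return a 64-bit
 * tick count. */
uint64_t read_timebase(void)
{
#if defined(__ppc__) || defined(__powerpc__)
    uint32_t hi, lo, hi2;
    do {
        __asm__ volatile ("mftbu %0" : "=r"(hi));   /* upper 32 bits */
        __asm__ volatile ("mftb  %0" : "=r"(lo));   /* lower 32 bits */
        __asm__ volatile ("mftbu %0" : "=r"(hi2));  /* upper again   */
    } while (hi != hi2);  /* upper half changed mid-read: retry */
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback so the sketch builds elsewhere; NOT a cycle-accurate timer. */
    return (uint64_t)clock();
#endif
}
```

Timing a probe is then just the difference between two calls wrapped around the load being measured.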
Let's start with, shall we say, a positive control. I felt strongly the G5 would be vulnerable, so here's what I got on my Quad G5 (DC/DP 2.5GHz PowerPC 970MP) under 10.4.11 with Energy Saver set to Reduced Performance:
- -arch ppc -O0: partial failure (two bytes wrong, but claims all "success")
- -arch ppc -O1: recovers all bytes (but claims all "unclear")
- -arch ppc -O2: same
- -arch ppc -O3: same
- -arch ppc750 -O0: partial failure (twenty-two bytes wrong, but claims all "unclear")
- -arch ppc750 -O1: recovers all bytes (but claims all "unclear")
- -arch ppc750 -O2: almost complete failure (twenty-five bytes wrong, but claims all "unclear")
- -arch ppc750 -O3: almost complete failure (twenty-six bytes wrong, but claims all "unclear")
- -arch ppc7400 -O0: almost complete failure (twenty-eight bytes wrong, claims all "success")
- -arch ppc7400 -O1: recovers all bytes (but claims all "unclear")
- -arch ppc7400 -O2: almost complete failure (twenty-six bytes wrong, but claims all "unclear")
- -arch ppc7400 -O3: almost complete failure (twenty-eight bytes wrong, but claims all "unclear")
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: recovers all bytes (but claims all "unclear")
- -arch ppc7450 -O2: same
- -arch ppc7450 -O3: same
- -arch ppc970 -O0: recovers all bytes (claims all "success")
- -arch ppc970 -O1: recovers all bytes, but noticeably more slowly (and claims all "unclear")
- -arch ppc970 -O2: partial failure (one byte wrong, but claims all "unclear")
- -arch ppc970 -O3: recovers all bytes (but claims all "unclear")
Twiddling CACHE_HIT_THRESHOLD to any value other than 1 caused the test to fail completely, even on the working scenarios.
These results are frankly all over the map and only two scenarios fully work, but they do demonstrate that the G5 can be exploited by Spectre. That said, the interesting thing is how timing-dependent the G5 is, not only in whether the algorithm succeeds but also in whether the algorithm believes it succeeded. The optimized G5 versions have more trouble recognizing that they worked even when they did; the fastest and most accurate is actually -arch ppc970 -O0. I mentioned the CPU speed for a reason, too, because if I set the system to Highest Performance, I get some noteworthy changes:
- -arch ppc -O0: recovers all bytes (claims all "success")
- -arch ppc -O1: partial failure (eight bytes wrong, claims all "unclear")
- -arch ppc -O2: partial failure (twenty bytes wrong, claims all "unclear")
- -arch ppc -O3: partial failure (twenty-three bytes wrong, claims all "unclear")
- -arch ppc750 -O0: almost complete failure (one byte recovered, but claims all "unclear")
- -arch ppc750 -O1: partial failure (five bytes wrong, claims all "unclear")
- -arch ppc750 -O2: complete failure (no bytes recovered, all "unclear")
- -arch ppc750 -O3: almost complete failure (thirty bytes wrong, but claims all "unclear")
- -arch ppc7400 -O0: recovers all bytes (claims all "success")
- -arch ppc7400 -O1: partial failure (four bytes wrong, but claims all "unclear")
- -arch ppc7400 -O2: complete failure (no bytes recovered, all "unclear")
- -arch ppc7400 -O3: same
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: partial failure (eight bytes wrong, but claims all "unclear")
- -arch ppc7450 -O2: partial failure (seven bytes wrong, but claims all "unclear")
- -arch ppc7450 -O3: partial failure (five bytes wrong, but claims all "unclear")
- -arch ppc970 -O0: recovers all bytes (but three were "unclear")
- -arch ppc970 -O1: recovers all bytes, but noticeably more slowly (and claims all "unclear")
- -arch ppc970 -O2: partial failure (nineteen bytes wrong, claims all "unclear")
- -arch ppc970 -O3: partial failure (eighteen bytes wrong, claims all "unclear")
The speed increase causes one more scenario to succeed, though which ones succeed differs, and it tanks some of the previously marginal ones even more badly. Again, twiddling CACHE_HIT_THRESHOLD to any value other than 1 caused the test to fail completely, even on the working scenarios.
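To make the threshold and the "success"/"unclear" labels concrete, here is a sketch of the two decisions involved. It assumes the port keeps the logic of the widely circulated reference PoC, where a timed probe counts as a cache hit if it took no longer than CACHE_HIT_THRESHOLD, and a byte is labelled "success" only when the best candidate scored at least twice the runner-up. The function names are mine, and the actual port may differ.

```c
#include <stdint.h>

/* A probe counts as a cache hit when the timed load took no more than
 * `threshold` timebase ticks. */
int is_cache_hit(uint64_t elapsed_ticks, uint64_t threshold)
{
    return elapsed_ticks <= threshold;
}

/* Assumed labelling rule (from the reference PoC as I understand it):
 * "success" only if the winner has at least double the runner-up's hits. */
const char *verdict(int best_score, int runner_up_score)
{
    return (best_score >= 2 * runner_up_score) ? "success" : "unclear";
}
```

One plausible reading of why only a threshold of 1 worked on these machines: if the timebase ticks slowly relative to the core, even a cache miss finishes within a few ticks, so any larger threshold would count misses as hits. That is my guess, not something the results above establish.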
What about more recent Power ISA designs? Interestingly, my AIX Power 520 server configured as an SMT-2 two-core four-way POWER6 could not be exploited with CACHE_HIT_THRESHOLD set to 1. With it set to 80, the default in the original exploit, the POWER6 build (compiled with -O3 -mcpu=power6) recovers all bytes successfully. IBM has not yet said as of this writing whether they will issue patches for the POWER6.
I should also note that the worst case on the G5 took nearly seven seconds to complete at reduced power (-arch ppc7400 -O0), though the best case took less than a tenth of a second (-arch ppc970 -O0). The POWER6 took roughly three seconds. These are not fast attacks for the limited number of bytes scanned.
Given that we know the test will work on a vulnerable PowerPC system, what about the ones we theorized were resistant? Why, I have two of them right here! Let's cut to the chase, friends: your humble author's suspicions appear to be correct. Neither my strawberry iMac G3 with Sonnet HARMONi CPU upgrade (600MHz PowerPC 750CX) running 10.2.8, nor my Sawtooth G4 file server (450MHz PowerPC 7400) running 10.4.11 can be exploited with any of ppc, ppc750 or ppc7400 at any optimization level. They all fail to recover any byte despite the exploit believing it worked, so I conclude the G3 and 7400 are not vulnerable to the proof of concept.
The attacks are also quite slow on these systems. On the lower-clocked Sawtooth, a run took almost 5 seconds in real time even at -arch ppc7400 -O3 (seven seconds in the worst case), and pegged the processor during the test. Neither system has power management, so both ran at full speed.
That leaves the 7450 G4e, which as you'll recall has notable microarchitectural advances from the 7400 G4 and differences in its ability to speculatively execute indirect branches. What about that? Again, some highly timing-dependent results. First, let's look at my beloved 1GHz iMac G4 (1GHz PowerPC 7450), running 10.4.11:
- -arch ppc -O0: almost complete failure (twenty-nine bytes wrong, claims all "success")
- -arch ppc -O1: recovers all bytes (claims all "success")
- -arch ppc -O2: same
- -arch ppc -O3: partial failure (one byte wrong, but still claims all "success")
- -arch ppc750 -O0: recovers all bytes (claims all "success")
- -arch ppc750 -O1: recovers all bytes (claims all "success")
- -arch ppc750 -O2: recovers all bytes (claims all "success")
- -arch ppc750 -O3: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc7400 -O0: almost complete failure (twenty-nine bytes wrong, claims all "success")
- -arch ppc7400 -O1: partial failure (one byte wrong, but still claims all "success")
- -arch ppc7400 -O2: same
- -arch ppc7400 -O3: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc7450 -O0: almost complete failure (twenty-nine bytes wrong, claims all "success")
- -arch ppc7450 -O1: partial failure (one byte wrong, but still claims all "success")
- -arch ppc7450 -O2: recovers all bytes (claims all "success")
- -arch ppc7450 -O3: partial failure (one byte wrong, correctly identified as "unclear")
This is also all over the place, but quite clearly demonstrates the 7450 is vulnerable and actually succeeds more easily than the 970MP did. (This iMac G4 does not have power management.) Still, maybe we can figure out under which circumstances it is, so what about laptops? Let's get out my faithful 12" 1.33GHz iBook G4 (PowerPC 7447A), running 10.4.11 also. First, on reduced performance:
- -arch ppc -O0: recovers all bytes (claims all "success")
- -arch ppc -O1: recovers all bytes (claims all "success")
- -arch ppc -O2: recovers all bytes (claims all "success")
- -arch ppc -O3: partial failure (two bytes wrong, only one correctly identified as "unclear")
- -arch ppc750 -O0: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc750 -O1: partial failure (one byte wrong, but still claims all "success")
- -arch ppc750 -O2: same
- -arch ppc750 -O3: recovers all bytes (claims all "success")
- -arch ppc7400 -O0: partial failure (one byte wrong, but still claims all "success")
- -arch ppc7400 -O1: recovers all bytes (claims all "success")
- -arch ppc7400 -O2: partial failure (two bytes wrong, only one correctly identified as "unclear")
- -arch ppc7400 -O3: recovers all bytes (claims all "success")
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: partial failure (one byte wrong, but still claims all "success")
- -arch ppc7450 -O2: recovers all bytes (claims all "success")
- -arch ppc7450 -O3: recovers all bytes (claims all "success")
This succeeds a lot more easily, and the attack is much faster (less than a quarter of a second in the worst case). On highest performance:
- -arch ppc -O0: recovers all bytes (claims all "success")
- -arch ppc -O1: recovers all bytes (but one byte is "unclear")
- -arch ppc -O2: recovers all bytes (but one byte is "unclear")
- -arch ppc -O3: recovers all bytes (claims all "success")
- -arch ppc750 -O0: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc750 -O1: recovers all bytes (claims all "success")
- -arch ppc750 -O2: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc750 -O3: recovers all bytes (claims all "success")
- -arch ppc7400 -O0: recovers all bytes (claims all "success")
- -arch ppc7400 -O1: recovers all bytes (claims all "success")
- -arch ppc7400 -O2: recovers all bytes (claims all "success")
- -arch ppc7400 -O3: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: recovers all bytes (claims all "success")
- -arch ppc7450 -O2: recovers all bytes (but one byte is "unclear")
- -arch ppc7450 -O3: partial failure (one byte wrong, correctly identified as "unclear")
This almost completely succeeds! Even the scenarios that are wrong are still mostly correct; these varied a bit from run to run and some would succeed now and then too. The worst case timing is an alarming eighth of a second.
What gets weird is the DLSD PowerBook G4, though. Let's get out the last and mightiest of the PowerBooks with its luxurious keyboard, bright 17" high-resolution LCD and 1.67GHz PowerPC 7447B CPU running 10.5.8. The DLSD PowerBooks are notable for not allowing selectable power management ("Normal" or automatic equivalent only), and it turns out this is relevant here too:
- -arch ppc -O0: complete failure (no bytes recovered but some garbage, all "unclear")
- -arch ppc -O1: complete failure (no bytes recovered but mostly garbage, all "unclear")
- -arch ppc -O2: complete failure (no bytes recovered but some garbage, all "unclear")
- -arch ppc -O3: complete failure (no bytes recovered but mostly garbage, all "unclear")
- -arch ppc750 -O0: complete failure (no bytes recovered but half garbage, all "unclear")
- -arch ppc750 -O1: complete failure (no bytes recovered but some garbage, all "unclear")
- -arch ppc750 -O2: same
- -arch ppc750 -O3: same
- -arch ppc7400 -O0: almost complete failure (only one byte recovered, but all "unclear")
- -arch ppc7400 -O1: complete failure (no bytes recovered, all "unclear")
- -arch ppc7400 -O2: complete failure (no bytes recovered but all seen as "E", all "unclear")
- -arch ppc7400 -O3: complete failure (no bytes recovered but some garbage, all "unclear")
- -arch ppc7450 -O0: complete failure (no bytes recovered, all "unclear")
- -arch ppc7450 -O1: complete failure (no bytes recovered but half garbage, all "unclear")
- -arch ppc7450 -O2: same
- -arch ppc7450 -O3: same
This is an upgraded stepping of the same basic CPU, but the attack almost completely failed. It failed in an unusual way, though: instead of using the question mark placeholder it usually uses for an indeterminate value, it actually puts in some apparently recovered nonsense bytes. These bytes are almost always garbage, though one did sneak in in the right place, which leads me to speculate that the 7447B is vulnerable too but something is mitigating it.
This DLSD is different from my other systems in two ways: it's got a slightly different CPU with known different power management, and it's running Leopard. Setting the iBook G4 to use automatic ("Normal") power management made little difference, however, so I got down two 12" PowerBook G4s, one running 10.4 with a 1.33GHz CPU and the other 10.5.8 with a 1.5GHz CPU. The 10.4 12" PowerBook G4 was almost identical to the 10.4 12" iBook G4 in terms of vulnerability, but it got interesting on the 10.5.8 system. In order, low, automatic and highest performance:
- -arch ppc -O0: recovers all bytes (claims all "success")
- -arch ppc -O1: partial failure (four bytes wrong, but still claims all "success")
- -arch ppc -O2: partial failure (five bytes wrong, but still claims all "success")
- -arch ppc -O3: partial failure (four bytes wrong, but still claims all "success")
- -arch ppc750 -O0: partial failure (two bytes wrong, but still claims all "success")
- -arch ppc750 -O1: partial failure (two bytes wrong, both garbage, but still claims all "success")
- -arch ppc750 -O2: partial failure (one byte wrong, correctly identified as "unclear")
- -arch ppc750 -O3: partial failure (four bytes wrong, but still claims all "success")
- -arch ppc7400 -O0: recovers all bytes (claims all "success")
- -arch ppc7400 -O1: partial failure (one byte wrong, but still claims all "success")
- -arch ppc7400 -O2: recovers all bytes (claims all "success")
- -arch ppc7400 -O3: partial failure (two bytes wrong, but still claims all "success")
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: recovers all bytes (claims all "success")
- -arch ppc7450 -O2: recovers all bytes (claims all "success")
- -arch ppc7450 -O3: partial failure (four bytes wrong, but still claims all "success")
On automatic:
- -arch ppc -O0: recovers all bytes (claims all "success")
- -arch ppc -O1: partial failure (thirteen bytes wrong, all "T", correctly identified as "unclear")
- -arch ppc -O2: partial failure (nine bytes wrong, some "u", correctly identified as "unclear")
- -arch ppc -O3: partial failure (eight bytes wrong, correctly identified as "unclear")
- -arch ppc750 -O0: partial failure (thirteen bytes wrong, all "-", correctly identified as "unclear")
- -arch ppc750 -O1: partial failure (fifteen bytes wrong, correctly identified as "unclear")
- -arch ppc750 -O2: partial failure (fifteen bytes wrong, some "@", correctly identified as "unclear")
- -arch ppc750 -O3: partial failure (sixteen bytes wrong, correctly identified as "unclear")
- -arch ppc7400 -O0: recovers all bytes (claims all "success")
- -arch ppc7400 -O1: partial failure (seven bytes wrong, correctly identified as "unclear")
- -arch ppc7400 -O2: partial failure (eleven bytes wrong with three garbage bytes, correctly identified as "unclear")
- -arch ppc7400 -O3: partial failure (eleven bytes wrong, all garbage, correctly identified as "unclear")
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: partial failure (ten bytes wrong, correctly identified as "unclear")
- -arch ppc7450 -O2: partial failure (seventeen bytes wrong, all "h", correctly identified as "unclear")
- -arch ppc7450 -O3: partial failure (twelve bytes wrong, all "b", correctly identified as "unclear")
On highest performance:
- -arch ppc -O0: recovers all bytes (claims all "success")
- -arch ppc -O1: partial failure (three bytes wrong with two garbage bytes, correctly identified as "unclear")
- -arch ppc -O2: partial failure (eight bytes wrong, all various garbage bytes, correctly identified as "unclear")
- -arch ppc -O3: partial failure (six bytes wrong, correctly identified as "unclear")
- -arch ppc750 -O0: partial failure (four bytes wrong, all various garbage bytes, correctly identified as "unclear")
- -arch ppc750 -O1: partial failure (four bytes wrong, correctly identified as "unclear")
- -arch ppc750 -O2: partial failure (eleven bytes wrong, correctly identified as "unclear")
- -arch ppc750 -O3: partial failure (four bytes wrong, all various garbage bytes, correctly identified as "unclear")
- -arch ppc7400 -O0: recovers all bytes (claims all "success")
- -arch ppc7400 -O1: partial failure (three bytes wrong, but still claims all "success")
- -arch ppc7400 -O2: partial failure (six bytes wrong, correctly identified as "unclear")
- -arch ppc7400 -O3: partial failure (four bytes wrong, correctly identified as "unclear")
- -arch ppc7450 -O0: recovers all bytes (claims all "success")
- -arch ppc7450 -O1: partial failure (four bytes wrong, correctly identified as "unclear")
- -arch ppc7450 -O2: partial failure (three bytes wrong, but still claims all "success")
- -arch ppc7450 -O3: partial failure (eight bytes wrong, all various garbage bytes, correctly identified as "unclear")
Leopard clearly impairs Spectre's success, but the DLSDs do seem to differ further internally. The worst case runtime on the 10.5 1.5GHz 12" was around 0.25 seconds. The real test would be to put Tiger on a DLSD, but I wasn't willing to do so with this one since it's my Leopard test system.
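The "N bytes wrong" figures in the lists above amount to comparing the recovered output against the known secret one position at a time. I don't know how the tallies were actually produced (quite possibly by eye), so treat this as a sketch; the function name is my own.

```c
#include <stddef.h>

/* Count positions where the recovered output differs from the known
 * secret. Both buffers must hold at least `len` bytes. */
size_t count_wrong_bytes(const char *expected, const char *recovered,
                         size_t len)
{
    size_t i, wrong = 0;

    for (i = 0; i < len; i++) {
        if (expected[i] != recovered[i])
            wrong++;
    }
    return wrong;
}
```

A result of zero corresponds to "recovers all bytes", and a result equal to len corresponds to "complete failure".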
Enough data. Let's irresponsibly make rash conclusions.
- The G3 and 7400 G4 systems appear, at minimum, to be resistant to Spectre as predicted. I hesitate to say they're immune but there's certainly enough evidence here to suggest it. While there may be a variant around that could get them to leak, even if it existed it wouldn't do so very quickly based on this analysis.
- The 7450 G4e is more vulnerable to Spectre than the G5 and can be exploited faster, except for the DLSDs which (at least in Leopard) seem to be unusually resistant.
- Power management makes a difference, but not enough to completely retard the exploit (again, except the DLSDs), and not always in a predictable fashion.
- At least for these systems, cache size didn't seem to have any real correlation.
- Spectre succeeds more reliably in Tiger than in Leopard.
- Later Power ISA chips are vulnerable with a lot less fiddling.
Before you panic, though, also remember:
- These were local programs run at full speed in a test environment with no limits, and furthermore the program knew exactly what it was looking for and where. A random attack would probably not have this many advantages in advance.
- Because the timing is so variable, a reliable attack would require running several performance profiles and comparing them, dramatically slowing down the effective exfiltration speed.
- This wouldn't be a very useful Trojan horse because sketchy programs can own your system in ways a lot more useful (to them) than iffy memory reads that are not always predictably correct. So don't run sketchy programs!
- No 7450 G4 is fast enough to be exploited effectively through TenFourFox's JavaScript JIT, which would be the other major vector. Plus, no 7450 can speculatively execute through TenFourFox's inline caches anyway because they use CTR for indirect branching (see the analysis), so the generated code already has an effective internal barrier.
- Arguably the Quad G5 might get into the speed range needed for a JavaScript exploit, but it would be immediately noticeable (as in, jet engine time), not likely to yield much data quickly, and wouldn't be able to do so accurately. After FPR5 final, even that possibility will be so greatly lessened as to make it just about useless.
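The point above about needing several runs to get a reliable answer can be sketched as a per-byte majority vote across recovered strings. This is purely illustrative and not part of the PoC; the function name and the choice of '?' for undecided positions (echoing the PoC's placeholder) are mine.

```c
#include <stddef.h>

/* Per-position majority vote over `runs` recovered strings, each at
 * least `len` bytes long. A byte is accepted only if a strict majority
 * of runs agree on it; otherwise '?' is written. `out` must have room
 * for len + 1 bytes. */
void majority_vote(const char *const *recovered, size_t runs, size_t len,
                   char *out)
{
    size_t pos, r, v;

    for (pos = 0; pos < len; pos++) {
        size_t counts[256] = {0};
        size_t best = 0, best_value = 0;

        for (r = 0; r < runs; r++)
            counts[(unsigned char)recovered[r][pos]]++;
        for (v = 0; v < 256; v++) {
            if (counts[v] > best) {
                best = counts[v];
                best_value = v;
            }
        }
        out[pos] = (2 * best > runs) ? (char)best_value : '?';
    }
    out[len] = '\0';
}
```

Every extra run multiplies the total attack time, which is exactly why the effective exfiltration rate drops so sharply.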
I need to eat dinner. And a life. If you've tested your own system (Tobias reports success on a 970FX), say so in the comments.