Friday, March 4, 2016

38.7.0 available (plus: thanks, Mozilla, for making the web little-endian)

As I try to recover from my annual bout of viral bronchitis, TenFourFox 38.7.0 is now available for testing (downloads, release notes, hashes). Sharp-eyed builders will note that the hash for the 38.7.0 changesets is the same as the one for the 38.6.1 changesets, and that's because no changes took place between 38.6.1 and 38.7.0, so they're identical. That was good because I could just let the automated patchers build and package this version and I could sleep most of the last two days. As usual, it will become official next Monday evening (more or less), Pacific time.

I finished my Master's coursework on Monday, so now the assault on 45ESR begins in earnest. I forecast about a 70% chance of success since we have a working gcc 4.8 from MacPorts which built 38 successfully and Electrolysis and Rust are not mandatory in 45. 45.0 comes out with 38.7 next week, 45.1 comes out on 19 April (with 38.8) and 45.2 comes out on 7 June, marking the end of support for 38ESR. However, I need to have basic working versions of 40, 41, 42 and 43 first, which I already have local trees for, so I can do regression testing on bugs that crop up in the testing phase.

Ordinarily that would still be ample time since this is more or less a straight port. The complicating factor is me getting married -- yes, folks, I'm off the market to a lovely Australian lady -- which will take about four weeks out of the eight to ten weeks I'll have to complete it. (There's a total of twelve-ish weeks in there, but I'd like at least two weeks for the localization and for you lot to test it, and even that will be tight.) So, if needed, there will be an unofficial 38.9 to buy us another six weeks by backporting the security patches from 45.2.

Current plans for 45 are to add basic user-agent switching support to the core browser (since TenFourFox will fork with the end of 45ESR support) and expose turning off PDF.js in the UI. Under the hood, I would like to implement some PDF.js fixes and implement a true JavaScript Intl library instead of the hacky emulation we have now by getting ICU working properly. I might also do something about bringing a gimped newtab page back since a lot of people missed that functionality in 38. These changes can be introduced later in the 45 time frame and are not necessary for launch, but the strings will be present in advance in a special tenfourfox.dtd so that localizers can have them ready by the time the features are introduced.

In addition, I've been working on spec plans for IonPower-NVLE (Non-Volatile Little-Endian), which is a moderate-scope refactor of our IonPower PowerPC JavaScript JIT compiler to ("NV") make the IonMonkey allocator only use the ABI non-volatile registers, and to ("LE") make all typed-array access little endian using the standard PowerPC byteswapped load and store instructions (we already use these instructions in the big-endian port of irregexp, so they are well-tested for our purposes). The NV portion reflects that our code generator basically follows its own ABI rules within Ion-generated code but has to thunk back to the OS when calling library routines, which requires saving and restoring all the volatile registers since the Ion code generators generally use those first. However, we have substantially more registers than any of the Tier-1 platforms, so we can do better. By forcing the IonMonkey allocator only to use the declared non-volatile registers, we only have to save and restore them in the trampoline when Ion code is called initially, substantially reducing the overhead of OS calls -- especially within code generated by the Baseline compiler, which we are tuned to enter relatively earlier than other platforms. Plus, with volatile registers now being freed between calls and no longer known to IonMonkey, we can use them with few restrictions for better ILP in complex local code blocks we emit internally from the portions of the JIT we completely control.
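As a back-of-the-envelope illustration of the NV argument, here is a toy cost model in Python. It only counts register save/restore operations; the function names and every number in it are hypothetical and are not taken from IonPower or measured on real hardware.

```python
# Toy cost model only: counts register save/restore operations, nothing else.
# All names and numbers here are hypothetical, not measurements of IonPower.

def spills_with_volatile_allocation(os_calls: int, live_volatile_regs: int) -> int:
    """Each call out to the OS saves, then restores, every live volatile register."""
    return os_calls * 2 * live_volatile_regs


def spills_with_nonvolatile_allocation(nonvolatile_regs_used: int) -> int:
    """Non-volatiles are saved once on entry and restored once on exit (the trampoline)."""
    return 2 * nonvolatile_regs_used


# Hypothetical workload: 100 OS calls made from one run of JIT-generated code.
before = spills_with_volatile_allocation(os_calls=100, live_volatile_regs=8)
after = spills_with_nonvolatile_allocation(nonvolatile_regs_used=18)
print(before, after)  # 1600 36
```

The point of the sketch is only that the second figure does not grow with the number of OS calls, while the first one does.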

On the other hand, the reason why we need the LE portion is significantly more frustrating philosophically, even though it seems simple. Thanks to asm.js and its byte-level access using typed arrays, lots of sites like Faceblech and WhatsCrap are Emscriptening their way to performance by turning a little-endian binary into (fast) little-endian JavaScript. Even if the library or code block they were compiling can be built big-endian, since it's invariably being built on a little-endian x86 system Emscripten dutifully poops out little-endian asm.js on the other end and that's what we end up (unsuccessfully) executing. We've already gotten bitten by this in several places and it is reasonable to expect it will happen more and more often; Tobias proved that the endianness of the code was responsible by doing an analogous change for Leopard WebKit which fixed the problem and we need to do the same. I will add typed-array byteswapping to the interpreter first and get that working against the test suite and the core browser, and then add it to the JIT. As a side effect, since we are now effectively little-endian as far as JavaScript applications are concerned, it may be worth looking into the feasibility of native asm.js on PowerPC again later down the road.
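To make the failure mode concrete, here is a minimal Python sketch of what goes wrong and what the byteswapped access fixes. It models an asm.js-style heap as a plain `bytearray`; the helper names are made up for illustration, and the `byteswapped` flag merely stands in for using the PowerPC byte-reversed load/store instructions (such as `lwbrx`/`stwbrx`) instead of the native big-endian ones.

```python
import struct

def store_u32(heap: bytearray, offset: int, value: int, byteswapped: bool) -> None:
    """Store a 32-bit value the way a big-endian host would.

    byteswapped=False models the native big-endian store;
    byteswapped=True models a byte-reversed store, giving little-endian layout.
    """
    fmt = "<I" if byteswapped else ">I"
    struct.pack_into(fmt, heap, offset, value)


def load_u8(heap: bytearray, offset: int) -> int:
    """Byte-level view of the same heap, like a Uint8Array over the same buffer."""
    return heap[offset]


# Code compiled on little-endian x86 assumes the low byte of a 32-bit
# store can be read back from the first byte of that word.
native = bytearray(4)
store_u32(native, 0, 0x11223344, byteswapped=False)
print(hex(load_u8(native, 0)))   # 0x11: not what the little-endian code expects

swapped = bytearray(4)
store_u32(swapped, 0, 0x11223344, byteswapped=True)
print(hex(load_u8(swapped, 0)))  # 0x44: matches the little-endian assumption
```

Nothing here touches the JIT itself; it only shows why the multi-byte typed-array accesses are the ones that need swapping, while single-byte accesses can stay as they are.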

But I'm pretty irked that the upshot of all this is a little-endian web. We can't rely on even non-WebGL JavaScript to be system-agnostic anymore and that seems like a pretty badly broken promise about how architecture-independent the Web was supposed to be. I realize this outcome wasn't at all the intention when Mozilla introduced it, but by making simple tools to turn compiler output into ready-to-use asm.js that's what was inevitably going to happen, and now that developers are finding asm.js too tempting to ignore, that's exactly what is happening. Overall, minority platforms like ours are about to get even more marginalized on the Web without rather drastic steps like this one (Firefox on PowerPC/SPARC Linux and BSD, take note), so look for testing builds with IonPower-NVLE probably by the end of this year.

That said, that sexy POWER8 workstation can run little-endian. I'm just saying.

17 comments:

  1. Congratulations on finishing the Master's course.

    And congratulations on the upcoming nuptials! May you both be very happy.

    Many thanks for all the work.

  2. Making the Web little-endian was totally the right decision. asm.js developers don't have to worry about endianness issues, and instead a few compiler developers (including you) deal with this once and for all.

    The only alternative would have been to not specify endianness for array buffers, which would put the burden back on Web developers, who would promptly assume little-endian anyway and you'd be as badly off or worse.

    1. But array buffers were already system-endianness (for WebGL), so that wasn't really the problem. The problem was that asm.js suddenly made JavaScript care about the underlying layout of memory by imposing byte-level load-store semantics, and tools like Emscripten made it worse by not even building on big-endian platforms at all.

      Sure, we can adapt, but the real victims are the big-endian platforms caught in the middle, like, say, Xbox 360 (see dherman's analysis on this which was prescient on the problem almost four years ago), which is also WebGL-capable. Since Apple is so lazy about OpenGL and only a minority of Power Macs under 10.5 are capable of 2.0, we (TenFourFox) can say we don't support it and lose pretty much nothing because it wouldn't have worked in most cases anyway, so we just unilaterally change JS since only JS cares. Dave concluded the same. But the 360 has to give up one or the other: it can either do WebGL, but not do little-endian asm.js-based code, or it can run LE asm.js-based code but break (or disable) WebGL. And you have to make that choice even if you don't have an OdinMonkey backend because asm.js code will happily run in the interpreter, too. So it really is asm.js that caused this situation; array buffers are just the mechanism.

      Please note I'm not saying as a practical measure that it's not a good thing the issue is forced. (I'm not saying it's a good thing either, but I can see why it's not a bad thing.) But even if breaking the promise of Web interoperability and platform independence turned out to be a net win for current developers and most present-day users, it's still a broken promise.

      On a personal note, though, as a fellow brother in Christ and having seen your name in Mozilla stuff for as long as I've kicked around the community (getting on a decade-ish), I wish you well in your next endeavour. I'll be flying through Auckland to visit the fiancee very soon (Air NZ has nicer seats than Qantas).

    2. "breaking the promise of Web interoperability and platform independence"
      ... well I wouldn't say this is that.

      Thanks for the best wishes. If you ever stop in Auckland I'd be glad to meet up with you. And congratulations to you!

    3. Thank you, sir! Might take you up on it in the near future. :)

  3. Is it possible to run both Big-endian and little-endian at the same time within a single operating system on a computer?

    1. In a general sense, you could argue that 10.4-10.6 with Rosetta do just that already through emulation (they compile big-endian PowerPC code to little-endian x86 code and run that, doing the byte-conversion on the fly).

      But if you consider emulation and dynamic recompilation cheating and you want to do this on the metal, you'll need (at minimum) a CPU that can mark tracts of memory with which endianness is in use and an OS that understands that setting and how to manipulate it. To make it not suck, the CPU should also be able to handle multiple execution streams of differing endianness and the OS should know how to translate calls between processes that differ in endianness. Some CPUs, including some PowerPC ones, do support per-page endianness, but they need the OS support to be useful and this is much more difficult to work with. Nowadays emulation is so much more convenient that people just throw CPU at the problem if they really have to deal with this situation.

    2. VirtualPC did this on the G3/G4. This caused a lengthy delay when porting it to the G5, because the G5 does not have a little-endian mode (being derived from POWER4).

  4. Mega-Congrats CK on both of the Cool 2016 Events (or 3 if we count the POWER8 ;0)

    Honestly, the endian-issue is one of those things that Apple should have thought about when taking PowerArch to the mainstream. On a dedicated server, who cares? As long as it does what it needs to do, but cross-platform demands a high-level of compatibility.

    At the very least, they should have instituted hardware endian correction (like Hitachi's SuperH), and on the other end, they might well have gotten in on ARM much earlier on. It had L.E. and also was pretty well-seasoned even by the time they switched away from 68k. The notion that RISC goes hand-in-hand with B.E. has really held them (and SPARC also) back. You are truly taking on the world with this last bit...swap! ;0)

    1. Well, that's hindsight speaking. While we classically associate big-endian with big-iron (like SPARC, PA-RISC, etc.), remember that the 68K is big-endian and appeared in lots of applications, and no one cared about that. In addition, keeping the same endianness probably made the 68K-PPC transition somewhat less complicated. The TMS 9900 series was also a consumer oriented big-endian design, obviously much less successful, but still in the same market.

      SuperH is really in the same situation as ARM: it can run bi-endian, but virtually everything runs it little. ARM had a few big-endian applications early on but now it's exclusively little too, much as MIPS has become.

      That said, ARM7 was the contemporary for the 601, and the 601 was beefier, faster and had IBM backing it. I think choosing ARM before portability and performance/watt became market factors would have been premature at that time. Plus, don't forget that while Apple wasn't there for the original ARM chips, along with Acorn and VLSI they were the original founders of Advanced RISC Machines and they still retain almost 15% of the company.

    2. I knew that ARM was nearly as old as x86, but didn't realize Apple had a hand in it that early on. I, like a few others, assumed they were grabbing it up to hedge the mobile market, and possibly take it desktop if they ever grow weary of Intel.
      SuperH is a pretty unique case. It was fully bi-endian in the mid-90s and was amazing as a game-console platform. I used my Saturn and Dreamcast a lot back then and, in the case of the latter, actually used it to register for classes in college through the old PlanetWeb browser they made. It was as fast as any of the PPC (beige) Macs we had at the college and even had Flash 4.0 and early JavaScript performance that was really not bad (considering the Dreamcast, in particular, only had 16MB of RAM). I often wish it wasn't relegated to automotive computing today.

    3. Yes, the DC is pretty awesome. I'm impressed how much wear you got out of PlanetWeb, and it still plays really great games. But it's unfortunately the best example of how SuperH's bi-endian abilities are ignored: *everything* runs little endian on it, from the native shell to Windows CE apps to Linux to NetBSD. In fact, you probably couldn't use the onboard hardware if you tried to run it big.

    4. You are right, it was all PC ports (but they were GREAT Ports). Had the mouse and keyboard that came with Quake III and the web on it was pretty nice. Actually heard my first ever MP3 through that browser - played it in window!

      Just a super-quick question - if you have a moment to tell me what you think. Am researching putting a Sandforce SSD in as a dedicated virtual-memory (swap) disk to mitigate my 32-bit/2GB limitation in Tiger. Have worked out most of the issues with getting it to mount on startup (thanks to working mostly in Linux these days), but had wanted to ask if you can see any problems with this?

      Since these don't need TRIM had hoped to simply use it as a system disc, and do routine backups, but ran into a catastrophic disk failure on the second day (after had to do a forced-restart). OWC says this is really rare and they are happy to exchange it, but it got me thinking that maybe I could take the stress off it, and still have most of the benefits if I set it up as only a VAR disk.

      BTW, while it was working, was the BEST PERFORMANCE I've EVER HAD ON ANY COMPUTER! TenFourFox open with 54-tabs, Two 6000x9000 pixel images in photoshop doubled in size by 10% increments (total of 11GB of Virtual Mem being used) and at worst, 1/10th second when moving between apps or Photoshop save-states (right up till the moment it died). OWC says these should be as tough as regular HDs. Did I get a dud? Or should I try the above approach?

      ANY thoughts on this greatly appreciated.

    5. Getting an SSD for your PPC?? Here is some important info I found out that might help.

      As stated above, I just got an OWC 'TRIM-Free' ssd for my MDD DP 1.42. At first, this was easily the best performance I've EVER had on this machine - OS9 level peppiness even running Tiger with heavy apps like TenFourFox.

      BUT it all came crashing down in less than a day.

      The drive became completely unresponsive and I could not get any further than boot. Thought: "Knew this was too good to be true!" Well, OWC tested the drive and reported back it was fine!?!?!? To be on the safe side they sent another new one and advised me to attach it to a newer machine (Linux in my case) and verify it had the newest firmware before using it. Well, this meant having to partition it as MBR so Linux could see it. Once it was verified, I repartitioned back to APT to put it in the Mac, but suddenly it was unresponsive again like the 1st one! Well, to use the parlance of our time - WTF?

      So it was time to get me some edumacation into this stuff. The reason Sandforce controllers don't need TRIM is that they do it themselves when the drive is idle. On a modern system, with copious amounts of RAM, the only time the 'Garbage Collection' function is noticeable, is when large numbers of blocks are being reclaimed. In older OSX systems with 2GB RAM limits, this becomes much more likely than in newer systems.

      So what to do? In this case, the drive was 120GB (for $64 = Good Deal), I had initially partitioned it into 2 sections (80GB for OSX and 40GB for OS9). On a light day, maybe this would be fine, but on those 'Heavy-Flow Days', I can easily push 20GB or so onto VM, so I either plan on allowing for periodic down-time or give the drive all the room it can support to enable maximum paging flexibility. The second idea has been great and no more problems. Also, some have said that with Sandforce's drives this also makes sense in wear-leveling, because the more of the drive is available, the more it can spread the data around, and the drives also auto-recopy data periodically to make sure it stays fresh.

      Boot-times/program load times aside, one of these SSDs is the best investment you can put into your classic PowerMac. Like having virtually limitless RAM. But you need to allow it more open space to auto-maintain (for G4/32-bit systems at least 40GB).

      Also while they do still offer the 'Legacy' IDE/ATA versions, there is no reason to pay the extra $40 when an IDE/Sata adaptor (at least if you are on a desktop with the room inside) like this one http://www.ebay.com/itm/Pata-IDE-To-Sata-Hard-Drive-Adapter-Converter-3-5-HDD-DVD-Parallel-to-Serial-ATA-/171424564491 is available for about $6 and works like a charm.

      Happy PPC Computing Folks!!

      •• Note on IDE/SATA adaptors: The smaller inline ones like mentioned above, generally have a 2TB limit. Larger drives often require a PCI card. Also, they sometimes add an additional 1-second delay to Access/Spin-Up times. Once data starts moving, there is no delay, but if your only drive is an SSD, it might make sense to experiment disabling 'disksleep' on pmset in Terminal. Even in this case, however, there will occasionally be a momentary searching during bootup for the system folder as the card comes to life. This is normal.

  5. me getting married -nice- I wish you a happy marriage and a strong relationship with your wife!

