Friday, January 20, 2017

45.7.0 available for realsies

Let's try that again. TenFourFox 45.7.0 is now available for testing ... again (same download location, same release notes, new hashes), and as before, will go live late Monday evening if I haven't been flooded out of my house by the torrential rains we've been getting in currently-not-so-Sunny So Cal. You may wish to verify you got the correct version by manually checking the hash on the off-chance the mirrors are serving the old binaries.
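If you'd rather script the hash check than eyeball it, here is a minimal sketch. It assumes you have the downloaded archive on disk and the published hex digest from the release notes; the function names and the default `sha256` algorithm are placeholders of mine, so pass whichever algorithm the release notes actually list.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks.

    `algorithm` is an assumption; use the one the release notes publish.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads don't need to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_hash(path, published_hex, algorithm="sha256"):
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

If `matches_published_hash` returns False, the mirror is most likely still serving the old binary, so try again later or from another mirror.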

Saturday, January 14, 2017

45.7.0 available (also: Talos fails)

TenFourFox 45.7.0 is now available for testing. In addition to reducing the layout paint delay, I also tweaked garbage collection by removing some code that isn't relevant to us, including some profile accounting work we don't need to bother computing. If there is a request to reinstate this code in a non-debug build, we can talk about a specific profiling build down the road, probably after exiting source parity. As usual the build finalizes Monday evening Pacific time, with a respin if additional security patches land. Update: I didn't notice that the release had been pushed forward another week, to January 24. There will be a respin this weekend, and the download links have been invalidated and cancelled.

For 45.8 I plan to start work on the built-in user-agent switcher, and I'm also looking into a new initiative I'm calling "Operation Short Change" to wring even more performance out of IonPower. Currently, the JavaScript JIT's platform-agnostic section generates simplistic, unoptimized generic branches. Since these generic branches could call any code at any displacement and PowerPC conditional branch instructions have only a limited number of displacement bits, we pad the branches with nops (i.e., nop/nop/nop/bc) so they can be patched later into a full-displacement branch (lis/ori/mtctr/bcctr) if the target turns out to be far away. This technique of "branch stanzas" dates back all the way to the original nanojit we had in TenFourFox 4. Ben Stuhl did a lot of optimization work on it for our JaegerMonkey implementation, and that work survived nearly unchanged in PPCBC and lives on in a somewhat modified form today in IonPower-NVLE.
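To make the stanza idea concrete, here is a toy sketch of the patching decision in Python. It is not IonPower's actual code: the function names, the use of r0 as the scratch register, 32-bit addresses, and the omission of the condition bits are all assumptions for illustration. The one hard fact it leans on is that bc has a 14-bit word-aligned displacement field, so it reaches from -32768 to +32764 bytes.

```python
# Reach of a PowerPC bc: 14-bit BD field, shifted left 2 (word-aligned).
BC_MIN = -(1 << 15)       # -32768 bytes
BC_MAX = (1 << 15) - 4    # +32764 bytes

def fits_short(displacement):
    """True if a single bc can reach this byte displacement."""
    return displacement % 4 == 0 and BC_MIN <= displacement <= BC_MAX

def emit_stanza():
    """Emit a padded, not-yet-patched four-word branch stanza."""
    return ["nop", "nop", "nop", "bc <unpatched>"]

def patch_stanza(stanza_addr, target_addr):
    """Resolve a stanza once the target is known (hypothetical helper).

    The bc sits in the fourth word, so its own address is stanza_addr + 12.
    """
    disp = target_addr - (stanza_addr + 12)
    if fits_short(disp):
        return ["nop", "nop", "nop", f"bc {disp:+d}"]
    # Too far for bc: load the absolute 32-bit target and branch via CTR.
    hi = (target_addr >> 16) & 0xFFFF
    lo = target_addr & 0xFFFF
    return [f"lis r0,0x{hi:04x}", f"ori r0,r0,0x{lo:04x}", "mtctr r0", "bcctr"]
```

Either way the stanza occupies four instruction words, which is the cost that matters when the target was never going to be far away in the first place.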

However, many of the generic branches the Ion code generator creates jump to code that is always just a few instruction words away, and the distance between them never changes. These locations are predictable, and having a full branch stanza in those cases wastes memory and instruction cache space. Fortunately, we already have machinery to create these fixed "short branches" in our PPC-specific code generator, and now it's time to further modify Ion to generate these branches in the platform-agnostic segment as well. At the same time, since we don't generally use LR as an actual link register due to a side effect of how we branch, I'm going to investigate whether using LR is faster for long branches than CTR (i.e., lis/ori/mtlr/b(c)lr instead of mtctr/b(c)ctr). On G5 I expect it probably will be, because having mtlr and blr/bclr in the same dispatch group doesn't seem to incur the same penalty that mtctr and bctr/bcctr in the same dispatch group do. (Our bailouts do use LR, but in an indirect form that intentionally clobbers the register anyway, so saving it is unimportant.)
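As a rough back-of-the-envelope sketch of what is at stake, the Python below compares the footprint of a full stanza with a short branch and spells out the two long-branch variants under consideration. The assumptions are mine: a short branch is taken to be a single instruction word, r0 is an arbitrary scratch register, and the branch counts are made up.

```python
WORD_BYTES = 4      # every PowerPC instruction is one 32-bit word
STANZA_WORDS = 4    # nop/nop/nop/bc, or lis/ori/mtctr/bcctr
SHORT_WORDS = 1     # assumed: a fixed short branch is a lone bc

def bytes_saved(num_short_branches):
    """Code bytes saved by emitting short branches instead of full stanzas."""
    return num_short_branches * (STANZA_WORDS - SHORT_WORDS) * WORD_BYTES

def long_branch(target_addr, via="ctr"):
    """Spell out a full-displacement branch through CTR or LR (illustrative only)."""
    if via not in ("ctr", "lr"):
        raise ValueError("via must be 'ctr' or 'lr'")
    hi = (target_addr >> 16) & 0xFFFF
    lo = target_addr & 0xFFFF
    move, branch = ("mtctr r0", "bcctr") if via == "ctr" else ("mtlr r0", "bclr")
    return [f"lis r0,0x{hi:04x}", f"ori r0,r0,0x{lo:04x}", move, branch]
```

Under those assumptions, every branch converted to the short form frees 12 bytes, so a thousand of them would free about 12 KB of generated code. Both long-branch variants are the same length, so any difference between them would come from dispatch behaviour and not from size.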

On top of all that there is also the remaining work on AltiVec VP9 and some other stuff, so it's not like I won't have anything to do for the next few weeks.

On a more disappointing note, the Talos crowdfunding campaign for the most truly open, truly kick-*ss POWER8 workstation you can put on your desk has run aground, "only" raising $516,290 of the $3.7m goal. I guess it was just too expensive for enough people to take a chance on, and in fairness I really can't fault folks for having a bad case of sticker shock with a funding requirement as high as they were asking. But you get the computer you're willing to pay for. If you want a system made cheaper by economies of scale, then you're going to get a machine that doesn't really meet your specific needs because it's too busy not meeting everybody else's. Ultimately it's sad that no one's money was where their mouths were because for maybe double-ish the cost of the mythical updated Mac Pro Tim Cook doesn't see fit to make, you could have had a truly unencumbered machine that really could compete on performance with x86. But now we won't. And worst of all, I think this will scare off other companies from even trying.

Tuesday, January 10, 2017

Not dead, didn't perish in an airline crash over the Pacific

Yes, I'm alive, and yes, I'm back at Floodgap orbiting headquarters. Meanwhile, candidate builds for TenFourFox 45.7 are scheduled for this weekend. Since no one has voiced any problems, the change to nglayout.initialpaint.delay mentioned in the prior post (to 100ms) will take effect. If this caused adverse issues for you, speak now, or forever hold your peace right up until you post frantic bug reports.