Saturday, January 14, 2017

45.7.0 available (also: Talos fails)

TenFourFox 45.7.0 is now available for testing. In addition to reducing the layout paint delay, I also tweaked garbage collection by removing some code that isn't relevant to us, including some profile accounting work we don't need to bother computing. If there is a request to reinstate this code in a non-debug build, we can talk about a specific profiling build down the road, probably after exiting source parity. As usual the build was due to finalize Monday evening Pacific time, but I didn't notice that the release had been pushed forward another week, to January 24. Instead of a respin only if additional security patches land, there will be a respin this weekend. The download links have been invalidated.

For 45.8 I plan to start work on the built-in user-agent switcher, and I'm also looking into a new initiative I'm calling "Operation Short Change" to wring even more performance out of IonPower. Currently, the JavaScript JIT's platform-agnostic section generates simplistic, unoptimized generic branches. Since these generic branches could call any code at any displacement, and PowerPC conditional branch instructions have only a limited number of displacement bits, we pad the branches with nops (i.e., nop/nop/nop/bc) so they can be patched later into a full-displacement branch (lis/ori/mtctr/bcctr) if the target turns out to be far away. This technique of "branch stanzas" dates all the way back to the original nanojit we had in TenFourFox 4. Ben Stuhl did a lot of optimization work on it for our JaegerMonkey implementation, and that work survived nearly unchanged in PPCBC and lives on in a somewhat modified form today in IonPower-NVLE.
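To make the stanza idea concrete, here is a minimal Python sketch of the patching decision. It is not the IonPower code: the function names, the use of r0 as a scratch register, and the 32-bit address assumption are all hypothetical. It relies only on the standard PowerPC `bc` encoding, whose 14-bit word displacement reaches from -32768 to +32764 bytes. Condition operands are omitted from the mnemonics for brevity.

```python
# Illustrative sketch only; names are hypothetical, not taken from IonPower.

BC_MIN = -(1 << 15)      # lowest byte displacement a bc can encode
BC_MAX = (1 << 15) - 4   # highest one (displacements are word-aligned)

def fits_bc(displacement):
    """True if a conditional branch can encode this byte displacement."""
    return displacement % 4 == 0 and BC_MIN <= displacement <= BC_MAX

def patch_stanza(branch_addr, target_addr):
    """Return the four mnemonics of a patched branch stanza.

    branch_addr is the address of the last (branching) slot of the stanza.
    Assumes 32-bit addresses and r0 as the scratch register.
    """
    displacement = target_addr - branch_addr
    if fits_bc(displacement):
        # Near target: keep the nop padding and branch relative.
        return ["nop", "nop", "nop", f"bc {displacement:+d}"]
    # Far target: load the absolute address and branch through CTR.
    hi = (target_addr >> 16) & 0xFFFF
    lo = target_addr & 0xFFFF
    return [f"lis r0,0x{hi:04x}", f"ori r0,r0,0x{lo:04x}", "mtctr r0", "bcctr"]
```

Either way the stanza occupies four instruction words, which is what lets it be patched in place after the target address is known.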

However, many of the generic branches the Ion code generator creates jump to code that is always just a few instruction words away, and the distance between them never changes. These locations are predictable, and having a full branch stanza in those cases wastes memory and instruction cache space. Fortunately, we already have machinery to create these fixed "short branches" in our PPC-specific code generator, and now it's time to further modify Ion to generate these branches in the platform-agnostic segment as well. At the same time, since we don't generally use LR as an actual link register due to a side effect of how we branch, I'm going to investigate whether using LR is faster for long branches than CTR (i.e., lis/ori/mtlr/b(c)lr instead of mtctr/b(c)ctr). Certainly on G5 I expect it probably will be, because having mtlr and blr/bclr in the same dispatch group doesn't seem to incur the same penalty that mtctr and bctr/bcctr in the same dispatch group does. (Our bailouts do use LR, but in an indirect form that intentionally clobbers the register anyway, so saving it is unimportant.)
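As a rough illustration of what is at stake, the following Python sketch contrasts a one-word short branch with a four-word long branch, and shows the CTR and LR variants of the long form side by side. The function names, the r0 scratch register, and the 32-bit address assumption are hypothetical; only the instruction sequences come from the discussion above.

```python
# Illustrative sketch only; names are hypothetical, not taken from IonPower.

WORD_BYTES = 4     # every PowerPC instruction is one 32-bit word
STANZA_WORDS = 4   # nop/nop/nop/bc or lis/ori/mtctr/bcctr

def emit_short_branch(displacement):
    """One-word conditional branch to a target known to be near and fixed."""
    if displacement % 4 != 0 or not -(1 << 15) <= displacement < (1 << 15):
        raise ValueError("target is out of range for a short branch")
    return [f"bc {displacement:+d}"]

def emit_long_branch(target_addr, via="ctr"):
    """Four-word far branch through CTR, or through LR as the experiment."""
    if via not in ("ctr", "lr"):
        raise ValueError("via must be 'ctr' or 'lr'")
    hi = (target_addr >> 16) & 0xFFFF
    lo = target_addr & 0xFFFF
    return [f"lis r0,0x{hi:04x}", f"ori r0,r0,0x{lo:04x}",
            f"mt{via} r0", f"bc{via}"]

def bytes_saved(short_branch_count):
    """Bytes reclaimed when full stanzas become single-word short branches."""
    return short_branch_count * (STANZA_WORDS - 1) * WORD_BYTES
```

Each branch converted to the short form gives back three instruction words, so the savings scale directly with how many fixed near branches the code generator emits. Whether the LR form is actually faster is the open question and would have to be measured on real hardware.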

On top of all that there is also the remaining work on AltiVec VP9 and some other stuff, so it's not like I won't have anything to do for the next few weeks.

On a more disappointing note, the Talos crowdfunding campaign for the most truly open, truly kick-*ss POWER8 workstation you can put on your desk has run aground, "only" raising $516,290 of the $3.7m goal. I guess it was just too expensive for enough people to take a chance on, and in fairness I really can't fault folks for having a bad case of sticker shock with a funding requirement as high as they were asking. But you get the computer you're willing to pay for. If you want a system made cheaper by economies of scale, then you're going to get a machine that doesn't really meet your specific needs because it's too busy not meeting everybody else's. Ultimately it's sad that no one's money was where their mouths were because for maybe double-ish the cost of the mythical updated Mac Pro Tim Cook doesn't see fit to make, you could have had a truly unencumbered machine that really could compete on performance with x86. But now we won't. And worst of all, I think this will scare off other companies from even trying.

9 comments:

  1. It's been a while since our friend in Japan pushed out an updated build of Tenfourbird (latest version available remains 38.9.0). Has anyone heard anything about an eventual release of Tenfourbird 45.x.x, or have we reached the end of the road?

    1. Sadly, I have not heard from our anonymous builder in the land of the Rising Sun. I hope there is a 45.x in the future since I still use T-bird for reading the Mozilla newsgroups.

  2. I lack some of the context, but according to the documentation for the 7450, a mtctr/b(c)ctr combination would be a better choice than mtlr/b(c)lr, because a branch to the link register is likely to be mispredicted due to the link stack's target address prediction.

    Or, maybe your investigation includes saving the ctr before indirect branching?

    As I have not read enough about the 970, I won't comment on it.

    It is quite disappointing that the Talos WS project raised under 1/7 of its goal, but I can't say that I am surprised. Since I'm still in college I couldn't help that much, but I sincerely wished for it to reach its goal. What I have learned from today's era is that low price matters far more than innovation. Even so, I'm nowhere near switching to the Intel world.

    By the way, I always appreciate the time you spend developing your Mozilla port(s) (yeah, Classilla is awesome too). I think the latest version of TFFx is always the best.

    1. No, you may well be right. LR is usually heavily optimized in most PPC implementations, and I would use it as a return address already except we had lots of problems for reasons I was never able to figure out. That would be a bigger undertaking, of course.

      Anyway, your point is well taken: given that the only CPU to pay the mtctr/b(c)ctr penalty is the 970, changing long call stanzas to mtlr/b(c)lr might be a G5-only change. But the short branching will be part of all the architecture ports.

      Sadly, as you say, the race to the bottom with computer pricing has taught people that price tends to exclude all other considerations. :(

  3. If we want another attempt at a PPC personal computing project that fully respects freedom, would it be possible to provide your improved JS and JIT code to upstream Firefox? A great JIT and JS engine optimized for PPC is essential for a better web browsing experience these days if we want to enjoy PPC-based daily computing.

    1. It's absolutely possible; there is no administrative constraint. But three things need to happen:

      1) The code needs to be converted from the OS X (PowerOpen) ABI to SysV to make it useful for Linux and *BSD consumers. Not a massive amount of work, but not trivial.

      2) The source code needs to be cleaned up for Mozilla's standards. This would be, at best, tedious.

      3) The hacks I've made to irregexp and the JIT need to be uplifted.

      These are obviously all possible, but they require a lot of extra time I just don't have. If someone wants to do this, I would be happy to advise.

    2. Thank you so much for the reply! I'm glad to see a positive answer, though neither of us has time to make that happen. Anyway, it is still good to hear. Best wishes for your future development!
