Well, 9 is finally up and running after some difficulty with the initial build. It is having real trouble staying stable, though -- it crashes with only minimal use, and I strongly suspect the new garbage collector. That said, part of the plugin IPC stack also appears in the affected backtraces (yes, with plugins off). I'm trying to debug this, but it's failing in subtle and unrepeatable ways.
If I am unable to solve the stability issue by next week, we will issue an 8.0.2 beta with the new features scheduled for 9 and decide what to do at that point. The ESR (Mozilla's Extended Support Release, intended to replace 3.6) is not far away, so that could give us an out. More soon.
Sunday, November 27, 2011
Saturday, November 19, 2011
Macroassembler ahoy!: more JIT in 9
I'm typing this post in a modified version of TenFourFox 8 with regular expression compilation baked into the JavaScript JIT, and I'm already sad about having to go back to vanilla 8 for comparison testing because it's making a huge difference for the sites I use. Regexes, for the uninitiated, are concise ways of expressing patterns to match against and/or extract portions from data. They predate JavaScript by many years (the language most associated with them today is probably Perl), but they are a common part of modern JS applications and as such their performance is tested by most benchmarks. Unfortunately, the tracer-based regex JIT that Mozilla wrote for Firefox 3.6 was removed in Firefox 4.0 in favour of YARR (Yet Another Regex Runtime, from WebKit) and its own regex JIT. YARR's JIT needs a macroassembler that we did not have on PowerPC, so we never got the advantage of JavaScript regular expression compilation and every prior version of TenFourFox fell back on the interpreter when a regex was encountered.
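For readers who have not met regexes before, here is a minimal sketch of "match and extract". It is written in Python rather than JavaScript, but the pattern syntax used here is common to both languages. The pattern and the `extract_date` helper are made up for illustration and have nothing to do with TenFourFox's code. Note also that Python's `re.compile` produces an internal matcher program, not native machine code as a regex JIT does; the point it shares with the JIT is only that the pattern is compiled once and then reused.

```python
import re

# Hypothetical pattern, for illustration only: match a date such as
# "November 19, 2011" and capture month, day and year as groups.
# Compiled once at import time and reused for every call.
DATE_PATTERN = re.compile(r"([A-Z][a-z]+) (\d{1,2}), (\d{4})")


def extract_date(text):
    """Return (month, day, year) for the first date in text, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    return month, int(day), int(year)


if __name__ == "__main__":
    print(extract_date("Posted on Saturday, November 19, 2011"))
    print(extract_date("no date in this string"))
```

The same two steps (test whether the pattern matches, then pull out the captured pieces) are what a browser does every time a page's script runs a regex, which is why the cost adds up on regex-heavy sites.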
Well, no more. As part of our efforts to attack methodjit (more on this soon), we finished porting the Nitro macroassembler to PowerPC based on Ben's initial hard work and this enables us to use YARR. Better yet, we can use YARR JIT and still use our heavily optimized custom tracejit right now. So it's going to be in 9. This also eliminates our dependence on PCRE to maintain our regular expression performance because we can now use the same code as everyone else.
How much difference does that make? Well, it depends on how regex-heavy your workload is, but on the quad G5 I develop on here at Floodgap Orbiting Headquarters, SunSpider drops from 1600ms to ... are you ready? ... 990ms. Yes, kids, we're already at our target of getting under a second on SunSpider, and we haven't even implemented methodjit yet!
However, it should be noted that this is because the part of SunSpider we consistently chugged on was the regexp portion, which was almost 650ms before, and is now about 45ms. Likewise, on V8, we improve from 627 to 769 purely on the basis of RegExp; on Dromaeo, which is a fairly balanced benchmark, we make a much more modest improvement from 110 runs/sec to about 119 runs/sec.
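As a quick sanity check on those figures, the arithmetic can be sketched as below. The numbers are the ones quoted in this post; the function names are just illustrative, and the "predicted" total assumes that nothing other than the regexp portion of SunSpider changed.

```python
# Figures quoted in the post, measured on the quad G5.
SUNSPIDER_BEFORE_MS = 1600
SUNSPIDER_AFTER_MS = 990
REGEXP_BEFORE_MS = 650  # "almost 650ms"
REGEXP_AFTER_MS = 45    # "about 45ms"


def predicted_total_ms(total_before, part_before, part_after):
    """Total runtime if only one portion of the benchmark changed."""
    return total_before - part_before + part_after


def speedup(before_ms, after_ms):
    """How many times faster (for timings, where lower is better)."""
    return before_ms / after_ms


def percent_gain(before_score, after_score):
    """Percentage improvement (for scores, where higher is better)."""
    return (after_score - before_score) / before_score * 100


if __name__ == "__main__":
    predicted = predicted_total_ms(
        SUNSPIDER_BEFORE_MS, REGEXP_BEFORE_MS, REGEXP_AFTER_MS
    )
    print("predicted SunSpider total (ms):", predicted)  # 995
    print("regexp speedup: %.1fx" % speedup(REGEXP_BEFORE_MS, REGEXP_AFTER_MS))
    print("V8 gain: %.1f%%" % percent_gain(627, 769))
    print("Dromaeo gain: %.1f%%" % percent_gain(110, 119))
```

The predicted total of 995ms sits within rounding of the measured 990ms, which is consistent with the regexp portion accounting for essentially the whole SunSpider improvement.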
Thus on many sites you will see little difference, but virtually all sites with significant JavaScript requirements use some regexes and you will see some improvement on them. Some sites like Twitter use gargantuan expressions, which now parse considerably faster (Twitter's fat regex size was, in fact, what caused TenFourFox 5 to break). It definitely improves browser chrome performance because large numbers of regular expressions are used by the browser's JavaScript code, and these regexes can now be cached ready to go in machine code. The downside is that, like all JIT output, compiled regexes only pay off if they are cached, so there will be some additional memory demands (offset by no longer requiring us to cache PCRE results). So it's a net win, and G3 owners will be delighted to hear that this is not limited to AltiVec -- regular expression compilation will be enabled on all versions, including G3. And the good news for builders is that we were able to hack it to correctly compile on gcc 4.0.1, so no compiler change is currently required.
The news is not as good with methodjit -- we can't even get it to compile simple code, let alone run it, even though we know that our macroassembler works (because YARR JIT works). Mozilla suspects that we may have unearthed a bug with register allocation, and I have some contacts who will hopefully be able to give us some tips on debugging the problem. I'm still hopeful for the Fx10 timeframe, which is good, because it seems there are some regressions with Fx9's Type Inference (which does not affect the tracing JIT) and it would be better to have those shaken out before we try to implement that for PowerPC.
Otherwise, the Fx9 port is so far uneventful -- I'm about halfway through the patches and so far there have been no major issues, although we haven't tried to build it yet. There is a lot scheduled for this beta, including not only YARR JIT but also faster AltiVec text processing and AltiVec colour space management (encores from Tobias), so it's taking a little longer than intended. I'm shooting for Thanksgiving weekend, and we can all give thanks for that. :)
Tuesday, November 15, 2011
8.0.1 not planned
Mozilla is chemspilling 8.0.1 this week, probably today or tomorrow, to cover bug 699134 and bug 700835. Bug 699134 affects only Windows, and bug 700835 only affects the most current release of Java on 10.6 and 10.7, so neither is relevant to us.
As for the 9 beta, I am waiting for a couple issues to shake out. When Mozilla marks beta 2, then we will pull and port. On deck are some more AltiVec ports from Tobias and some infrastructure changes. The methodjit port continues and can compile simple expressions, but YARR behaves badly or crashes. Ben and I are investigating this in more detail, and the possibility of forcing a compiler update to gcc 4.2 is being considered if it turns out 4.0.1 miscompiles YARR on PPC as well. Methodjit is still slated optimistically for the Fx10 timeframe; it won't be in 10.4Fx 9.
Just for fun, because I am a Forth nerd, here is a beautiful implementation of Jonesforth written in pure PowerPC assembly language. It still runs perfectly on 10.4.
Stay tuned for the 9 beta!
Monday, November 7, 2011
8.0 now release
8.0 is now converted to release. Watch for 9 pretty soon; the port will start once Mozilla has certified the first beta.
I note with interest that Flash Player 11 requires 10.6 -- my 10.5 Intel mini could not update to it. (I'll put 10.7 on it eventually and grit my teeth for those few tasks I have that require an Intel Mac -- mostly Android development and Eclipse. 10.5 runs happily in VirtualBox.) So in not too long we will probably see Adobe abandoning 32-bit Intel Macs as well as PPCs on Flash; all the more reason to eliminate dependencies upon it.
Saturday, November 5, 2011
8.0 RC and QuickTime Enabler alpha 113 now available
Mozilla dragged its collective feetsies on Firefox 8 and didn't finally sign off on it until late this week, which is why this RC is so tardy. However, the G5 jammed all night (which was helpful because we had a winter storm, and it's keeping the office nice and cozy) and coughed out the release candidates for you the beta audience to bang on. As usual, reports of serious bugs appreciated, but most of the bugs I know about should be corrected.
Issue 84, the irritating table malformation bug, turned out to have been caused by one of our AltiVec optimizers (specifically for text fragments). This is a curious manifestation and one we would not have found in testing. We're going to be more cautious about further optimizations on this particular portion of code, but being the rash developers we are, more optimizations are nevertheless in the pipeline for 9. A fixed version of the optimizer is in the release and the parser bandaid can go into the rubbish bin where it belongs (whew!). Note that this means this bug never appeared on G3, because G3 uses the original C code.
Tobias is continuing his work on more AltiVec-specific optimizations and on faster versions of what we've already got. Many/most of his optimizations will appear in 9. However, priority one remains the methodjit, and I'd like to make a public shout-out to Ben Stuhl, who polished off the opcode work on it so that it actually parses. It doesn't work yet, but this is super awesome stuff and he gets a beer too. I've got feelers out to another heavy POWER user who has expressed interest, and if this gets off the ground, we may fork js into our own internal repo to speed collaborative development and merge this back into our usual changesets. The first draft of this will appear in the 9 beta changesets, although I doubt very much I will have it working by then. However, Ben's strong work gives us a decent chance of having this ready by 10 beta, and Mozilla has given us until 11 before the tracer infrastructure is completely excised (a thank-you to Nick Nethercote and Dave Mandelin). More about that when 9 appears.
8 final still includes Tobias' patches for AltiVec JPEG decoding with ImageIO and faster AltiVec text processing. I also made a small tweak to the tracejit for a marginal performance benefit we weren't getting, and fixed the remaining theme glitches. This should cover all the major bugs. Those of you who hated tab dragging will be relieved to note that Mozilla has bowed to complaints and backed it out, but it will be returning. My estimated timeframe is somewhere around Fx10.
There is a simmering attitude of discontent with 8 (and to a lesser extent 7) that it has periods where it can drag or temporarily grind to a halt. I personally have not experienced this, but a fast G5 covers a multitude of sins, and I know some of you have indeed seen issues like that, which appear to be caused by some interaction with the database thread Firefox uses for history. This is not specific to us; it is a Mozilla bug, and if you search Bugzilla there are several open and active issues related to it. It's hard to maintain a port of the browser and be the lightning rod for criticism of it when a good portion of perceived performance problems is systemic and not port-specific. I point out, for example, this comment (at the bottom); we'll use him as a public example since he has chosen to make his irritation public. Even though TenFourFox is a Firefox port, and does have its own unique issues, it is still overwhelmingly Firefox, and I think people should be willing to voice their concerns about the browser to Mozilla directly if at all possible. I can't be everyone's punching bag for a port where I control very little of the code, and I don't think there's any reasonable response I could make to a guy like that from this position (though I am reminded of this blog post). Unless someone comes up with Chrome for PPC, which faces a huge hurdle (and I have talked about that with people who have asked), you're stuck with Gecko, and Mozilla is already talking about dropping 10.5. Make it your own while ye may.
With that disagreeable topic dispensed with, let's go back to goodies. I've compiled the feedback on the QuickTime Enabler, most of it good -- people love it when it works, but it just doesn't cover enough of what's available. So for alpha 113 (yes, there have been almost 70 revisions between alpha 45 [7.0] and this one), we're concentrating on expanding site support and I've started with the big one, which is YouTube.
Let's talk a little bit about the technology behind the QTE so that the limitations can be understood. QTE works by grabbing video URLs (preferably to H.264 video) and stuffing them into QuickTime playlists which it then hands to QuickTime Player. TenFourFox is not involved in downloading the video or even authenticating to get the video; it's all streamed, played and managed by QuickTime 7. Because of this, videos that require session cookies (Vimeo) or authentication will never work with the QTE because TenFourFox can't tell QuickTime how to send that information -- there is no API to do so. For Vimeo, for example, you will still have to log in and download the video, which fortunately you can do for free.
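To make that limitation concrete, here is a rough sketch of the hand-off idea in Python. It is not the QTE's actual code. The `Video` class, its field names, the example URLs and the M3U-style playlist format are all assumptions made for illustration, since this post does not say which playlist format the QTE writes. The sketch only builds the playlist text and does not launch any player.

```python
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical description of a video found on a page."""
    url: str
    needs_session_cookie: bool = False
    needs_login: bool = False


def can_hand_off(video):
    """A bare URL is all the external player receives, so anything that
    depends on cookies or a login cannot be handed off."""
    return not (video.needs_session_cookie or video.needs_login)


def build_playlist(videos):
    """Return playlist text (assumed M3U-style) listing only the videos
    that an external player could fetch by itself."""
    lines = ["#EXTM3U"]
    lines.extend(v.url for v in videos if can_hand_off(v))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    found = [
        Video("http://example.com/open-video.mp4"),
        Video("http://example.com/cookie-video.mp4", needs_session_cookie=True),
    ]
    print(build_playlist(found))
```

The design point is that the browser hands over nothing but a URL: any state the site keeps in the browser session never reaches the player, so such videos have to be filtered out (or downloaded by hand) rather than streamed.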
That still leaves many sites that don't require session cookies or the like, and YouTube is fortunately one of them. In this release of the QTE, you can now go to any YouTube video (that has H.264 video available -- most do), right-click on the page, and select SD, HD 720 or HD 1080. HD 1080, btw, will probably chug badly on anything slower than a G5, but a fast G4 should be able to keep up with HD 720. In fact, you can right-click on any YouTube embedded iframe and do the same. It works better if you have the HTML5 trial enabled so that you can see the video instead of a blank box, but it will still work. Try it on this (rather slick) Microsoft video reported by Engadget which has all three resolutions available. Just right-click on the video box at the end of the article, choose your resolution, and Cmd-F to enjoy a full-screen experience in QuickTime Player.
Some videos can't be accessed from YouTube in this manner and you will get an error if you try to. Sorry, I can't do anything about that. I have encountered exactly two in the 50-odd videos I have tested, though. This will also not work for embedded objects -- i.e., those sites that embed the actual Flash applet rather than the iframe, which is the preferred method. I am thinking of a way around this, but it is likely to be convoluted.
This version also corrects an issue with corrupted characters in the destination URL dialogue box and improves performance slightly. Also, the issue with the video not automatically starting on some systems is in fact a QuickTime settings issue. If the video does not play automatically (give it some time to buffer, please!), go into Preferences in QuickTime Player and make sure that Automatically play movies when opened is checked.
To install alpha 113, make sure that alpha 45 is removed (it should be, it's not compatible with 8), and download the .xpi. Drop it on TenFourFox and it will install. No restart is required; you can play videos immediately.
I would like other suggestions about services to add in the comments. These services should be popular, widespread and have known ways to access their video streams. As I mentioned, some will not be compatible with this method -- Vimeo is the most notorious.
Well, go to it and have fun. TenFourFox 9 beta is probably about two weeks away. Release notes and architectures: