Saturday, April 27, 2013

A scrollbar too far (or, why Apple can't kill a Snow Leopard)

One of my perpetual sources of anxiety is when (or whether) Mozilla will drop support for 10.6 Snow Leopard. We've done pretty well keeping 10.4- and 10.5-compatible code in our widget library, and so far it looks like the code landing for Australis (the new "look" for Firefox) will be compatible with only moderate tweaking at most. Because Snow Leopard is essentially a tuned 10.5 and has relatively few UI-related changes, our now fairly streamlined widget library code functions quite well, considering, on all three operating systems (essentially forward-porting the 10.4-specific stuff from 3.6 and the 10.5-specific stuff from 16, for those weirdos in the audience running TenFourFox on Rosetta in 10.6). There haven't been substantial changes to our widget patches since 19, and even then only for the compiler update.

But then of course there's 10.7 Lion, Steve Jobs' Operation Market Garden, where scrolling was screwed and Rosetta was removed and Java was jettisoned (wait, that's a good thing) and scrollbars were iOSified and Save As was deprecated and you can't tell if an application has quit or not. And just as 10.6 left behind an entire generation of Macs -- every Power Mac ever made -- 10.7 left behind every 32-bit Intel Mac, and 10.8 won't even run on those 64-bit Intel Macs that can't boot a 64-bit kernel, including my 2007 Core 2 Duo Mac mini which I use for taxes and Android stuff. 10.8 does smooth out some of the rough edges of 10.7 but doubles down on them in other respects, so why bother downupgrading to 10.7 if I can't then upgrade to 10.8 as quickly as possible? I could boot 10.7 on my mini, but then I'd lose the features I need from 10.6, I still couldn't upgrade to 10.8, and I have no apps that demand either one. So 10.6 stays.

In that context, then, I suppose it's not surprising that Mozilla (and probably Chrome as well) has a problem: 43% of their users are on 10.6, an OS that Apple is supporting pretty much in name only right now and will almost certainly cease to support at all once 10.9 emerges (expect word on that from the incredibly oversubscribed WWDC). And why is this? Well, look at the other two numbers: while 10.8 is growing, 10.7 is at 30% and dropping. I can't be the only one who made the calculation that it's better to stay on 10.6 if you can't go to 10.8, and there are still lots of 10.6-only machines in use that really aren't that old. Since Apple isn't maintaining Safari for 10.6 anymore, it's alternative browsers ahoy.

Unlike previously premature attempts to kill off platform support on Mozilla's part (ahem), there's a big operations problem brewing here. Apple is obsoleting hardware faster and faster, and Mozilla can't buy more 10.6- or 10.7-capable systems because Apple won't make them. (Used hardware is not generally acceptable for the kind of testing they need to do; their Mac mini build farms work very hard.) They might be able to build on a later version of OS X (but not for much longer?), but they wouldn't be able to test how well the result works on earlier ones. So they're going to repurpose the 10.7 test systems for 10.6, since it remains the largest share of Firefox's OS X user base, and make 10.6 and 10.8 the major test platforms until Apple finally has a developer beta of 10.9.

All this is good news for us, of course. On the dark day when Mozilla makes 10.7 the supported minimum, all supported Macs will run a 64-bit build, have hardware acceleration, and use the new UI and related libraries -- none of which is true for us (we still use some Carbon code, limiting us to 32-bit even on the G5; we don't have hardware acceleration; and our scrollbars are actually visible and scroll in the correct direction). At minimum we would have so many changes to pull forward that we would almost certainly have to split off separate widget and theme subsystems, and the merging work would increase significantly -- if we could get it to function at all. But with almost half their users on 10.6, even considering that the number is slowly dropping, I'm now not anticipating this will occur until at least the next ESR, and the chances of a working TenFourFox 31 have just increased big time.

Wednesday, April 24, 2013

IonMonkey PowerPC phase 1 complete

Tonight, after several weeks of feverish work, I finished the basis for IonMonkey PowerPC, the next-generation JavaScript JIT we will be implementing: bridging our previous assembler (and all Ben's hard work on optimizing branches) to IonMonkey, then writing up the new macroassembler. IonMonkey is all stack, all the time, so there are quite a few differences in implementation (though, for the Linux readers in the audience, this may make our IonMonkey backend easier to port to Linux PPC than our current JaegerMonkey/methodjit backend, because it no longer bakes in certain assumptions about how the stack is organized). The old macroassembler remains an integral part of YARR and is still used to compile regular expressions.
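
To give you a flavor of what that means -- and this is a from-memory sketch with made-up names, not the actual code in the tree -- the assembler layer knows how to emit raw PowerPC instruction words, and the macroassembler composes them into the abstract operations the Ion front end requests:

    #include <stdint.h>

    // Illustrative sketch only; class and method names are hypothetical.
    // Higher layers ask for abstract operations ("put this 32-bit constant
    // in a register") and the macroassembler emits the raw PowerPC words.
    class AssemblerSketch {
    public:
        AssemblerSketch() : len(0) { }
        void writeInst(uint32_t inst) { buffer[len++] = inst; }  // append code
    protected:
        uint32_t buffer[1024];
        uint32_t len;
    };

    class MacroAssemblerPPCSketch : public AssemblerSketch {
    public:
        // Load a full 32-bit immediate: lis rd,hi16 then ori rd,rd,lo16.
        void ma_li32(uint32_t rd, uint32_t imm) {
            // addis rd,0,hi16 (opcode 15); with rA=0 this is the lis mnemonic.
            writeInst((15u << 26) | (rd << 21) | (imm >> 16));
            // ori rd,rd,lo16 (opcode 24); rS sits in bits 6-10, rA in 11-15.
            writeInst((24u << 26) | (rd << 21) | (rd << 16) | (imm & 0xffff));
        }
    };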

Phase 2 will be getting it to compile, phase 3 will be getting it to do simple operations, and phase 4 will be getting it to pass the JIT test suite (and if it does that, there is an excellent chance it will "just work" in the browser). Please note I am still unconvinced about how well it will perform, though the front end for IonMonkey can optimize code a lot better than JaegerMonkey could. And because it does everything on the stack, I hope our 1GB allocation is enough; I don't think we can get any more out of it in 32-bit mode.

Mozilla has at least not laughed uproariously and said "no" with a bone-crushing sound to the plea to keep JM+TI (JaegerMonkey + type inference) in ESR24, even if only as a compile-time option. There is no SPARC ionjit (nor does it look like there will be one), nor a MIPS one, so those architectures will also lose JIT support when JaegerMonkey is removed; and since it appears that JM+TI's continued existence does not immediately impair work on certain pieces of the ionjit baseline compiler, there is less push to remove it right away. This was not the case with tracejit back in the ESR10 days, when tracejit actively impaired work on type inference. I really don't want to have to unload ionjit on the user base right away; I'd rather have lots of cycles to get it right, and keeping 24 on JM+TI means we have one more full stable release lifecycle on a very dependable engine that's bought us a lot of mileage, and plenty more time to get Ion to that same level.

Tuesday, April 23, 2013

FirefoxOS dev phones sell out

By which I mean they're out of phones for sale on the very first day, and I blame you lot for why I can't buy a Peak Developer Edition. The specs are modest, but the price is eminently reasonable. Last night at 10:30pm the store was down for maintenance; I got up at 5:45am today to get one, and bam.

I'm a longtime Android user, driven to the Nexus One when Apple dropped 10.4 support within the iOS 4 timeframe, and while I don't love it I certainly don't hate it. My Galaxy Nexus is a solid device. Still, when I got to play with a Palm Pre 2 for a while (thanks, Ed!), it was a delightful machine, and I'm hoping FirefoxOS recaptures the ease of development and feeling of openness that webOS had. Bugs are to be expected in early dev releases like this -- a friend of mine who put a nightly on a castoff phone a couple months back reported it didn't even do cut and paste (!) -- but I expect it has now developed to the point where you can dogfood it, and I certainly hope Mozilla is. If FirefoxOS launches well, it would do much to preserve Mozilla's relevance, and a freer alternative mobile platform, in a post-PC era.

Friday, April 19, 2013

IBM's getting out of the x86 server business

The Wall Street Journal reports today that IBM is exiting the x86 server business, citing poor margins, and selling it off to Lenovo, which previously bought their ThinkPad line when IBM got out of the consumer market.

IBM, like Apple, has never done well in low-margin markets. While they were just as enthusiastic as anyone about consumer PC sales during the 1980s and early 1990s, and their ThinkPads were some of the best laptops ever made (even if they did have x86 chips in them), when personal computers became a commodity IBM saw the writing on the wall and got out. Now the same thing is happening with servers: relatively cheap Intel and AMD CPUs running Linux are easy to deploy and easy to purchase, and the shrinking margins they bring in no longer justify the R&D and deployment costs of keeping them in the product mix.

What does this mean for PowerPC? Well, it means IBM will continue to develop and improve the server-grade POWER architecture, since it will shortly be the only architecture they sell (save z/Architecture, though the modern versions of those chips have many underlying similarities to POWER). Whether this trickles down to the embedded and game markets is another story: the PS4 will be AMD-based, and the Xbox 720's Durango CPU will also be x86 and appears to be another AMD design, leaving the Nintendo Wii U and its IBM-developed Espresso CPU as the only PowerPC-based console. It's not at all clear whether IBM is going to do any more development in the gaming space, but Freescale is still chugging out embedded PowerPC CPUs and recently introduced the 64-bit QorIQ AMP e6500 series, with clock speeds through 2.5GHz and as many as 12 cores on a 28nm process -- the linchpin of Freescale's new base-station-on-a-chip line for mobile broadband.

So I think it's good news for the architecture and brightens the future for POWER because it allows IBM to focus more on the major architecture that has consistently made them money. Confidently look for more PowerPC chips sneaking into more pieces of your daily life, including uppsala, the big POWER6 in my server room that serves you floodgap.com. Say hi!

Tuesday, April 16, 2013

"Twenty-one. That's blackjack." "Hit me."

Like any idiot who gets a natural 21, I just had to take another hit. 21 ported relatively uneventfully. Some of the underlying work for Australis, the new Firefox UX, is in this version (invisibly), and it ported pretty much as-is to 10.4, so that's a good sign for the future. I'm still watching some of the later work bugs, but I don't see them landing until at least Fx24 (Fx23 is the current nightly), and Australis isn't going to be fully in place by then either. Other than breaking issue 82 again -- a printing bug introduced by an incompatible fix way back in 6.0 that is trivial to back out -- the browser appears to basically work in debugging mode. getUserMedia still functions fine, we're starting to climb back up the curve to where we should be with JavaScript performance, and there continue to be improvements to the graphics stack which make animations and display smoother. Later this week, once I've done a couple other bug investigations, I'll flip it over and build optimized for further testing -- another version in the can relatively painlessly, for a change.

But I just had to, had to, take another hit with the king and the ace showing. The "hit" is something called jemalloc, an improved memory allocator with lower overhead: instead of asking the operating system for lots of tiny allocations, jemalloc creates "arenas" of larger memory blocks (on the assumption that more allocation requests will follow) and then parcels those out with a faster internal routine. It also scales better between threads by keeping multiple arenas in play so that threads don't have to contend with each other. Certain kernel-level operations and multithreading are not well optimized on 10.4, as issue 193 demonstrates, and given Firefox's increasing dependence on threads for multicore systems, anything that reduces the amount of locking and waiting for kernel resources is a clear benefit. You can read about the gory implementation details here.
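
If jemalloc is new to you, the core arena idea boils down to something like this toy sketch (nothing like jemalloc's real implementation, which adds size classes, runs, chunks and much smarter locking):

    #include <pthread.h>
    #include <stdlib.h>

    // Toy sketch of the arena idea only. The point: get one big block from
    // the OS up front, then satisfy small requests with cheap pointer
    // arithmetic instead of a kernel round-trip each time.
    struct Arena {
        char*           base;  // the big block
        size_t          size;
        size_t          used;  // bump pointer
        pthread_mutex_t lock;  // guards the bump pointer
    };

    static void arena_init(Arena* a, size_t size) {
        a->base = static_cast<char*>(malloc(size));  // one trip to the system
        a->size = size;
        a->used = 0;
        pthread_mutex_init(&a->lock, NULL);
    }

    static void* arena_alloc(Arena* a, size_t n) {
        n = (n + 15) & ~(size_t)15;          // preserve 16-byte alignment
        pthread_mutex_lock(&a->lock);
        void* p = NULL;
        if (a->used + n <= a->size) {
            p = a->base + a->used;
            a->used += n;
        }
        pthread_mutex_unlock(&a->lock);
        return p;                            // NULL when the arena is full
    }

Give each thread its own arena (or a small pool of them) and two threads allocating at the same time usually take different locks and never collide, which is exactly the kind of contention issue 193 is about.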

Firefox works just fine with jemalloc disabled (since it is intended to be mostly transparent), and that is how we've shipped TenFourFox so far. (Near as I can determine, AuroraFox and SeaMonkeyPPC aren't using it either, or it's not actually turned on.) Well, 10.4 must have a really crummy default allocator, because after some fiddling to account for operating system differences, I was able to slot it in and WOW! the browser not only starts up apparently normally, but is noticeably faster. Most of the deadlocked sites we're tracking in issue 193 are up to 25% faster in wall clock time compared to the non-jemalloc 21, which is already itself faster than 20. Sites with less contention are less improved, of course, but it's an improvement right where we need it. Even if it doesn't fix the actual underlying issue in the kernel, it eliminates another source of contention, and that's enough to get us through the hump. 10.5 is considerably less affected due to kernel improvements, but it could still benefit as well.

But, and here's the "busted" part, major parts of the browser's interface to OS X are screwed up when jemalloc is the allocator. While menus, widgets and gadgets all work, cut and paste doesn't, minimizing the window doesn't, and drag and drop doesn't (they do nothing other than log an error to the system console). If I rebuild the browser with jemalloc statically disabled, they start working again, so that's clearly the culprit. That's not shippable no matter how much faster the browser is, and there is at least one crash bug related to drag and drop on Intel 10.5 that caused Mozilla to disable jemalloc on anything less than 64-bit 10.6. The PowerPC kernel might not use the same code or crash in the same way, but right now I can't even test it.

I suspect memory alignment is the problem -- something stomping on some sort of internal memory move routine -- but it's going to take a while to debug, and IonMonkey is still highest priority, so this slips to Job 2. Still, look for it soon once I get IonMonkey into a working state (the interested can watch issue 218). 21 is an improvement over 20, but if I can get jemalloc off the ground, maybe we'll be able to say blackjack with 22 or 24. Never bet against the house!
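
If you want to poke at the alignment theory yourself, a crude probe like this (my quick hack, not anything from the tree) will show whether an allocator ever returns blocks with less than the 16-byte alignment the OS X default malloc guarantees:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    // Crude alignment probe. System interfaces may quietly depend on the
    // default allocator's 16-byte alignment; report anything less aligned.
    int main() {
        int worst = 16;
        for (size_t n = 1; n <= 512; n++) {
            void* p = malloc(n);  // substitute the allocator under suspicion
            int align = 1 << __builtin_ctzl((unsigned long)(uintptr_t)p);
            if (align < worst) {
                worst = align;
                printf("size %lu -> %p (only %d-byte aligned)\n",
                       (unsigned long)n, p, align);
            }
            free(p);
        }
        printf("worst alignment seen: %d bytes\n", worst);
        return 0;
    }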

Tuesday, April 9, 2013

20.0.1: A Firefox Chemspill Odyssey We're Not Releasing

As strains of Strauss play and HAL murders spacemen and men dressed as apes fling bones that turn into spaceships in geosynchronous orbit, Mozilla is chemspilling 20.0.1 for two Windows-specific bugs that do not appear to be issues for us. Therefore, we will not open the pod bay doors, er, release a 20.0.1. If you know differently, please advise. Mind the explosive bolts.

Meanwhile, the port of 21 has begun, and I'm about 50% done with the IonMonkey macroassembler. More about this later.

And pay no attention to that big black monolith. It's just a Macintosh TV turned upside down and buried.

Wednesday, April 3, 2013

Blink, there's a new HTML rendering engine

The Web is all abuzz about Blink, Google's fork of WebKit, for use in Chromium and Google Chrome (and, apparently, Opera as well, since it will track Chromium).

I've already said my piece on WebKit, but Blink significantly changes that dynamic. In fact, Blink is likely to completely fragment what would have been a cold WebKit-only future, because it is almost certain to evolve and implement new features faster than WebKit will, and those features will be exported everywhere that Google code runs (Android, too). And, well, that's good news in a sense, because it avoids one kind of perilous future; but it's bad in another, because there won't be much of a brake on Google implementing features in Blink that make Google properties run better, or even exclusively, in it. Remember, that's the whole reason they made Chrome in the first place, and Microsoft isn't the only one that "embraces and extends."

Maybe it'll be Blink that eats the Web, not WebKit after all.

Tuesday, April 2, 2013

20.0 available

20 still has some glitches in it, but I think it's far enough along for people to bang on. The big changes in 20 are the new download interface, triggered by that giant "arrow" in the toolbar (in fact, if you downgrade to 19.0.2, it "sticks"), and official support for getUserMedia, which we now offer too. I'll talk about that in a bit. Downloads and release notes, my preciouseses. The fix for IndexedDB offline storage databases affected by issue 213 is included, but read the release notes first!

The glitches relate to two things: I need to patch our 10.4 accelerated canvas to support the new canvas compositing operators (should be straightforward), tracked in issue 216, and we still have some JavaScript regressions to shake out (issue 215), particularly in v8-crypto. However, these regressions already shipped in 19, and y'all said that was faster than 17, so you have no one to blame but yourselves! (And, uh, me, for not keeping an eye on the numbers.) David Anderson from Mozilla has given me some ideas where to look, and the good news from the AWFY grid is that even if I did absolutely nothing, the numbers would get close to where they were before (remember, we are the red JM+TI line on those graphs, and 20 comes from the period where there is a "bump" in the line around December 2012). I reverted the most painful change for this version, and I'm attacking the other changes next. These backouts are being exported as stand-alone patches so that if I get IonMonkey working we can just jettison them, since this is merely a temporary measure for the undesirable situation where we need to live out our lives in a pain multiplier, er, the methodjit/JaegerMonkey compiler for JavaScript acceleration.

(Sharp readers may have noticed I've said very little about SunSpider times, although they are similarly affected. That's because I doubt Mozilla is optimizing much for them anymore; on modern CPUs they are now practically microbenchmarks. What benefits the V8 benchmark does not necessarily benefit SunSpider, but I'm not sure the latter is salvageable at this point. We'll see.)

Video and audio capture through getUserMedia is, as previously mentioned, fully supported for the moment. NB: I haven't decided if this will be supported for certain in TenFourFox 24 -- it depends on whether Mozilla or Google guts our QuickTime C API support along the way. For now, it is. I tested it with my FireWire iSight camera, which worked, and my Orange Micro iBOT, which didn't. Please try it with your Mac's devices on the test page and post your results to the video support wiki page. I'll make reasonable efforts to support devices that are not technologically constrained, but no guarantees that your favourite device will be one of them, of course.

On the future-technology and port-killer front, Mozilla is not stopping their JavaScript work with the IonMonkey JavaScript just-in-time compiler, which is why it's so important for me to finish (hey, looking for a project? my partial work on IonMonkey is in our 20 changesets, which you can help with!); their next frontier is OdinMonkey, which makes me wonder where I can get some of the hash they smoke in these Mountain View naming meetings. OdinMonkey is not another JIT -- which is good, because if they were already working on the next one I would be throwing in the towel here. Instead, OdinMonkey is the overtestosterated Norse ape-god incarnation of asm.js, an attempt to define a minimal subset of JavaScript that can be distilled down to very fast code.

Our interest in OdinMonkey is not to be able to write the next 3D engine, because we don't have WebGL. However, other applications are being compiled for it using the Emscripten toolkit, which takes LLVM bitcode and emits equivalent JavaScript compliant with asm.js/OM, and I don't see any reason why we shouldn't be able to run these too. It should be possible to really rev performance, since there is much less code complexity involved, and if they add the SIMD features they're promising, we can finally implement AltiVec acceleration in JavaScript (sorry, G3 owners, you'd get equivalent but slower code here). However, I can also smell serious endian hell coming when someone writes code assuming little-endian byte order, and we may need to do once and for all what I considered long ago: hide our native endianness from JavaScript and make it look little-endian from the code's perspective (using instructions like lwbrx, which work just fine on G5, by the way). But to make the most of it we'll need IonMonkey, because I'm sure the really awesome optimizations will build upon it.
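
The concept would look something like this sketch (illustrative only; the wrapper function is hypothetical, but lwbrx itself is the real instruction):

    #include <stdint.h>

    // Sketch of the concept only: present typed array memory as little-endian
    // on a big-endian PowerPC. The wrapper function is hypothetical; lwbrx
    // ("load word byte-reverse indexed") is the real instruction and runs
    // fine on everything up through the G5.
    static inline uint32_t loadLE32(const uint32_t* p) {
    #if defined(__ppc__) || defined(__ppc64__)
        uint32_t v;
        __asm__("lwbrx %0, 0, %1" : "=r"(v) : "r"(p), "m"(*p));
        return v;
    #else
        uint32_t v = *p;  // portable fallback: swap after a native load
        return (v >> 24) | ((v >> 8) & 0xff00) |
               ((v << 8) & 0xff0000) | (v << 24);
    #endif
    }
    // A JIT would just emit lwbrx/stwbrx directly for typed array loads and
    // stores, so the swap would cost nothing extra per access.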

The port for 21 will start with beta 2, as usual.

Monday, April 1, 2013

Announcing a new TenFourFox port

The Intel fork is still being looked at by Claudio, but now I can finally take the wraps off a project that I've been kicking around for a while and only recently figured out how to make happen. You already know that I have an interest in old Macs being able to run browsers, which is why I maintain TenFourFox and Classilla. But other than mid-range versions of Internet Explorer and Netscape, 68K Macs were out in the cold. This was not acceptable, and with a bit of thinking and a lot of hacking, I believed it was possible to get TenFourFox mounted and running on these venerable machines. And, for the first time, I'm unveiling a way for your old Macs to join the modern web and HTML5. This is how I did it.

First off, it was clearly advantageous to have the project run on System 6. Not only did this have a coolness factor that System 7 lacked, but it meant we could take over the entire machine (assuming MultiFinder was off), we could install runtime patches to implement our own paging system and deal with the smaller screen resolution without interfering with other apps, and finally we could greatly extend the range of low-end machines that could run it.

My first thought was to get it running on a Mac Plus, but even the maximum RAM of 4MB available to the 68000 Macs rapidly proved inadequate in some back-of-the-cocktail-napkin calculations despite using the top eight bits for doubled-up tag storage. So we went with a full 8MB allocation, meaning that a 68020 is required. Sorry. But this means everything from the Mac II on up, so I think it's still a pretty good selection of compatible hardware (and accelerator cards will be supported for 68000 machines).

The second problem was how to actually compile it. Our choice of System 6 meant MPW and CodeWarrior were both out of the question, and THINK C doesn't have C++ support, so I settled on Symantec C++ 7.0, since it still works on System 6. However, owing to its age, Symantec C++ supports only a (by modern standards) drastically reduced set of the language and none of the gcc extensions that the ordinary TenFourFox depends on, and System 6 doesn't have dynamic linking. Also, there is no xpidl compiler for the classic Mac OS for anything other than CodeWarrior (Classilla uses it to build its XPTs and headers), so it can't build the interface files. And then there was the small matter of getting the compilation and linking to complete sometime this century.

So I attacked the problem in reverse. The G5 already builds XPTs and header files as part of the compile, so I pulled them from the 20 test build. Then I wrote a little preprocessor (sort of like ansi2knr, for those of you who remember what that was for) to deal with Mozilla's liberal use of SFINAE, substitute equivalent code for the gcc extensions, and comment out a few things that were inessential and could not be built (for example, no plugins). GNU Makefiles were simply turned into C++ projects. Finally, to deal with the shared libraries, I wrote a dummy main() and built an "executable" with debugging symbols on, so that symbols could be "exported" (more about that in a second) from libxul, since Mozilla doesn't support non-libxul builds and I couldn't think of a better way to do it. I dragged the spare quad G5 out of the closet, set Basilisk II on it at full speed at the lowest refresh rate with as much memory as it allowed, installed an OS and Symantec C++ and copied the files over, then built each project step by step as I worked on other things. What with manually punching OK and Cancel, occasional compile errors and the emulator overhead, it took about two weeks all told to finish the compilation (I really need to get build automation working, but this was a demo project, after all), but it did finish. The browser didn't really stand up at this point, but the debugger proved it was at least generating proper code.

The next two things to write were the dynamic linker and the JIT. Yes, there's a 68K JIT too. In fact, it was somewhat easier than the PowerPC JIT, since the 68K has a frame pointer and is a little less finicky about the stack (but since the 68K version is patterned after our PowerPC version, there's probably a fair bit more optimization to be done). Testing it was a little tricky in the debug JS shell, but that's included too for you to try. The dynamic linker scans the libxul "executable" at startup and generates either direct addresses to call or, for the functions being "exported," a stub routine that triggers a user-level page fault. This is obviously a little slow, but given that libxul (at about 65MB) wouldn't fit in memory, it was the only way it would work. When run, the stub tries to execute the method out of paging space. If a given function or symbol has not been paged in, the linker uses an LRU ring to evict an eligible C++ method from memory, loads the new one from the debug symbol table we compiled in, caches it, and runs that.
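
For the terminally curious, the eviction path of that LRU ring looks roughly like this (purely illustrative, of course):

    // Purely illustrative sketch of the LRU ring's eviction path -- a
    // clock-style second-chance sweep, which approximates LRU cheaply.
    // Each slot holds one paged-in C++ method; the hand advances past
    // slots that have been touched since its last pass.
    struct MethodSlot {
        unsigned symbolId;    // which libxul symbol occupies the slot
        bool     referenced;  // touched since the hand last swept by?
        char     code[4096];  // the paged-in method body
    };

    static const unsigned kSlots = 64;  // 256K of paging space
    static MethodSlot ring[kSlots];
    static unsigned   hand = 0;

    static MethodSlot* evictAndLoad(unsigned symbolId) {
        // Clear reference bits until we find a slot gone cold.
        while (ring[hand].referenced) {
            ring[hand].referenced = false;
            hand = (hand + 1) % kSlots;
        }
        MethodSlot* victim = &ring[hand];
        hand = (hand + 1) % kSlots;
        victim->symbolId   = symbolId;
        victim->referenced = true;
        // Copying the method body out of the debug symbol table (and fixing
        // up anything that pointed at the evicted code) is the hard part,
        // and is left to the imagination.
        return victim;
    }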

Last was widget and font code. HarfBuzz, surprisingly, compiled without incident and generated most of the chrome and screen. I devised a custom MDEF for the menus to draw them closer together so that we would make the most of our limited screen real estate, as well as icons for the bookmarks menu. Windows were just regular windows, and for this I imported much of Classilla's widget code, which, aside from stripping out the Unicode stuff, worked surprisingly well on System 6. (If you have QuickDraw acceleration, it's a lot faster, and it can use the RAM on cards like the 8•24 GC for GWorlds.) As for the actual XUL browser, I just wrote a very minimal tab implementation for now. The tabs are merely bitmap PICTs in the skin folder, as are the back button and the basic layout of the bar.

Well, that's enough about implementation; let's show you how it works. TenFourFox right now comes on a 128MB HFS disk image. I tested it originally with Mini vMac, which needed a little bit of hacking to enable MacTCP and an emulated SONIC Ethernet card (Paul, I will send you these sources). Here we are at the desktop. All of these screenshots are at 512x342 so that we can test it even on a compact Mac.

We start the browser, and the loader immediately switches to the dynamic linker, which enumerates libxul. For ease of use, all of the other libraries were rolled into libxul as well, unlike in regular TenFourFox where some are still in external dylibs (this meant only one big library had to be scanned and managed). In a future version it will cache runtime information and offsets, but this is all still undergoing a lot of debugging, so caching that information right now is not too useful.

The first time I got it to run (which was a delight), I expected that it would be hard to get everything to fit on the screen, but this was ridiculous:

There are three things wrong here (though one turned out, serendipitously, to be right). The first is that the buttons and text fields somehow picked up outlines. However, this works really well on a 1-bit screen, so I kept the side effect (drawing over the native controls, which still contain the hit areas). The second is that the text on the shaded buttons is tough to read. But the worst is that the entire screen -- because, remember, it came from TenFourFox -- is rendering at 96dpi on a 72dpi display, so everything appears at 96/72 = 133% of its intended size, a third too large.

So I hacked layout to assume that the screen DPI was 72. This was a little harder than I thought it would be and had some glitches (especially because image resizing was sometimes necessary), but I was able to get it to basically work. I also changed the font renderer to prefer either white or black depending on the background it was rendered against, and then knock out the pixels of the shading around it so that it remained legible. This was the result (against yesterday's Cesar Chavez Google doodle):

Much better!

Image rendering is done with Floyd-Steinberg dithering to give the nicest fidelity, but solid shades drawn by the browser use pattern diffusion, since text rendering and knockout are more readable that way. The browser also computes a colour cube for background shades to further assist legibility. If there is no white background anywhere on the page, the browser draws the lightest shade as white and adjusts the others accordingly: after all, since there's no way the browser can faithfully render colours, why not have it just make good, solid, legible colour choices? You can see a nice example of this on Low End Mac below. The logo is using error diffusion, but the option bar is using pattern diffusion, and the font on top was rendered in white with knockouts to make it readable. (By the way, WOFF does work. It's downloaded to disk and cached.)
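
Floyd-Steinberg itself is simple enough to sketch in a few lines -- this is a generic greyscale-to-1-bit version for illustration, not the browser's actual rendering code:

    // Generic Floyd-Steinberg dithering, greyscale (0-255) to 1-bit. Each
    // pixel snaps to black or white, and its quantization error is spread
    // over the unvisited neighbours in the classic 7/16, 3/16, 5/16, 1/16
    // pattern, which is what preserves the fidelity.
    void floydSteinberg(int* img, int w, int h) {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int oldPix = img[y * w + x];
                int newPix = oldPix < 128 ? 0 : 255;
                int err = oldPix - newPix;
                img[y * w + x] = newPix;
                if (x + 1 < w)
                    img[y * w + x + 1] += err * 7 / 16;            // right
                if (y + 1 < h) {
                    if (x > 0)
                        img[(y + 1) * w + x - 1] += err * 3 / 16;  // below left
                    img[(y + 1) * w + x] += err * 5 / 16;          // below
                    if (x + 1 < w)
                        img[(y + 1) * w + x + 1] += err * 1 / 16;  // below right
                }
            }
        }
    }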

This adaptive layout strategy does not come cheap. On Mini vMac on the quad G5, with no speed control (balls to the wall), it took about 90 seconds to bring Google up, and about a minute for LEM. On my test SE/30, it took about three minutes for Google. I need to figure out some way of improving that within the unavoidable overhead of the linker system.

As far as the JIT goes, I am proud to say that on my test SE/30, SunSpider not only ran -- probably for the first time ever on a 68K Mac -- but also completed in under three hours! Excellent! However, SunSpider doesn't seem to handle such long timings all that well, as you can see:

And then there's Facebook:

Well, there's still some work to be done, obviously. I think Zuckerberg should work harder on making Facebook accessible to all users, including 68K Macs.

I'll hopefully have test builds of 68K TenFourFox available shortly, along with a source dump and full build instructions. It's a great day for browsing on your vintage Mac! And as for 20 on the PowerPC, look for it later this week.