Friday, April 15, 2011

4.0.1pre is now available

So, here's 4.0.1pre, and here's what's in it:

- The revised G5 JavaScript nanojit. Which is, you know, faster. And therefore doesn't suck. (G5 only)

- The AltiVec accelerated WebM video decoder (VP8, to be more precise). (7400/7450/G5)

- The scroll-slower-upper-thinger to make Flash applets not quite artifact as badly. They will still smear with scrolling, but less noticeably. (All)

- All the Firefox 4.0.1 fixes to date. There are quite a few; some don't affect us, but there are some security-related issues in here (none public) and some crash fixes. (All)

Plus this bonus hotness I snuck in:

- Tuned movie playback for all platforms, which affects not just WebM but also Theora. This causes slower Macs to buffer more decoded frames and smooth out the audio even if the video is choppy. The settings are different for G5 than G4/G3 because the G5 has much more memory bandwidth thanks to its faster front-side bus. (All)

- Enabled AltiVec compositing also for libpixman. This means that pixel compositing is now done with AltiVec operations, which affects just about everything in the browser, really. The effect is subtle, but smooths out animations nicely and improves repainting speed. (7400/7450/G5)
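
For anyone wondering what "pixel compositing" means in practice, here is a minimal scalar sketch of the classic "over" operator on premultiplied ARGB32 pixels. This is not libpixman's actual code and the function names are made up for illustration; the point of a SIMD version is that a 128-bit AltiVec register can hold four such pixels and do the same arithmetic on all of them at once.

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding.
   (Hypothetical helper name, for illustration only.) */
static inline uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Composite one premultiplied ARGB32 source pixel over a destination pixel:
   result = src + dst * (1 - src_alpha), computed per 8-bit channel. */
static inline uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        uint32_t c = (uint32_t)s + mul_un8(d, inv_alpha);
        if (c > 255)
            c = 255; /* only reachable if the input was not premultiplied */
        out |= c << shift;
    }
    return out;
}

/* Composite a whole row, one pixel at a time. A vectorized version would
   process several pixels per iteration instead. */
static inline void over_row(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = over_pixel(src[i], dst[i]);
}
```

An opaque source pixel simply replaces the destination, a fully transparent one leaves it alone, and everything in between is a blend. Since nearly every repaint goes through loops like this, speeding them up touches just about everything on screen.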

Before you download this, consider grabbing a video and watching it in 4.0s just to see your starting point. I recommend going to www.youtube.com/html5 and signing up for the WebM trial (you don't need to have an account for that, just join with the link at the bottom), then viewing this video about Google's fiber experiment. If that video comes up in Flash, you did it wrong. This is a good test video because it has scenes with high data rates (the product manager on camera) alternating with lower ones (the animation sequences in the middle, and the title card and end card), so it's a good general overview.

On my G4/450 Sawtooth (7400) -- and by the way, I still don't support systems slower than 1.25GHz for video -- this video is pretty much unplayable with 4.0s. The audio is immensely fractured and you can forget about any video frames.

On my iBook G4/1.33 (7450) with 4.0s, the video and audio are extremely choppy with Energy Saver set to Automatic. Some frames appear and the audio is at least intelligible, but still fractured. It improves marginally at Highest.

On my quad G5 in Reduced with 4.0s, it plays in fits and starts. The snippets it does play are normal, but its data pipeline poops out and it has to buffer again repeatedly between them. On my quad G5 in Highest, it plays with only rare audio artifacts. I don't have a G3 handy right now (my PDQ is in pieces on my workbench waiting for a new hard disk).

So, now install 4.0.1pre and play it again. On the Sawtooth G4, the audio is artifacted, but now intelligible. The video is a glorified slideshow, but frames do appear at least occasionally. Hey, WebM video is pretty hefty to decode, and this computer is eleven years old, so whaddya want? :P

On my iBook G4 in Automatic with 4.0.1pre, it skips frames frequently, but the audio is nearly intact. In Highest, it still skips, but less often, so I'd consider this playable.

On my quad G5 in Reduced with 4.0.1pre, it now plays perfectly.

Please note that Mozilla's streaming code is not terrifically robust. For example, if you enlarge or contract the Google video gadget using the arrows icon, then you start getting video artifacts when something overlays the video, and the buffering starts to seize up. The code doesn't seem to be able to handle a sudden drop in throughput well (a similar effect occurs when you change processor speed midstream in System Preferences). You can fix this by rewinding back to the beginning of the video, and then the streamer will be able to properly buffer. It looks like Mozilla's code just doesn't know what to do with the CPU changing performance characteristics abruptly or the video being dynamically resized as the video plays, but in fairness this is probably a lot to ask of it.

By the way, only my quad G5 in Highest could handle enlarged video fully (in Reduced, it got choppy), and don't even think about playing it full screen, because we can't hardware blit (this poor G5 got absolutely crushed trying to play Big Buck Bunny at 1920x1080 in software). This will only get better and still has some getting better to do, but it's a start, and the combination of buffer adjustments and VP8 VMX decoding does help.
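
To put a rough number on why full screen hurts so much without a hardware blit, here is a back-of-the-envelope sketch. It assumes 32-bit pixels and 24 frames per second, neither of which is stated above, and the function name is purely illustrative.

```c
#include <stdint.h>

/* Bytes that must be pushed to the screen every second to display
   uncompressed video at the given size and frame rate. */
static uint64_t blit_bytes_per_second(uint32_t width, uint32_t height,
                                      uint32_t bytes_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

Under those assumptions, blit_bytes_per_second(1920, 1080, 4, 24) comes to 199,065,600 bytes per second, while a 640x360 window needs 22,118,400. That is nine times the pixel traffic, all of it moved by the CPU, before counting any of the decoding work.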

Note that the WebM container decoding is still in C, and the actual portion of Mozilla's code that does the conversion to pixels (before they are blitted, which is VMX-accelerated) is also "just" in C. Firefox 5 is adding SIMD-based decoding of JPEG images; in at least older versions of libogg there was some AltiVec support; and I'd like to at some point have an entire AltiVec-accelerated content chain, all the way from raw data to rendering. There is some code that I think may be trivially converted to AltiVec (either in C or assembly) to gain us even more speed on the non-optimized sections, but that's for a later time.
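
To give an idea of what that "conversion to pixels" step involves, here is a minimal scalar sketch of YUV to RGB conversion using the common BT.601 video-range integer approximation. This is not Mozilla's actual routine, and the function names are hypothetical; it just shows the kind of per-pixel arithmetic that is a natural candidate for AltiVec, since every pixel gets the same multiply, add and clamp.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp an intermediate result into the 0..255 range of one channel. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one video-range YUV sample to an opaque ARGB32 pixel using
   fixed-point BT.601 coefficients scaled by 256. */
static uint32_t yuv_to_argb(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;
    uint32_t r = clamp_u8((298 * c + 409 * e + 128) / 256);
    uint32_t g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    uint32_t b = clamp_u8((298 * c + 516 * d + 128) / 256);
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

/* Convert one row of 4:2:0 video, where each U and V sample is shared
   by two horizontally adjacent pixels. */
static void yuv420_row_to_argb(const uint8_t *y_row, const uint8_t *u_row,
                               const uint8_t *v_row, uint32_t *out,
                               size_t width)
{
    for (size_t x = 0; x < width; x++)
        out[x] = yuv_to_argb(y_row[x], u_row[x / 2], v_row[x / 2]);
}
```

Every visible pixel of every frame goes through something like yuv_to_argb, so even a modest per-pixel saving adds up quickly at video frame rates.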

Here is what I need from you wonderful beta hounds:

- G3 owners: Expect scrolling to be slower when a Flash applet is on-screen, but is it too slow? WebM video probably will hardly play at all on your system, but it should not crash (which would mean AltiVec code snuck in). Is there any playback degradation with Ogg and VP3/Theora video using the new buffer settings?

- G4 owners: Improvement noticeable? Above 1.25GHz, is video at least acceptable, even if it's a bit choppy or imperfect visually?

- G5 owners: Improvement noticeable? How is JavaScript performance now?

Mozilla is planning an April 26th release for 4.0.1, and so are we. Anyway, go get:
You should get an upgrade notification when 4.0.1 final is available, so grab it and have at it, and post your observations in the comments.

32 comments:

  1. This comment has been removed by the author.

  2. Using a dual 2.3 GHz PowerPC G5 1 GB ram, I downloaded the G5 beta and tried the video as explained above. Unfortunately, I don't see the improvement. The HTML 5 version played in fits and starts while the Flash version played smoothly both in small screen and full screen modes.

  3. This comment has been removed by the author.

  4. Hi Classic,

    Pbook 1.67 Ghz, 1.5 GB RAM, G4-7450.
    Your Google video is definitely playable (even at 720p, though yeah...) and waaaaay better than with 4.0.
    THANKS!

  5. Tested WebM performance. Video speaks for itself:
    http://tinyurl.com/67obvle

    1.mov is Ten4Fox 4.0s (4b13 on the G3s),
    2.mov is Ten4Fox 4.0.1pre,
    3.mov is Safari 5 (Safari 4 on the G3s) playing the H.264 variant.

    Machines are
    - iBook G3, 800 MHz, 640 MB RAM with Mac OS X v10.4.11
    - PowerBook G3 (Pismo), 400 MHz, 768 MB RAM with Mac OS X v10.4.11
    - PowerBook G4 (Aluminium), 1333 MHz, 2048 MB RAM with Mac OS X v10.5.8
    Energy Saver is set to Highest on the G4 PB & the iBook (the Pismo doesn't have this setting).

    Some comments:

    G3 tests
    Buffer tuning improves the situation very noticeably: audio is now pretty okay on 800 MHz, and okay even on 400 MHz; video is still a slide show. 4.0.1pre is better overall than Safari 4. Smooth scrolling feels funny (and is pretty slow) when Flash content is shown. Although, to be honest, I don't even dare to scroll (or touch a G3 machine at all) when a video is playing (and have Adblock installed), so this isn't a big deal.

    G4 tests
    Definite improvement for audio. Video now runs smoothly for a while, but then stalls completely for some time, then recovers. Safari is completely smooth, so you're probably right about the hardware acceleration.

    Sorry about the cats.

  6. These are the results on my Powerbook G4 1.5GHz (Energy Saver to Highest):

    Webm: sound ok and video @ 5-10fps
    H.264(flash): sound ok and video @ 15-20fps

  7. That doesn't surprise me, and even for a well-optimized VP8 platform such as x86, it still is slower than h.264:

    http://www.streaminglearningcenter.com/articles/webm-vs-h264-a-first-look.html

    So we should expect that. The main thing I'm looking for is to make sure it's better, and it sounds like so far it is.

    Simon, I'm a little distressed by your result -- this quad, even in Reduced (which should be comparable to your dual 2.3), plays WebM video beautifully now. Does the new JavaScript accelerator have issues on yours as well? This quad in Highest benches at around 1760ms, so I would expect somewhere in the low two thousands for yours. If it's significantly worse, we should look into that, because that implies the optimizations aren't working on a class of G5s. Other G5 owners, please put your comments in this thread.

  8. I should say, 1760ms on SunSpider. It got in the low 90 runs/sec on Dromaeo.

  9. Plays Flawlessly on my Dual 1.25GHz G4

  10. Would it be possible to speed-up WebM decoding at the expense of quality? For example, VLC can play H.264 bypassing the deblocking, so the image quality is lower, but playback is smoother.

  11. Definitely possible. The trick is how to make it configurable without totally gutting the media framework. What I'd probably do is just kill the loop filter, since that's the most CPU heavy portion according to my profiling. Naturally quality would suffer, but it would certainly be a lot faster. However, that's at the libvpx level and not the Mozilla level, so propagating prefs into libvpx is going to be "interesting." That won't be in 4.0.1. I might consider it for 4.0.2 or 5.

  12. Or, the other option is just to take the loop filter out of the G4 builds entirely, so those builds just don't do deblocking or filtering. I'd like to hear if anyone objects to that idea. After all, our G4s aren't getting any faster. -_-

  13. Triple post'd! PoLiYa's idea intrigues me and I did some digging in the VP8 SDK. It looks like you *can* hint the codec to disable deblocking, and I've found the code in content/media that initializes the codec. So that's where we can sneak a pref in. I might be able to get this ready for 4.0.1. By default, I'm going to have deblocking OFF for G3 and G4/7400. For 7450 and G5, I'll have it default to ON, but you can turn it off. Sounds good? No promises for 4.0.1 but this is definitely doable and the more I look at this idea the more I like it.

  14. Seriously cannot thank you enough for all your hard work on this, Cameron.

    On my iBook G4 1.25GHz, playback was most definitely improved, though still "choppy". Audio was perfect. Screen tearing of the plugin was also noticeably improved.

    In case you or anyone who reads your development blog cares, there are two free standalone applications out there that can search and play YouTube with spectacular results on this old G4. One is MacTubes, which substitutes QuickTime for Flash; at fmt=34 (somewhere between 480p and 720p, I would say) the same video plays FLAWLESSLY. 720p will still choke it up. The other one is YouView, which substitutes FFmpeg for Flash, and videos play well at 480p.

    Looking forward to seeing what improvement there is with the deblocker disabled.

  15. Bad news, the deblocker is *already* off. I couldn't tell this from the code initially because of the way Mozilla built libvpx, so it took me a little while to figure out why I got no win disabling it. Great idea, though.

    Next thought is to kill the loop filter, but this requires me gutting libvpx (fortunately it's still 0.9.5 in Fx5, so that won't hurt us in the short run) and thus won't appear for 4.0.1. Also, I don't know what effect that will have on quality, so that's too big a jump for here.

  16. Hi! Is there a German-language version of TFF? TFF works great on my MDD!

  17. More than killing the loop filter, AltiVec YUV->RGB will help. On a number of videos I've tested at 360p on a 1GHz G4, ~10% of my CPU time is burned in libvpx, and nearly 25% in YUV->RGB (both the linear (LinearScaleYUVToRGB32Row) and fast (FastConvertYUVToRGB32Row) conversions).

  18. I agree we should do that too, but it's more a question of what's more feasible with the least amount of work (I'm lazy ;).

  19. Wow! The difference in overall/JavaScript performance between 4.0s and this build is incredible on my Quad G5. You are a life-saver!

    Do you know if there's any big obstacle for re-using your JIT improvements in the Linux/PPC versions of Firefox/IceWeasel?

  20. Thanks, Erik! The trick will be whatever modifications they have to make to the Linux build system to enable it (the code should compile just fine). I can't advise on that since I don't run Linux personally. I need to respin this for bug 624164 with the G5 fixes.

  21. The Google fiber experiment video worked for me in 360p. In default and expanded view, everything was fine. Full screen played OK, but had an annoying flickering black bar just above the video controls (and if I moved the cursor around, parts of it would go away, while other times black rectangles would appear where my cursor was moving).

    720p worked only in default view. Expanded and full-screen views were choppy at best and non-functioning (black screen) at worst. I'm assuming that goes along w/ that whole issue you mentioned in your original post about not being able to hardware blit (much of this is admittedly wayyyy over my head ;) ).

  22. Oh, forgot to mention:

    Dual 2.5 GHz G5.

    Thanks, by the way, for all your work on this browser. I think it's turning out nicely!

  23. Thanks, Brian! I'm sure you noticed final is out too.

    Your experience was pretty much the same as mine. I think it's worth noting that Perian 1.2.2, which can now play WebM and God bless him still works on PPC Tiger, does use QuickTime and thus has hardware acceleration in the render chain. Full screen playback is pretty much perfect on my quad, and even HD WebM plays relatively well. So at some point we need to figure out what we can do to make the display faster - we can't use Mozilla's code for this because of our OpenGL limitations, so other ideas will have to be applied. At the very least, though, there's an out of browser way to get this content now.

  24. Hi,

    Since you keep mentioning the missing OpenGL 2 support in PPC Tiger: simply copying over the OpenGL and GLUT frameworks (/System/Library/Frameworks) from Intel Tiger (i.e. the 10.4.11 Intel Combo Updater) should provide you with OpenGL 2 support. I tried this once (IIRC also with TenFourFox) and it did. You might have to use the graphics card/chip drivers from Intel Tiger as well.
    Actually I'm using Intel Tiger on my PPC machines (PowerBook Wallstreet and 12" G4). Well, in order to get everything working correctly on those machines there are some modifications needed but on a PowerMac G5 it might as well work out of the box.

    Thanks for all your work on TenFourFox!

  25. Tobias, thanks for the kind words, and no offense, but I am extraordinarily sceptical. I'd need serious proof that those frameworks and drivers were indeed universal before I try that myself, let alone recommend it to other users. Not even PPC Leopard supports OpenGL 2 on all cards. If you can demonstrate that they are indeed universal (not just "appears to work"), I'll look into it, but frankly this sounds fishy. Again, no offense intended.

  26. Why so sceptical?

    I'll do a fresh install of PPC Tiger again and will copy over the OpenGL and GLUT frameworks from Intel Tiger.
    After that I'll test WebGL in TenFourFox and report the results.

  27. Actually, that won't work -- WebGL in TenFourFox is unconditionally disabled. It was just simpler rather than risking untested code. You might want to try something like Xbench or a standalone OpenGL test suite to make sure it's working *and* that the OpenGL version is indeed 2. Here's why I'm sceptical:

    /System/Library/Frameworks/OpenGL.framework/Versions/Current/Libraries/% file *
    libGL.dylib: Mach-O dynamically linked shared library ppc
    libGLImage.dylib: Mach-O dynamically linked shared library ppc
    libGLProgrammability.dylib: Mach-O dynamically linked shared library ppc
    libGLSystem.dylib: Mach-O dynamically linked shared library ppc
    libGLU.dylib: Mach-O dynamically linked shared library ppc

    Notice that none of these are Universal on my quad G5 (10.4.11). I would be extremely surprised to find that the Intel versions *were* Universal -- that would be pure deadweight. People have lamented the sorry state of Apple OpenGL for years, so if this were all that were necessary someone should have discovered this by now. Extraordinary claims require extraordinary proof :)

    Again, please don't take any offense. I'd love it to be true, but I strongly suspect it is actually not.

  28. Actually, here's a better idea than Xbench: I assume you have Xcode 2.5 installed (if not, do so; it's free). Run /Developer/Applications/Graphics Tools/OpenGL Driver Monitor.app. Go to Monitors > Renderer Info. If your hack works, the Apple Software Renderer should be version 2.0 (as should, I imagine, the rest of your renderers). On my quad it is the (expected) 1.2.1.

  29. Well, Intel Tiger is a universal build because of Rosetta. And Apple did also ship a universal version of Intel Tiger Server officially. PowerPC Tiger isn't universal and also the build versions for the two architectures differ (except for the universal Server build). Intel Tiger is in some ways an intermediate version between Tiger and Leopard.
    I actually use the Intel version of Tiger 10.4.11 on my PowerBook Wallstreet as my only OS. I needed to modify some kernel extensions and a few other things in order to get it working completely. I needed the Intel version of Tiger because I wanted support for my Broadcom BCM4321 based WiFi card - and it's supported in Intel Tiger only, and I also get 802.11n support that way.

    Here are the results of my test:
    I used OpenGL Extension Viewer to test the OpenGL 2 compliance.
    With the OpenGL and GLUT frameworks from 10.4.11 Intel Tiger the OpenGL 2.0 test worked with the Apple Software Renderer on 10.4.11 PowerPC Tiger and it was stated that the Apple Software Renderer was fully OpenGL 2.0 compliant.
    As I don't have an OpenGL 2.0-capable graphics chip in either of my machines (the best is a GeForce FX Go 5200), I can't speak for hardware-accelerated renderers. But I tried to use the GeForce FX OpenGL driver bundle from Intel Tiger and it worked very badly, the display being completely garbled.

    I also tried force-enabling WebGL and layers acceleration with the stock PPC Tiger frameworks:
    When I force-enabled WebGL in TenFourFox 4.0.1 I actually could run two of the simplest WebGL demos from khronos.org. In Leopard 10.5.8 a good number of them are working, with a few glitches though.
    I could also force-enable layers acceleration but all the accelerated windows were solid black - but they were listed as being accelerated in about:support . That behaviour was the same in 10.4.11 as in 10.5.8 and it should be because of lacking features of the GeForce FX GPU.

  30. I don't have the Intel version of 10.4 to test all of this. So I can't tell if this is an April Fools' joke or the greatest discovery in ages. If this worked, the point is, what do we make of it? We can't expect the people who use 10.4 to replace frameworks in the system library (thus changing their OS to an unsupported, maybe unstable state) to make one feature of one application work a little faster. So this might be great for a few people who know what they're doing and are willing to experiment, but completely irrelevant for people who just want to continue using Firefox on their Macs. Or is there a way to bundle these frameworks within the application package?

    On the other hand, my PowerBook's ATI Mobility Radeon 9700 running 10.5.8 performs gloriously up to OpenGL 2.1 in OpenGL Extension Viewer (the difference to the software renderer is, ahm, there are no words…), so, yes, I'd love to go that way.

  31. You could do like I did and download the Intel Tiger 10.4.11 Combo Updater (http://support.apple.com/downloads/Mac_OS_X_10_4_11_Combo_Update__Intel_) and extract the frameworks using Pacifist (http://www.charlessoft.com).

    But don't expect too much: although the OpenGL 2.0 tests work with the Software Renderer using the Intel Tiger frameworks, all attempts to run any WebGL demos in TenFourFox resulted in an immediate crash related to libGLProgrammability. That also happens in the pure Intel Tiger installation that I use with my PowerBooks, so it doesn't seem to be related to using the frameworks from Intel Tiger in PPC Tiger. It might be that TenFourFox would have to be built with and linked against those frameworks as well.
    Also note that in order to be able to use hardware accelerated OpenGL 2.0 rendering you'd have to use the OpenGL driver bundles (they live in /System/Library/Extensions) from Intel Tiger. On my machine (GeForce FX 5200) that didn't work. That most surely wouldn't work unless you use the corresponding graphics driver kexts from Intel Tiger as well. But that requires using the kernel and some of the most fundamental kexts from Intel Tiger as well. On my PowerBook Wallstreet I used such a setup for several months.

    If it were just the frameworks I think they could be bundled with TenFourFox, but I'm sure it's impossible to provide modified, Apple-copyrighted, closed-source frameworks bundled with an application for public download.

  32. I'm really impressed by the huge improvement between versions. I tried today with 4.0.1 final and the results with HTML5 and WebM on YouTube were great! You revived my PowerBook G4, thanks!

