Tuesday, April 12, 2011

I am the world's biggest liar

Dear readers, I must confess to you all what an amazingly brazen and horrible liar I have been. For days, nay, weeks, you have laboured under the completely fraudulent impression that AltiVec-accelerated WebM video was coming to 4.0.2pre, and my conscience, ravaged by guilt and dismay, cannot abide to persist in my duplicity any longer. I must therefore reveal to you the awful, shameful truth ...

... that AltiVec WebM is going to be in 4.0.1pre! YAY!!!

Yes, it actually works! With a little preprocessing of the included AltiVec sources from Google libvpx, some adjustments by hand and a lot of glue code in the build system, we are now building a mostly (not fully, I'll explain in a second) VMX/AltiVec-accelerated VP8 codec, just like the SSE2 and NEON-juiced VP8 codecs for x86 and ARM! It'll be in the 7400, 7450 and G5 releases.

This version now makes all but the highest data rate videos at least playable (and some completely playable) on G5 and high-end G4 machines, and makes it possible for video to play at all on low-end G4s (but please note that the recommended 1.25GHz clock speed remains). As my standard, I used a clip from Big Buck Bunny which would only play fully on my quad G5 if I turned Energy Saver to Highest (I usually run in Reduced to save power and increase the life of the machine), otherwise the data pipeline would run dry and it would seize up repeatedly. With the new AltiVec VP8, it runs all the way through. No stutters, no hiccups. YouTube HTML5 performed splendidly. It's a beautiful thing.

Oh, but that's not all that'll be in 4.0.1pre. Besides the G5-enabled JavaScript acceleration (down to 1760ms in SunSpider!) I talked about in our last post, I've also found a kludge to reduce Flash screen artifacts when scrolling. I'm sure the careful ones out there have noticed that scrollllling verrrrry sloooooowllly will keep the artifacting down to a minimum, and there's a way you can do this already: enable Smooth Scrolling under Preferences, Advanced, General. However, smooth scrolling is definitely slower and it stinks to do it all the time if you're not used to it. So, why not enable smooth scrolling when a plugin is onscreen, and then revert to the user's preference otherwise? Why not indeed! And, while there is still some artifacting, it is much, much less. Of course, if you use HTML5 video like our new AltiVec WebM, you won't need Flash. I'm just saying.
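Boiled down, the scrolling kludge is a tiny decision rule. Here's a minimal sketch in C of that logic only. It is not the actual patch, and `effective_smooth_scroll`, `user_pref` and `plugin_onscreen` are made-up names for illustration:

```c
#include <stdbool.h>

/* Hypothetical helper (not a Mozilla API): should this scroll be
 * animated smoothly? */
static bool effective_smooth_scroll(bool user_pref, bool plugin_onscreen)
{
    /* While a plugin is visible, force smooth scrolling to keep the
     * artifacting down; otherwise honour the user's own preference. */
    return plugin_onscreen ? true : user_pref;
}
```

So a user with smooth scrolling turned off still gets it on a page with Flash visible, and gets their normal scrolling back everywhere else.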

This is not an unqualified success, however. I said the WebM code is mostly AltiVec accelerated, and it is. However, we do not have assembly source for most of the inverse discrete cosine transform algorithms, just for one of them. I looked at the C version for the other inverse DCTs and it looks pretty obvious to vectorize, but this is going off into completely new work territory and I think I'd rather not do that for a stable branch even though this is getting beta coverage (read on). Fortunately, the especially computationally intensive parts such as the filtering are fully written in assembly and we do have those. Also, this means we have points to improve on in the future, so performance should only get better.
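To give a feel for why a small block transform can look "obvious to vectorize", here's a rough sketch in plain C. To be clear, this is not the VP8 inverse DCT and not libvpx code: it's a simplified Hadamard-style 4-point butterfly, and `butterfly_columns` is a made-up name. What matters is the shape of the loop. Each of the four columns gets exactly the same adds and subtracts, with no dependency between columns, so the loop over `c` could in principle turn into one vector operation per line.

```c
#include <stdint.h>

/* Illustrative only: one vertical butterfly pass over a 4x4 block stored
 * row-major (element [row][col] lives at index row*4 + col). */
static void butterfly_columns(const int16_t in[16], int16_t out[16])
{
    for (int c = 0; c < 4; c++) {
        /* The same arithmetic runs for every column c, independently. */
        int a = in[c]     + in[8 + c];   /* row 0 + row 2 */
        int b = in[c]     - in[8 + c];   /* row 0 - row 2 */
        int d = in[4 + c] + in[12 + c];  /* row 1 + row 3 */
        int e = in[4 + c] - in[12 + c];  /* row 1 - row 3 */

        out[c]      = (int16_t)(a + d);
        out[4 + c]  = (int16_t)(b + e);
        out[8 + c]  = (int16_t)(b - e);
        out[12 + c] = (int16_t)(a - d);
    }
}
```

A real inverse DCT adds constant multiplies and rounding shifts on top of this, but those are applied per element too, so the same lane-by-lane structure should still hold.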

Also, just because the decoder is AltiVec-enabled doesn't mean the compositor is, and I've alluded before that the graphics stack in Firefox 4 is slower than Firefox 3.6 on non-accelerated systems (and all TenFourFox builds are non-accelerated because PPC Tiger lacks OpenGL 2). If you try to play a WebM video expanded or full screen, then you're also testing how well we blit to the screen and scale the image, and we already know that's a bottleneck. At least for now, full screen video will still be the domain of Flash Player. (I was hoping Cairo 1.10 would land in Firefox 5, but it looks like it won't make the cutoff after all.)

Do note that the reason I want to put all this hotness into a quickie beta release is that this is a lot of new and relatively untested code. G3 owners are particularly important because 1) I don't want AltiVec code leaking into your builds and crashing you and 2) I want to make sure that the plugin scrolling hack doesn't make your machines in particular too slow. (Flash itself might, ha ha, but we shouldn't.) Similarly, I want to make sure that the AltiVec acceleration on G4/G5 is as good on as wide a range of systems as I think it is, and ditto for JavaScript on the G5.
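For the curious, one common way to keep vector code out of a non-vector build is to gate it at compile time and route every call through a table of function pointers. The sketch below shows that general pattern under assumptions of my own choosing, not our actual build glue: `HAVE_VMX_BUILD`, `codec_dispatch`, `codec_dispatch_init` and the `add_clamp_*` routines are made-up names, and the "vector" routine is a plain C stand-in so the example compiles anywhere. The one real thing in it is `__ALTIVEC__`, which GCC defines when AltiVec code generation is enabled (for example with `-maltivec`).

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time gate: on a build without AltiVec enabled (a G3 build,
 * say), the vector routine below is never even compiled. */
#if defined(__ALTIVEC__)
#define HAVE_VMX_BUILD 1
#else
#define HAVE_VMX_BUILD 0
#endif

typedef void (*add_clamp_fn)(uint8_t *dst, const int16_t *residue, size_t n);

/* Portable reference: add a signed residue to each pixel, clamp to 0..255. */
static void add_clamp_c(uint8_t *dst, const int16_t *residue, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int v = dst[i] + residue[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

#if HAVE_VMX_BUILD
/* Stand-in for a vector routine. A real one would use AltiVec
 * intrinsics; this just forwards to the C code for illustration. */
static void add_clamp_vmx(uint8_t *dst, const int16_t *residue, size_t n)
{
    add_clamp_c(dst, residue, n);
}
#endif

/* Hypothetical dispatch table: callers only ever go through this. */
struct codec_dispatch {
    add_clamp_fn add_clamp;
    const char *backend;
};

static void codec_dispatch_init(struct codec_dispatch *d)
{
    d->add_clamp = add_clamp_c;   /* safe default on every CPU */
    d->backend = "c";
#if HAVE_VMX_BUILD
    d->add_clamp = add_clamp_vmx; /* only present in AltiVec builds */
    d->backend = "vmx";
#endif
}
```

Callers run `codec_dispatch_init` once and then call `d.add_clamp(...)` without caring which backend they got. The result should be identical either way, which is exactly what a beta on a wide range of machines helps confirm.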

This is all very convenient because Mozilla is planning to release 4.0.1, which they have named "Macaw" (presumably after watching Rio trailers and playing a lot of Angry Birds), on April 26th. With luck, there should be nothing landing on the 2.0 release branch tomorrow, so I can pull down the changes, spin off a few builds, and hopefully have betas out to you lucky devils by this Thursday or Friday. There are a lot of fixes in 4.0.1, mostly for crashes and a couple for some possible security issues. The plan is to release our own betas and, assuming they pass muster, release to the general audience the same day as the regular 4.0.1. In the meantime, I'll have more to say about Firefox 5 in a future post.

Anyway, will you forgive me for my lies and heartbreak? I knew you would.

10 comments:

  1. WooHoo! Great News. Keep up the good work!

  2. Looking forward to testing WebM performance.
    Keep on lying!

  3. Cool! ;

    If you're interested in Altivec support for YUV->RGB conversion, the OggPlay code in Firefox 3.6 had (disabled) Altivec YUV->RGB code. Perhaps it'd be straightforward to adapt to the Chromium-based code in FF4.

    Also, could you share a bit about how to build a TenFourFox for 10.5 (:)) with an accelerated graphics stack, whatever that means?

    Thanks,
    -- vs

  4. Venkatesh, yes, I looked at some other places that VMX/AltiVec could be pref'ed on (and in fact there is one more that I'll announce with the beta, which if I can get it certified will probably be out tomorrow evening PST). The problem with the liboggplay code in the Mozilla 1.9.2 tree is that it actually doesn't accelerate anything even if AltiVec mode is requested; it's still using the vanilla converters (see oggplay_yuv2rgb.c and look for ENABLE_ALTIVEC). And of course liboggplay is gone from Mozilla 2.0. So I'm still investigating AltiVec VP3, but I'm sticking with the wins I know I can get first. Eventually I'd like to have as much of the content decoding AltiVec-accelerated as possible, but that's another story for another day.

    W/r/t building an accelerated graphics stack for TenFourFox in 10.5, I can't advise you there because I don't run 10.5 -- that's another reason why TenFourFox is 10.4, because I need it myself. In fact, I don't own a PowerPC that runs 10.5 at all. If someone manages to get this working, I will be happy to accept their changeset as long as it doesn't interfere with 10.4. I won't be able to run off such builds myself, however, so that would always remain a custom option.

  5. Okay; wrt VP3 (Theora) -- libtheora 1.2 (not-yet-released) clocks in at ~10% faster than 1.1 on my PowerBook, so it's perhaps a good step if you can move to it.

    For Altivec support for VP3, when run standalone (actually via the Xiph QT components :)), Theora (1.2 as of last summer) spends ~30% of its CPU time in oc_frag_recon_inter2 and oc_frag_recon_inter, which would be fairly trivial to vectorize; (these two checked in at #2 and #3, after the loop filter on a neat stop motion plant video).

    Also, the original VP3 code has some (fairly messy) code you can start from: http://svn.xiph.org/trunk/vp32/CoreLibs/CDXV/Vp31/Common/mac/OptFunctionsPPC.c ; no idea on how good they are.

  6. 1.2 is in fact in Mozilla 2.0 (see media/libtheora/README). Yes, I agree that oc_frag_recon_inter2?_c are pretty straightforward vector conversions. I'll look at this when I get a free chance (possibly for 4.0.2pre), but I always accept patches, hint hint :)

  7. yeah!! Great news again, thanks a lot!!

    Anyway, what about IPC support on PPC too?
    Here is a first patch from an ArchlinuxPPC developer:
    http://bugs.gentoo.org/show_bug.cgi?id=325185

  8. IPC for TenFourFox is a different problem than on Linux -- Linux lacks the intrinsics for PPC, but can compile everything else (the patch you reference indeed has the missing intrinsics). TenFourFox is on Mac OS X, so we have the intrinsics, but there are other 10.5-specific library functions that have to be rewritten for 10.4. I've done this before, but it's never been tested.

    I finished the compiles yesterday, but video on my iBook G4/1.33 is still choppier than I would like, so I'm doing some rebuilds with adjusted buffering. It plays now on the 1.33, but skips a lot of frames. The G5 is just fine, of course.

  9. Just in case you hadn't run into this yet: there's example code by Apple for Altivec-accelerated DCTs at http://developer.apple.com/hardwaredrivers/ve/examples.html

  10. Great!!
    Hence the power of open-source software ;)
    We just need bright minds!
