Thursday, 4 November 2010

Display list vs. blitting - the results!



To get some actual evidence for my opinions on the joys of the Flash display list, I created two demos that I'm calling "BunnyMark", a test of rendering small bouncing bunny pngs with alpha transparency. Since first posting, lots of readers helped by testing on different browsers and operating systems, and I have updated this post with their results.

The results were quite interesting, and not quite what I expected. Blitting was really fast, although a little slower than I expected, and it gave a consistent rendering speed across all platforms. Display list bitmaps were also pretty fast, although they performed really badly in Safari on Mac. I emailed Tinic from the Flash Player team about this issue, and he has said he will look into it. OK, so here are the results:
  • The display list demo could render 4000 bunnies at 30 fps on my PC without slowing down. This was replicated by readers on all Mac and Windows browsers except for Mac Safari, where it was down to 10-20 fps. Based on this interesting blog post from Tinic Uro (suggested by Richard Leggett), this seems like it may be something to do with the recent adoption of the Core Animation APIs in Safari. The demo has a lot more layers stacked up than you would need for most games, so this performance drop is unlikely to affect a real game - although I will be following up with a new benchmark to test that hypothesis. Bitmaps fared very badly on Android - it couldn't even render 10 bunnies at 30 fps.
  • The blitting demo could render 6000 bunnies at 30 fps without slowing down on my PC, and people with faster machines have reported up to 11000 bunnies at 30 fps. Blitting was also much more effective on Android, where it got up to 600 bunnies at 30 fps, certainly enough performance for an arcade-style game. (Thanks to Philippe Elsass for the Android tests).
So in this example, blitting is about twice as fast. But as I hope you can see, realistically 3000 bunnies is still a lot more than you are going to need in most situations. You can download the source code and see if you can improve the performance of either demo. A couple of readers have recommended performance optimisations, for example using a fixed-length Vector and calling lock() and unlock() on my BitmapData, but neither strategy noticeably improved performance on my machine.
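For anyone who wants to try those two suggestions, here is a minimal sketch of a blitting loop that uses both. It is not the BunnyMark source: the class name `BlitSketch`, the stage size, the bunny count, the physics constants and the placeholder bunny bitmap are all assumptions made for illustration. Only `BitmapData.copyPixels()`, `lock()`, `unlock()` and the fixed-length `Vector` constructor come from the Flash API.

```actionscript
package {
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.geom.Point;
    import flash.geom.Rectangle;

    // Hypothetical sketch of a blitting loop; not the BunnyMark source.
    public class BlitSketch extends Sprite {
        // Assumed values, chosen only for illustration.
        private static const WIDTH:int = 640;
        private static const HEIGHT:int = 480;
        private static const COUNT:int = 6000;
        private static const GRAVITY:Number = 0.75;

        private var canvas:BitmapData;   // the single back buffer
        private var bunny:BitmapData;    // source pixels for every bunny
        private var bunnyRect:Rectangle; // cached so the loop allocates nothing
        private var dest:Point = new Point();

        // Passing true as the second argument makes each Vector fixed-length.
        private var xs:Vector.<Number> = new Vector.<Number>(COUNT, true);
        private var ys:Vector.<Number> = new Vector.<Number>(COUNT, true);
        private var vxs:Vector.<Number> = new Vector.<Number>(COUNT, true);
        private var vys:Vector.<Number> = new Vector.<Number>(COUNT, true);

        public function BlitSketch() {
            canvas = new BitmapData(WIDTH, HEIGHT, false, 0x000000);
            addChild(new Bitmap(canvas));

            // Placeholder for the bunny PNG: a half-transparent white block.
            bunny = new BitmapData(26, 37, true, 0x80FFFFFF);
            bunnyRect = bunny.rect;

            for (var i:int = 0; i < COUNT; i++) {
                xs[i] = Math.random() * (WIDTH - bunny.width);
                ys[i] = Math.random() * (HEIGHT - bunny.height);
                vxs[i] = Math.random() * 10 - 5;
                vys[i] = Math.random() * 10 - 5;
            }
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(event:Event):void {
            var maxX:Number = WIDTH - bunny.width;
            var maxY:Number = HEIGHT - bunny.height;

            // lock() stops the Bitmap on the display list from refreshing
            // until unlock(), so the canvas is pushed to screen once per frame.
            canvas.lock();
            canvas.fillRect(canvas.rect, 0x000000);

            for (var i:int = 0; i < COUNT; i++) {
                vys[i] += GRAVITY;
                xs[i] += vxs[i];
                ys[i] += vys[i];

                // Bounce off the side walls and the floor.
                if (xs[i] < 0) { xs[i] = 0; vxs[i] = -vxs[i]; }
                else if (xs[i] > maxX) { xs[i] = maxX; vxs[i] = -vxs[i]; }
                if (ys[i] > maxY) { ys[i] = maxY; vys[i] = -vys[i]; }

                dest.x = xs[i];
                dest.y = ys[i];
                // mergeAlpha = true blends the bunny's alpha channel onto the canvas.
                canvas.copyPixels(bunny, bunnyRect, dest, null, null, true);
            }
            canvas.unlock();
        }
    }
}
```

Keeping positions in plain Number vectors and reusing one Point and one Rectangle means the per-frame loop creates no new objects. Whether lock() and unlock() make a measurable difference will depend on the machine and player version, so treat this as something to benchmark rather than a guaranteed win.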

I also wondered whether switching the wmode in the HTML could fix the Safari issues - it doesn't. If you want to try them: Opaque, Transparent, Direct, GPU (both Direct and GPU give 5 fps in Chrome on Windows!). This post from way back in 2008 may shed some light on the topic:
"Just because the Flash Player is using the video card for rendering does not mean it will be faster. In the majority of cases your content will become slower." - Tinic Uro
Just a final note - I ran a similar test to this 2 years ago in Microsoft XNA and was able to get something like 50,000 bunnies going at HD resolution and 60 fps. I think Molehill is going to make this discussion somewhat irrelevant next year - GPU blitting will annihilate both of these approaches. The question will then be: can the display list also be sped up by the GPU, or is it just too wacky and different from what graphics cards are designed to handle?

31 comments:

Lawrie said...

This and the previous post are really interesting - great to see some benchmarks too.

I'm still a bit confused as to how much Molehill will help with non-3d development though. Thibault said -

"Molehill is not a 2D GPU renderer, however, things like particles engine for instance can be developed through shaders on the Molehill engine and produce awesome results in terms of speed and amount of particles you wanna push on screen, particles on the GPU"

Kyle said...

Nice work. I really doubt that the majority of games are graphically complex enough that it matters either way, and the display list gives you that much more functionality built right in there.

I've sometimes run into some weird tearing effects when using bitmaps within the display list though. Like, if a bunch of bitmaps are moving across the screen quickly sometimes it will cut off part of the bitmap for a frame - just enough to be noticeable. I assume it's something to do with the Flash Player getting its redraw regions slightly wrong. Not a big deal, really, just something I've occasionally noticed.

Jon Howard said...

Excellent stuff.

So is 'bunny count' going to be the standard unit for blit/sprite comparative benchmarking?

Kyle said...

Another advantage blitting might have is if you want to do some post-processing with the canvas bitmap, for whatever reason. You could plop the canvas into PixelBender, for instance, and do something like what Mario Klingemann was showing off at FOTB this year.

Mark Barcinski said...

Hi, the display list runs at 0-6 fps on my Mac (2.4 GHz, OS 10.6, FP 10.1.85.3), the blitting demo at 30 fps. That makes a huge difference on my machine.

Cheers,
M

Hugo Larcher said...

Since you asked in your tweet, here are my results for both, running on my iMac 3.06 GHz Core 2 Duo with 4 GB RAM

DisplayList - 30fps stable (55-60 Mem)
Blitting - 30fps stable (55-60 Mem)

So nothing weird here, results match your post.

Hope this helps.

Kyle said...

FWIW I got 30fps on both on Windows in Chrome.

Could be the debug player causing issues on the Mac?

SinisterDex said...

Surprised by the high memory usage on Mac. I run Win 7 64-bit with the latest 10.1 32-bit release player in Firefox.

DisplayList - 30fps stable (23MB Mem)
Blitting - 30fps stable (22MB Mem)

Matt Perrin said...

I did benchmarks & a presentation for a local game dev community and found blitting to be incredibly powerful (5x FPS improvement rendering thousands of items, minimal memory allocation).

It's all about using the right tools & techniques for the job and what I found was that a hybrid blitting / Display Object approach would really allow me to push Flash to its limits without overly taxing a user's system. While it's a bit more programmer dependent and requires structure & forethought, every AS3 developer should learn blitting and keep that knowledge handy for when they need it.

Iain said...

@matt - We need to see the source of your benchmarks, or you need to download my source and create the same improvement in my benchmarks. This is science and we need to show our evidence or it is all superstition!

Hugo Larcher said...

I've got new findings. I was curious about the low fps on Mac, since my results were a stable 30 fps, but I was testing in FF. So I opened it up in Safari and here are the numbers:

DisplayList - 10fps (7Mb mem)
Blitting - 30fps (5.6Mb mem)

So that's it, seems like an issue with the Safari plugin on Mac. Flash usually runs better in Safari than in FF, so these results are quite a surprise and VERY insightful.

Hugo Larcher said...

I only see a tiny gain with the window mode; I get 11-14 fps in Safari for the display list test now.

wonderwhy-er said...

It was interesting to see a classic Flash programmer's attack on the approach of newcomers with a C++ background :)

At the beginning of this year I was doing similar tests for the same reason, and I got some similar results.

In your tests I get 30 FPS in both on Win7 with an Intel Quad 2.83 GHz CPU.

And here are my experiments links
http://wonderwhy-er.deviantart.com/art/Flash-Rendering-Tests-149580102
http://wonderwhy-er.deviantart.com/art/Flash-Rendering-Tests-2-149679063

Back then I got some mixed results, so the comment on the first test is wrong and I still have not fixed it :) With my tests I now get 9 FPS with the display list and 15 FPS with blitting, both with 50k circles.
You should add some way to set the number of bunnies in your example so that we can tune it to the power of our PCs.

BTW back then I noticed one strange thing. On my old AMD-powered 2 GHz notebook it worked a lot worse than on Intel machines, which gave me weird results at first, where bitmaps in the display list were on par with blitting...

BTW there is one more feature you get with the display list that you do not get with blitting: the redraw rectangle is calculated automatically and efficiently by the Flash Player, which is not possible with the one-big-back-buffer blitting approach.

davidburrows said...

On Safari, standard player, I get super super slow animation initially on display list, then as the bunnies disperse across the rest of the stage it speeds up (but not to 30fps)

It would imply that more occluding bunnies = lower performance

MC said...

Excuse me... what if you paste the PNG image into a MovieClip and use it as a regular MovieClip? Will it be much slower?

Iain said...

@MC I intend to post another BunnyMark with sprite, movieclip, shape etc in the near future.

Frank said...

Android:

Displayobject is killed by the browser. It isn't responding fast enough.

Blitting : 7fps

On htc desire

Sébastien said...

I managed to gain about 2 - 3 FPS on the BitmapTest.

I set up the file to 60 FPS to give the player the possibility to render faster than 30 FPS.

On my computer, it runs at 32 FPS with 6000 bunnies. Before I optimized it I had 29 FPS with 6000 bunnies.

Not a HUGE improvement, but still...

Source : http://www.popcodestudio.com/dropbox/BitmapTest.as

Philippe said...

My tests under Android (HTC Desire):
- you can draw 600 bunnies at 30fps,
- but you can only count on moving 150 bitmaps at 20fps: it doesn't seem possible to do 30fps even moving only 10 bunnies.

So definitely blitting for mobile but I'll go with the displaylist when I can.

jpauclair.net said...

You are comparing a display list of bitmaps against copyPixels() into one BitmapData.

Try the same thing with sprite vector graphics, with scale and rotation, vs. blitting the same thing.

Damian said...

To make the blit test faster:
* make the vector a fixed vector
* use lock() and unlock() before and after blitting to the bitmapData
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/display/BitmapData.html#lock()

I could get 11000 at 30fps running in debug mode. Around 14000 in release

Damian said...

Also, in your code you're actually creating 6001 bunnies ;)

Iain said...

Thanks Damian - I was sure blitting could go faster than I had it. Please send me your code and I will add it to the download / examples.

Damian said...

I just sent you an email, but it's going to take you about 3 seconds to make the changes :D

LSaridina said...

Okay, I got it.. good bunny..
Well, blitting definitely will win on this bunny olympiad.

fr1n63 said...

MacBook Air, 11" maxed out, the display list manages 1 fps in Safari, whilst the blitting hits a steady 25fps, so I'm guessing as the computer gets slower, the speed hit gets more pronounced.

fr1n63 said...

Quoting "Try the same thing with a sprite vector graphics, with scale and rotation vs blitting of same thing."

That's a different issue. If you're aiming for blitting, you really need to think about the application you're using it for; there are many methods to maintain speed. However, if scaling / rotating 3,000 objects is your target, you're most likely going to have problems regardless of the method.

GundikHangLekiu said...

19-22 fps with the displaylist demo.

15-17 fps with the blitting demo.

Athlon64 3000+ (1.8 GHz)
1GB RAM
WinXP
Flash 10.1

I've reloaded both demos multiple times but the results remain the same.

Anonymous said...

I was using the blitting technique for my engine, but Adobe added hardware support with the introduction of FP 10.1, so I think the display list approach should now be as fast as blitting. Both demos run at 30 fps on my PC.

Anonymous said...

I can't wait for GPU blitting. This is going to be the biggest performance increase in the history of the Flash Player.

Anonymous said...

Are we supposed to count the bunnies ourselves? I could only count to 20, then it got a bit confusing.