Either my timing is great or it’s lousy – you decide. Yesterday I posted about parallelising Conway’s Life – and today the new CTP for the Parallel Extensions library comes out! The bad news is that it meant I had to run all the tests again… the good news is that it means we can see whether or not the team’s work over the last 6 months has paid off.
Breaking change
The only breaking change I’ve seen is that AsParallel() no longer takes a ParallelQueryOptions parameter – instead, you call AsOrdered() on the value returned from AsParallel(). It was an easy change to make, and worked first time. There may well be plenty of other breaking API changes which are more significant, of course – I’m hardly using any of it.
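To make the new call shape concrete, here is a minimal sketch of an ordered parallel query. It is not the code from the Mandelbrot or Life posts: the class and the SquaresInOrder helper are made up for illustration, and it assumes the AsParallel()/AsOrdered() methods as they appear in System.Linq in released versions of .NET, which may differ in detail from the CTP.

```csharp
using System;
using System.Linq;

static class OrderedQueryDemo
{
    // Hypothetical helper: squares each input in parallel while
    // keeping the results in source order.
    public static int[] SquaresInOrder(int count)
    {
        return Enumerable.Range(0, count)
                         .AsParallel()   // no options argument any more
                         .AsOrdered()    // ordering is requested on the parallel query itself
                         .Select(x => x * x)
                         .ToArray();
    }

    static void Main()
    {
        // Prints: 0, 1, 4, 9, 16
        Console.WriteLine(string.Join(", ", SquaresInOrder(5)));
    }
}
```

Without the AsOrdered() call the same values would come back, but not necessarily in source order.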
Benchmark comparisons
One of the nice things about having the previous blog entries is that I can easily compare the results of how things were then with how they are now. Here are the test results for the areas of the previous blog posts which used Parallel Extensions. For the Game of Life, I haven’t included the results with the rendering for the fast version, as they’re still bound by rendering (unsurprisingly).
Mandelbrot set
Results are in milliseconds taken to plot the whole image, so less is better. (x;y) values mean “Width=x, MaxIterations=y.”
Description | December CTP | June CTP |
---|---|---|
ParallelForLoop (1200;200) | 376 | 380 |
ParallelForLoop (3000;200) | 2361 | 2394 |
ParallelForLoop (1200;800) | 1292 | 1297 |
ParallelLinqRowByRowInPlace (1200;200) | 378 | 393 |
ParallelLinqRowByRowInPlace (3000;200) | 2347 | 2440 |
ParallelLinqRowByRowInPlace (1200;800) | 1295 | 1939 |
ParallelLinqRowByRowWithCopy (1200;200) | 382 | 411 |
ParallelLinqRowByRowWithCopy (3000;200) | 2376 | 2484 |
ParallelLinqRowByRowWithCopy (1200;800) | 1288 | 1401 |
ParallelLinqWithGenerator (1200;200) | 4782 | 4868 |
ParallelLinqWithGenerator (3000;200) | 29752 | 31366 |
ParallelLinqWithGenerator (1200;800) | 16626 | 16855 |
ParallelLinqWithSequenceOfPoints (1200;200) | 549 | 533 |
ParallelLinqWithSequenceOfPoints (3000;200) | 3413 | 3290 |
ParallelLinqWithSequenceOfPoints (1200;800) | 1462 | 1460 |
UnorderedParalleLinqInPlace (1200;200) | 422 | 440 |
UnorderedParalleLinqInPlace (3000;200) | 2586 | 2775 |
UnorderedParalleLinqInPlace (1200;800) | 1317 | 1475 |
UnorderedParallelLinqInPlaceWithDelegate (1200;200) | 509 | 514 |
UnorderedParallelLinqInPlaceWithDelegate (3000;200) | 3093 | 3134 |
UnorderedParallelLinqInPlaceWithDelegate (1200;800) | 1392 | 1571 |
UnorderedParallelLinqInPlaceWithGenerator (1200;200) | 5046 | 5511 |
UnorderedParallelLinqInPlaceWithGenerator (3000;200) | 31657 | 30258 |
UnorderedParallelLinqInPlaceWithGenerator (1200;800) | 17026 | 19517 |
UnorderedParallelLinqSimple (1200;200) | 556 | 595 |
UnorderedParallelLinqSimple (3000;200) | 3449 | 3700 |
UnorderedParallelLinqSimple (1200;800) | 1448 | 1506 |
UnorderedParalelLinqWithStruct (1200;200) | 511 | 534 |
UnorderedParalelLinqWithStruct (3000;200) | 3227 | 3154 |
UnorderedParalelLinqWithStruct (1200;800) | 1427 | 1445 |
A mixed bag, but overall it looks to me like the June CTP was slightly worse than the older one. Of course, that’s assuming that everything else on my computer is the same as it was a couple of weeks ago, etc. I’m not going to claim it’s a perfect benchmark by any means. Anyway, can it do any better with the Game of Life?
Game of Life
Results are in frames per second, so more is better.
Description | December CTP | June CTP |
---|---|---|
ParallelFastInteriorBoard (rendered) | 23 | 22 |
ParallelFastInteriorBoard (unrendered) | 29 | 28 |
ParallelFastInteriorBoard | 508 | 592 |
Yes, ParallelFastInteriorBoard really did get that much of a speed bump, apparently. I have no idea why… which leads me to the slightly disturbing conclusion of this post:
Conclusion
The numbers above don’t mean much. At least, they’re not particularly useful because I don’t understand them. Why would a few tests become “slightly significantly worse” and one particular test get markedly better? Do I have serious benchmarking issues in terms of my test rig? (I’m sure it could be a lot “cleaner” – I didn’t think it would make very much difference.)
I’ve always been aware that off-the-cuff benchmarking like this was of slightly dubious value, at least in terms of the details. The only facts which are really useful here are the big jumps due to algorithm etc. The rest is noise which may interest Joe and the team (and hey, if it proves to be useful, that’s great) but which is beyond my ability to reason about. Modern machines are complicated beasts, with complicated operating systems and complicated platforms above them.
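For what it’s worth, a slightly “cleaner” rig might look something like the sketch below. It is not the harness used for the numbers above; the MedianMilliseconds helper, the warm-up and run counts, and the stand-in workload are all assumptions chosen for illustration. The idea is simply to run the code a few times untimed (so JIT compilation is out of the way), then take the median of several timed runs so that one unlucky run doesn’t dominate.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class BenchmarkSketch
{
    // Hypothetical helper: runs the action 'warmups' times untimed, then
    // returns the median elapsed milliseconds over 'runs' timed executions.
    public static double MedianMilliseconds(Action action, int warmups, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");

        for (int i = 0; i < warmups; i++)
        {
            action(); // untimed: lets the JIT and caches settle
        }

        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            // Try to start each timed run from a similar heap state.
            GC.Collect();
            GC.WaitForPendingFinalizers();

            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        int mid = runs / 2;
        return runs % 2 == 1
            ? samples[mid]
            : (samples[mid - 1] + samples[mid]) / 2.0;
    }

    static void Main()
    {
        // Stand-in workload, not one of the tests from the tables above.
        double ms = MedianMilliseconds(
            () => Enumerable.Range(0, 1000000).AsParallel().Sum(x => x % 7),
            3, 9);
        Console.WriteLine("Median: {0:F2} ms", ms);
    }
}
```

Even then, the median only tames run-to-run noise on one machine; it says nothing about why one CTP would be faster or slower than another.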
So ignore the numbers. Maybe the new CTP is faster for your app, maybe it’s not. It is important to know it’s out there, and that if you’re interested in parallelisation but haven’t played with it yet, this is a good time to pick it up.
The way that I think about it is “hey, they just made a bunch of architectural changes, presumably for the better, and none of it totally screwed everything up!” Performance tuning, at the level of resolution that would show most of the differences above, should be one of the last steps, I would think.
There’s also this bit:
Hey Jon, I’ve been enjoying your posts. Remember that, at this point, not a lot of performance optimizations have been done to the CTPs as the focus is really on the API and functionality. So, I’m not too surprised that there are some performance anomalies (increases) in there, especially given that PLINQ switched its underlying implementation from ThreadPool-based to TPL-based.
I’m sure we’ll see performance within Parallel Extensions really start to soar as we get closer to release :).
Keep up the great work!
My first thought with respect to the performance changes: in addition to your other observations, I’ll also add that getting good performance from incorrect code is often a lot easier than getting good performance from correct code. (This has been a classic issue with graphics card benchmarking…do you count better frame rates if they come at the expense of accurate rendering?)
In other words, it’s entirely possible that in the course of fixing some bugs in the Parallel Extensions, certain things got slower. :)
PLINQ is wonderful. I am working with Parallel Extensions. I want to exploit multicore now!
Two books recommended for those interested in multicore development with C# 2008 and with future C# 2010:
Concurrent Programming on Windows
Author: Joe Duffy
http://www.amazon.com/Concurrent-Programming-on-Windows/dp/B0015DYKI4/ref=ed_oe_k
C# 2008 and 2005 Threaded Programming: Beginner’s Guide
Author: Gastón C. Hillar
http://www.amazon.com/2008-2005-Threaded-Programming-Beginners/dp/1847197108