Parallel Extensions June CTP

Either my timing is great or it’s lousy – you decide. Yesterday I posted about parallelising Conway’s Life – and today the new CTP for the Parallel Extensions library comes out! The bad news is that it meant I had to run all the tests again… the good news is that it means we can see whether or not the team’s work over the last 6 months has paid off.

Breaking change

The only breaking change I’ve seen is that AsParallel() no longer takes a ParallelQueryOptions parameter – instead, you call AsOrdered() on the value returned from AsParallel(). It was an easy change to make, and worked first time. There may well be plenty of other breaking API changes which are more significant, of course – I’m hardly using any of it.
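For anyone making the same change, here is a minimal sketch of the new call shape. It assumes the `AsParallel()` and `AsOrdered()` extension methods as they exist in `System.Linq` in released versions of .NET; the June CTP assembly may differ in namespace or other details. The class and method names (`AsOrderedSketch`, `OrderedSquares`) are made up for illustration and are not from my benchmark code.

```csharp
using System;
using System.Linq;

static class AsOrderedSketch
{
    // Hypothetical helper: squares 0..count-1 in parallel while keeping
    // the results in source order.
    public static int[] OrderedSquares(int count)
    {
        // Ordering is requested by chaining AsOrdered() onto the result of
        // AsParallel(), rather than by passing an options value to AsParallel().
        return Enumerable.Range(0, count)
                         .AsParallel()
                         .AsOrdered()
                         .Select(n => n * n)
                         .ToArray();
    }

    static void Main()
    {
        // prints 0, 1, 4, 9, 16
        Console.WriteLine(string.Join(", ", OrderedSquares(5)));
    }
}
```

Without the `AsOrdered()` call the same query is free to yield results in any order, which is what the "Unordered" tests below rely on.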

Benchmark comparisons

One of the nice things about having the previous blog entries is that I can easily compare the results of how things were then with how they are now. Here are the test results for the areas of the previous blog posts which used Parallel Extensions. For the Game of Life, I haven’t included the results with the rendering for the fast version, as they’re still bound by rendering (unsurprisingly).

Mandelbrot set

(Original post)

Results are in milliseconds taken to plot the whole image, so lower is better. (x;y) values mean “Width=x, MaxIterations=y”.

Description	December CTP	June CTP
ParallelForLoop (1200;200)	376	380
ParallelForLoop (3000;200)	2361	2394
ParallelForLoop (1200;800)	1292	1297
ParallelLinqRowByRowInPlace (1200;200)	378	393
ParallelLinqRowByRowInPlace (3000;200)	2347	2440
ParallelLinqRowByRowInPlace (1200;800)	1295	1939
ParallelLinqRowByRowWithCopy (1200;200)	382	411
ParallelLinqRowByRowWithCopy (3000;200)	2376	2484
ParallelLinqRowByRowWithCopy (1200;800)	1288	1401
ParallelLinqWithGenerator (1200;200)	4782	4868
ParallelLinqWithGenerator (3000;200)	29752	31366
ParallelLinqWithGenerator (1200;800)	16626	16855
ParallelLinqWithSequenceOfPoints (1200;200)	549	533
ParallelLinqWithSequenceOfPoints (3000;200)	3413	3290
ParallelLinqWithSequenceOfPoints (1200;800)	1462	1460
UnorderedParalleLinqInPlace (1200;200)	422	440
UnorderedParalleLinqInPlace (3000;200)	2586	2775
UnorderedParalleLinqInPlace (1200;800)	1317	1475
UnorderedParallelLinqInPlaceWithDelegate (1200;200)	509	514
UnorderedParallelLinqInPlaceWithDelegate (3000;200)	3093	3134
UnorderedParallelLinqInPlaceWithDelegate (1200;800)	1392	1571
UnorderedParallelLinqInPlaceWithGenerator (1200;200)	5046	5511
UnorderedParallelLinqInPlaceWithGenerator (3000;200)	31657	30258
UnorderedParallelLinqInPlaceWithGenerator (1200;800)	17026	19517
UnorderedParallelLinqSimple (1200;200)	556	595
UnorderedParallelLinqSimple (3000;200)	3449	3700
UnorderedParallelLinqSimple (1200;800)	1448	1506
UnorderedParalelLinqWithStruct (1200;200)	511	534
UnorderedParalelLinqWithStruct (3000;200)	3227	3154
UnorderedParalelLinqWithStruct (1200;800)	1427	1445

A mixed bag, but overall it looks to me like the June CTP was slightly worse than the older one. Of course, that’s assuming that everything else on my computer is the same as it was a couple of weeks ago, etc. I’m not going to claim it’s a perfect benchmark by any means. Anyway, can it do any better with the Game of Life?
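To put a number on "slightly worse", the percentage change per row is easy enough to compute. This is only a sketch: the three rows are copied from the table above, while `CtpComparison` and `PercentChange` are names invented for this example.

```csharp
using System;

static class CtpComparison
{
    // Percentage change from the December CTP figure to the June CTP figure.
    // For a timing in milliseconds, a positive result means June was slower.
    public static double PercentChange(double december, double june)
    {
        return (june - december) / december * 100.0;
    }

    static void Main()
    {
        // Three rows copied from the Mandelbrot table (milliseconds).
        var rows = new[]
        {
            new { Name = "ParallelForLoop (1200;200)", December = 376.0, June = 380.0 },
            new { Name = "ParallelLinqRowByRowWithCopy (1200;800)", December = 1288.0, June = 1401.0 },
            new { Name = "ParallelLinqWithSequenceOfPoints (3000;200)", December = 3413.0, June = 3290.0 }
        };

        foreach (var row in rows)
        {
            // The custom format prints an explicit sign for both directions.
            Console.WriteLine("{0}: {1:+0.0;-0.0}%",
                              row.Name, PercentChange(row.December, row.June));
        }
    }
}
```

A single run per configuration says nothing about run-to-run variance, so small percentages here may well be noise.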

Game of Life

(Original post)

Results are in frames per second, so higher is better.

Description	December CTP	June CTP
ParallelFastInteriorBoard (rendered)	23	22
ParallelFastInteriorBoard (unrendered)	29	28
ParallelFastInteriorBoard	508	592

Yes, ParallelFastInteriorBoard really did get that much of a speed bump, apparently. I have no idea why… which leads me to the slightly disturbing conclusion of this post:

Conclusion

The numbers above don’t mean much. At least, they’re not particularly useful because I don’t understand them. Why would a few tests become “slightly significantly worse” and one particular test get markedly better? Do I have serious benchmarking issues in terms of my test rig? (I’m sure it could be a lot “cleaner” – I didn’t think it would make very much difference.)

I’ve always been aware that off-the-cuff benchmarking like this was of slightly dubious value, at least in terms of the details. The only facts which are really useful here are the big jumps due to algorithm etc. The rest is noise which may interest Joe and the team (and hey, if it proves to be useful, that’s great) but which is beyond my ability to reason about. Modern machines are complicated beasts, with complicated operating systems and complicated platforms above them.

So ignore the numbers. Maybe the new CTP is faster for your app, maybe it’s not. It is important to know it’s out there, and that if you’re interested in parallelisation but haven’t played with it yet, this is a good time to pick it up.

4 thoughts on “Parallel Extensions June CTP”

The way that I think about it is “hey, they just made a bunch of architectural changes, presumably for the better, and none of it totally screwed everything up!” Performance tuning, at the level of resolution that would show most of the differences above, should be one of the last steps, I would think.

There’s also this bit:

In this CTP, PLINQ is implemented on top of the Task Parallel Library, which does not yet have thread injection in this CTP (see the related above issue #1). Some PLINQ queries require more concurrently-running tasks than the number of threads that Task Parallel Library creates, so you may observe deadlocks. Specifically, binary operators like SelectMany and Join that use the output of another PLINQ query as the second data source are likely to hit this. We have provided a workaround for this CTP: the previous implementation, which runs on top of the .NET ThreadPool, is still available. Just set the PLINQ_USE_THREADPOOL environment variable to a non-empty value and PLINQ will revert back to the ThreadPool. This setting will go away in subsequent releases.

Hey Jon, I’ve been enjoying your posts. Remember that, at this point, not a lot of performance optimizations have been done to the CTPs as the focus is really on the API and functionality. So, I’m not too surprised that there are some performance anomalies (increases) in there, especially given that PLINQ switched its underlying implementation from ThreadPool-based to TPL-based.

I’m sure we’ll see performance within Parallel Extensions really start to soar as we get closer to release :).

Keep up the great work!

My first thought with respect to the performance changes: in addition to your other observations, I’ll also add that getting good performance from incorrect code is often a lot easier than getting good performance from correct code. (This has been a classic issue with graphics card benchmarking…do you count better frame rates if they come at the expense of accurate rendering?)

In other words, it’s entirely possible that in the course of fixing some bugs in the Parallel Extensions, certain things got slower. :)

PLINQ is wonderful. I am working with Parallel Extensions. I want to exploit multicore now!
Two books recommended for those interested in multicore development with C# 2008 and with future C# 2010:
Concurrent Programming on Windows
Author: Joe Duffy

C# 2008 and 2005 Threaded Programming: Beginner’s Guide
Author: Gastón C. Hillar

Jon Skeet's coding blog