Category Archives: C#

Book review: Accelerated C# 2008 by Trey Nash

Time for another book review, and this time it’s due to a recommendation from a reader who has this one, C# in Depth, and Head First C#.

Introduction and disclaimer

My normal book review disclaimer applies, but probably more so than ever before. Yes, Accelerated C# 2008 is a competitor to C# in Depth. They’re different in many ways, but many people would no doubt be in the target audience for both books. If you meet that criterion, please be aware that as the author of C# in Depth I can’t possibly be 100% objective when reviewing another C# book. That said, I’ll try to justify my opinions everywhere I can.

Target audience and content overview

Accelerated C# 2008 is designed to appeal to existing developers with experience in an OO language. As one of the Amazon reviews notes, you may struggle somewhat if you don’t have any .NET experience beforehand – while it should be possible to read it knowing only Java or C++, there are various times where a certain base level of knowledge is assumed and you’ll want to refer to MSDN for some background material. If you come at the book with no OO experience at all, I expect you’ll have a hard time. Although Chapter 4 does cover the basics of OO in .NET (classes, structs, methods, properties etc), this isn’t really a beginner’s book.

In terms of actual content covered, Accelerated C# 2008 falls somewhere between C# in Depth (almost purely language) and C# 3.0 in a Nutshell (language and then core libraries). It doesn’t attempt to cover all the core technologies (IO, reflection, security, interop etc are absent) but it goes into detail beyond the C# language when it comes to strings, exceptions, collections, threading and more. As well as purely factual information, there’s a lot of guidance as well, including a whole chapter entitled “In Search of C# Canonical Forms.”

General impressions

I’d like to make it clear to start with that I like the book. I have a number of criticisms, none of which I’m making up for the sake of being critical – but that in no way means it’s a bad book at all. It’s very unlikely that you know everything in here (I certainly didn’t) and the majority of the guidance is sound. The code examples are almost always self-contained (a big plus in my view) and Trey’s style is very readable. Where there are inaccuracies, they’re usually pretty harmless, and the large amount of accurate and insightful material makes up for them.

Just as I often compare Java to C# in my book, so Trey often compares C++ to C# in his. While my balance of C# to C++ knowledge is such that these comments aren’t particularly useful to me, I can see them being good for a newcomer to C# from a C++ background. I thought there might have been a few too many comparisons (I understood the point about STL and lambdas/LINQ the first time round…) but that’s just a minor niggle.

Where C# in Depth is primarily a “read from start to finish” book and C# 3.0 in a Nutshell is primarily a reference book (both can be used the other way, of course) Accelerated C# 2008 falls between the two. It actually achieves the best of both worlds to a large extent, which is an impressive feat. The ordering could be improved (more on this later on) but the general feeling is very good.

One quick word about the size of the book in terms of content: if you’re one of those people who judges the amount of useful content in a book on its page count, it’s worth noting that the font in this book is pretty small. I would guess that it packs about 25% more text per page than C# in Depth does, taking its “effective” page count from around 500 to 625. Also, the content is certainly meaty – you’re unlikely to find yourself skimming over loads of simple stuff trying to get to the good bits. Speaking of “getting to the good bits” let’s tackle my first significant gripe.

Material organisation

If you look at the tables of contents for Accelerated C# 2008 and Accelerated C# 2005, you’ll notice that the exact same chapter titles in the 2005 edition carry over in the same order in the 2008 edition. There are three extra chapters in the new edition, covering extension methods, lambda expressions and LINQ. That’s not to say that the content of the “duplicate” chapters is the same as before – C# 3.0 features are introduced in the appropriate place within existing chapters. In terms of ordering the chapters, I think it would have been much more appropriate to keep the last chapter of the old edition – “In Search of C# Canonical Forms” – as the last chapter of the new edition. Apart from anything else, that would allow it to include hints and tips involving the new C# 3 features which are currently covered later. It really feels like a “wrapping up” chapter, and deserves to be last.

That’s not the only time that the ordering felt strange, however. Advanced topics (at least ones which feel advanced to me) are mixed in with fairly basic ones. For instance, in the chapter on exceptions, there’s a section about “exception neutrality” which includes details about constrained execution regions and critical finalizers. All interesting stuff – even though I wish there were more of a prominent warning saying, “This is costly to both performance and readability: only go to these lengths when you really, really need to.” However, this comes before a section about using try/finally blocks and the using statement to make sure that resources are cleaned up however a block is exited. I can’t imagine anyone who knows enough C# to take in the exception neutrality material also not knowing about try/finally or the using statement (or how to create your own custom exception class, which comes between these two topics).

Likewise the chapter which deals with collections, including generic ones, comes before the chapter on generics. If I were a reader who didn’t know generics already, I think I’d get very confused reading about ICollection<T> without knowing what the T meant. Now don’t get me wrong: ordering material so that you don’t get “circular references” is often hard if not impossible. I just think it could have been done better here.

Aiming too deep?

It’s not like me to criticise a book for being too deep, but I’m going to make an exception here. Every so often, I came away from a topic thinking that it would have been better covered a little bit more lightly. Sometimes this was because a running example became laborious and moved a long way from anything you were actually likely to want to do in real life. The sections on “borrowing from functional programming” and memoization/currying/anonymous recursion felt guilty of this. It’s not that they’re not interesting topics, but the examples picked didn’t quite work for me.

The other problem with going deep is that you really, really need to get things right – because your readers are less likely to spot the mistakes. I’ll give three examples here:

  • Trey works hard on a number of occasions to avoid boxing, and points it out each time. Without any experience in performance tuning, you’d be forgiven for thinking that boxing is the primary cause of poor performance in .NET applications based on this book. While I agree that it’s something to be avoided where it’s possible to do so without bending the design out of shape, it doesn’t deserve to be laboured as much as it is here. In particular, Trey gives an example of a complex number struct and how he’s written appropriate overloads etc to avoid boxing. Unfortunately, to calculate the magnitude of the complex number (used to implement IComparable in a manner which violates the contract, but that’s another matter) he uses Math.Pow(real, 2) + Math.Pow(img, 2). Using a quick and dirty benchmark, I found that using real * real + img * img instead of Math.Pow made far, far more difference than whether or not the struct was boxed. (I happen to think it’s more readable code too, but never mind.) There was nothing wrong with avoiding the boxing, but in chasing the small performance gains, the big ones were missed. (There’s a sketch of the kind of comparison I mean just after this list.)

  • In the chapter on threading, there are some demonstrations of lock-free programming (before describing locking, somewhat oddly – and without describing the volatile modifier). Now, personally I’d want to discourage people from attempting lock-free programming at all unless they’ve got a really good reason (with evidence!) to support that decision – but if you’re going to do it at all, you need to be hugely careful. One of the examples basically has a load of threads starting and stopping, updating a counter (correctly) using Interlocked.Increment/Decrement. Another thread monitors the count and periodically reports it – but unfortunately it uses this statement to do it:

    threadCount = Interlocked.Exchange(ref numberThreads, numberThreads);

    The explanation states: “Since the Interlocked class doesn’t provide a method to simply read an Int32 value in an atomic operation, all I’m doing is swapping the numberThreads variable’s value with its own value, and, as a side effect, the Interlocked.Exchange method returns to me the value that was in the slot.” Well, not quite. It’s actually swapping the numberThreads variable’s value with a value evaluated at some point before the method call. If you rewrite the code like this, it becomes more obviously wrong:

    int tmp = numberThreads;
    Thread.Sleep(1000); // What could possibly happen during this time, I wonder?
    threadCount = Interlocked.Exchange(ref numberThreads, tmp);

    The call to Thread.Sleep is there to make it clear that numberThreads can very easily change between the initial read and the call to Interlocked.Exchange. The correct fix to the code is to use something like this:

    threadCount = Interlocked.CompareExchange(ref numberThreads, 0, 0);

    That sets numberThreads atomically to the value 0 if (and only if) its value is already 0 – in other words, it will never actually change the value, just report it. Now, I’ve laboured the explanation of why the code is wrong because it’s fairly subtle. Obvious errors in books are relatively harmless – subtle ones are much more worrying.

  • As a final example for this section, let’s look at iterator blocks. Did you know that any parameters passed to methods implemented using iterator blocks become public fields in the generated class? I certainly didn’t. Trey pointed out that this meant they could easily be changed with reflection, and that could be dangerous. (After looking with Reflector, it appears that local variables within the iterator block are also turned into public fields.) Now, leaving aside the fact that this is hugely unlikely to actually bite anyone (I’d be frankly amazed to see it as a problem in the wild), the suggested fix is very odd.

    The example Trey gives is where originally a Boolean parameter is passed into the method, and used in two places. Oh no! The value of the field can be changed between those two uses, which could lead to problems! True. The supposed fix is to wrap the Boolean value in an immutable struct ImmutableBool, and pass that in instead. Now, why would that be any better? Certainly you can’t change the value within the struct – but you can easily change the field’s value to be a completely different instance of ImmutableBool. Indeed, the breakage would involve exactly the same code, just changing the type of the value. The other train of thought which suggests that this approach would fail is that bool is already immutable, so it can’t be the mutability of the type of the field that causes problems. I’m sure there are much more useful things that Trey could have said in the two and a half pages he spent describing a broken fix to an unimportant problem.
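
Going back to the first of those examples, the kind of quick and dirty benchmark I mean is something along these lines. It’s only a sketch rather than the exact harness I used, and all the usual caveats about micro-benchmarks apply:

using System;
using System.Diagnostics;

class PowVersusMultiply
{
    static void Main()
    {
        const int Iterations = 50000000;
        double real = 1.23, img = 4.56;

        Stopwatch sw = Stopwatch.StartNew();
        double powSum = 0;
        for (int i = 0; i < Iterations; i++)
        {
            powSum += Math.Pow(real, 2) + Math.Pow(img, 2);
        }
        sw.Stop();
        Console.WriteLine("Math.Pow:       {0}ms ({1})", sw.ElapsedMilliseconds, powSum);

        sw = Stopwatch.StartNew();
        double multiplySum = 0;
        for (int i = 0; i < Iterations; i++)
        {
            multiplySum += real * real + img * img;
        }
        sw.Stop();
        // Printing the sums stops the loops from being optimised away entirely
        Console.WriteLine("Multiplication: {0}ms ({1})", sw.ElapsedMilliseconds, multiplySum);
    }
}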

Sorry, that was getting ranty for a bit… but I hope you understand why. Before concluding this review, let’s look at one chapter which is somewhat different to the rest, and which I’ve mentioned before:

In Search of C# Canonical Forms (aka “Design and Implementation Guidelines” :)

I’d been looking forward to this part of the book. I’m always interested in seeing what other people think the most important aspects of class design are. The book doesn’t go into much detail about abstract orientation (in this chapter, anyway – there’s plenty scattered through the book) but concentrates on core interfaces you might implement, etc. That’s fine. I’m still waiting for a C# book which is truly on a par with Effective Java (I have the second edition waiting to be read at work…) but I wasn’t expecting it all to be here. So, was this chapter worth the wait?

Somewhat. I was very glad to see that the first point around reference types was “Default to sealed classes” – I couldn’t agree more, and the arguments were well articulated. Many other guidelines were either entirely reasonable, or at least ones I could go either way on. There were a few where I either disagreed or at least would have put things differently:

  • Implementing cloning with copy constructors: one point about cloning which wasn’t mentioned is that (to quote MSDN) “The resulting clone must be of the same type as or a compatible type to the original instance.” The suggested implementation of Clone in the book is to use copy constructors. This means that every subclass must override Clone to call its own copy constructor, otherwise the instance returned will be of the wrong type. MemberwiseClone always creates an instance of the same type. Yes, it means the constructor isn’t called – but frankly the example given (performing a database lookup in the constructor) is a pretty dodgy cloning scenario in the first place, in my view. If I create a clone and it doesn’t contain the same data as the original, there’s something wrong. Having said that, the caveats Trey gives around MemberwiseClone are all valid in and of themselves – we just disagree about their importance. The advice to not actually implement ICloneable in the first place is also present (and well explained).
  • Implementing IDisposable: Okay, so this is a tough topic, but I was slightly disappointed to see the recommendation that “it’s wise for any objects that implement the IDisposable interface to also implement a finalizer […]” Now admittedly on the same page there’s the statement that “In reality, it’s rare that you’ll ever need to write a finalizer” but the contradiction isn’t adequately resolved. A lot of people have trouble understanding this topic, so it would have been nice to see really crisp advice here. My 20 second version of it is: “Only implement a finalizer if you’re holding on to resources which won’t be cleaned up by their own finalizers.” That actually cuts out almost everything, unless you’ve got an IntPtr to a native handle (in which case, use SafeHandle instead). (There’s a short sketch of what I mean just after this list.)
    • As a side note, Trey repeatedly claims that “finalizers aren’t destructors” which irks me somewhat as the C# spec (the MS version, anyway) uses the word “destructor” exclusively – a destructor is the way you implement a .NET finalizer in C#. It would be fine to say “destructors in C# aren’t deterministic, unlike destructors in C++” but I think it’s worth acknowledging that the word has a valid meaning in the context of C#. Anyway…
  • Implementing equality comparisons: while this was largely okay, I was disappointed to see that there wasn’t much discussion of inheritance and how it breaks equality comparisons in a hard-to-fix way. There’s some mention of inheritance, but it doesn’t tackle the issue I think is thorniest: If I’m asking one square whether it’s equal to another square, is it enough to just check for everything I know about squares (e.g. size and position)? What about if one of the squares is actually a coloured square – it has more information than a “basic” square. It’s very easy to end up with implementations which break reflexivity, simply because the question isn’t well-defined. You effectively need to be asking “are these two objects equal in <this> particular aspect” – but you don’t get to specify the aspect. This is an example where I remember Effective Java (first edition) giving a really thorough explanation of the pitfalls and potential implementations. The coverage in Accelerated C# 2008 is far from bad – it just doesn’t meet the gold standard. Arguably it’s unfair to ask another book to compete at that level, when it’s trying to do so much else as well.
  • Ordering: I mentioned earlier on that the complex number class used for a boxing example failed to implement comparisons appropriately. Unfortunately it’s used as the example specifically for “how to implement IComparable and IComparable<T>” as well. To avoid going into too much detail, if you have two instances x and y such that x != y but x.Magnitude == y.Magnitude, you’ll find x.CompareTo(y) == y.CompareTo(x) (but with a non-zero result in both cases). What’s needed here is a completely different example – one with a more obvious ordering.
  • Value types and immutability: Okay, so the last bullet on the value types checklist is “Should this struct be immutable? […] Values are excellent candidates to be immutable types” but this comes after “Need to box instances of the value? Implement an interface to do so […]” No! Just say no to mutable value types to start with! Mutable value types are bad, bad, bad, and should be avoided like the plague. There are a very few situations where it may be appropriate, but to my mind any advice checklist for implementing structs should make two basic points:
    • Are you sure you really wanted a struct in the first place? (They’re rarely the right choice.)
    • Please make it immutable! Pretty please with a cherry on top? Every time a struct is mutated, a cute kitten dies. Do you really want to be responsible for that?
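
To spell out the finalizer point from the IDisposable bullet above, here’s a minimal sketch of the sort of thing I mean. The class is invented purely for illustration: it only holds a managed resource which has its own cleanup, so it gets a Dispose method but no finalizer.

using System;
using System.IO;

// Only holds a managed resource (the StreamWriter), which can look after
// itself if we're never disposed - so there's no finalizer here.
public sealed class LogWriter : IDisposable
{
    private readonly StreamWriter writer;
    private bool disposed;

    public LogWriter(string path)
    {
        writer = new StreamWriter(path);
    }

    public void WriteLine(string message)
    {
        if (disposed)
        {
            throw new ObjectDisposedException("LogWriter");
        }
        writer.WriteLine(message);
    }

    public void Dispose()
    {
        if (!disposed)
        {
            writer.Dispose();
            disposed = true;
        }
    }
}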

Conclusion

At the risk – nay, certainty – of repeating myself, I’m going to say that I like the book despite the (sometimes subjective) flaws pointed out above. As Shakespeare wrote in Julius Caesar, “The evil that men do lives after them; the good is oft interred with their bones.” So it is with book reviews – it’s a lot easier to give specific examples of problems than it is to report successes – but the book does succeed, for the most part. Perhaps the root of almost all my reservations is that it tries to do too much – I’m not sure whether it’s possible to go into that much detail and cater for those with little or no previous C# experience (even with Java/C++) and keep to a relatively slim volume. It was a very lofty goal, and Trey has done very well to accomplish what he has. I would be interested to read a book by him (and hey, potentially even collaborate on it) which is solely on well-designed classes and libraries.

In short, I recommend Accelerated C# 2008, with a few reservations. Hopefully you can judge for yourself whether my reservations would bother you or not. I think overall I slightly prefer C# 3.0 in a Nutshell, but the two books are fairly different.

Reaction

I sent this to Trey before publishing it, as is my custom. He responded to all my points extremely graciously. I’m not sure yet whether I can post the responses themselves – stay tuned for the possibility, at least. My one problem with reviewing books is that I end up in contact with so many other authors who I’d like to work with some day, and that number has just increased again…

Judging a book by its cover (or title)

I’ve ranted about versioning before (and indeed in C# in Depth). I still believe that Microsoft didn’t do the world any favours when they introduced a relatively minor set of changes (just libraries, albeit important ones) with .NET 3.0 and a more major set of changes (languages, LINQ, core library improvements) with .NET 3.5. Using 2.5 and 3.0 would have made more sense, IMO. But never mind.

The fact is, people are confused about what version number applies to what. A number of people claim to be using C# 3.5 when they mean either C# 3.0 or .NET 3.5. (For a quick reference of what’s actually what, see my article on the issue.)

Okay, so far it’s the fault of Microsoft for being confusing and the fault of developers for not keeping up. Both of these are far more forgivable in my view than being flat out wrong as many books are at the moment. I don’t believe this is any indication of the quality of the book itself (Accelerated C# 2008 is pretty good so far, for example) but I still think it’s pretty awful to make a title so confusing. So, here are some bad titles and others which use version numbers appropriately. (I’ve left out titles like Head First C# and C# in Depth which don’t specify version numbers.)

Bad

  • Professional C# 2008 (Wrox)
  • Pro C# 2008 and the .NET 3.5 Platform (Apress) (only partly incorrect, of course)
  • Murach’s C# 2008 (Mike Murach and Associates)
  • Accelerated C# 2008 (Apress)
  • Illustrated C# 2008 (Apress)
  • Pro LINQ: Language Integrated Query in C# 2008 (APress)
  • Pro ASP.NET 3.5 in C# 2008 (Apress) (again, only partially incorrect)
  • Beginning C# 2008 (Apress)
  • Beginning C# 2008 Databases (Apress)

Good

  • C# 3.0 in a Nutshell (O’Reilly)
  • Programming C# 3.0 (O’Reilly)
  • C# 3.0 Cookbook (O’Reilly)
  • C# 3.0 Design Patterns (O’Reilly)
  • C# 3.0 Pocket Reference (O’Reilly)
  • Pro C# with .NET 3.0 (Apress)
  • Beginning C# 3.0 (Wrox)
  • C# 3.0 Unleashed: With the .NET Framework 3.5 (Sams)

Okay

This is the “well, just about” list – because these titles refer to “Microsoft Visual C# 2008” rather than “C# 2008”, they’re naming the IDE rather than the language. I think it’s better to name a book after the language instead of the tool you use to write in the language, personally…

  • Beginning Microsoft Visual C# 2008 (Wrox)
  • Microsoft Visual C# 2008 Step By Step (Microsoft)
  • Programming Microsoft Visual C# 2008 (Microsoft)
  • Microsoft Visual C# 2008 Express Edition: Build a Program Now! (Microsoft)

Notice a pattern? If anyone at Apress is reading this (unlikely, I know) – there’s no such thing as “C# 2008”.

Rant over for the moment. With any luck I might be able to finish reading Accelerated C# 2008 fairly soon, and give a proper book review.

Automatic lambda expressions

This morning I happened to show a colleague (Malcolm Rowe) the neat trick of using nullable types and the null-coalescing operator (??) to implement compound comparisons in C#. He asked whether it wouldn’t have been nicer to make this a library feature rather than a language feature. I’m all for putting features into libraries where possible, but there’s a problem in this case: the ?? operator doesn’t evaluate its right operand unless the left operand evaluates to null. This can’t be replicated in a library. Or can it?
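
For reference, the trick itself looks something like this. It’s just a sketch with invented types, comparing people by last name, then first name, then age; each later comparison is only evaluated if the earlier ones return zero:

using System;
using System.Collections.Generic;

public class Person
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public int Age { get; set; }
}

public class PersonComparer : IComparer<Person>
{
    public int Compare(Person x, Person y)
    {
        // Each stage returns null when it can't decide, so ?? moves on to
        // the next one - and skips evaluating it entirely once a decision
        // has been made.
        return NonZero(x.LastName.CompareTo(y.LastName))
            ?? NonZero(x.FirstName.CompareTo(y.FirstName))
            ?? x.Age.CompareTo(y.Age);
    }

    private static int? NonZero(int value)
    {
        return value == 0 ? (int?)null : value;
    }
}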

The obvious way to lazily evaluate an expression is to turn it into a closure. So, we can write our coalescing method as:

 

public static T CoalesceNulls<T>(T lhs, Func<T> rhs)
{
    return lhs != null ? lhs : rhs();
}

 

That’s quite an efficient way of doing it, but the asymmetry isn’t ideal. We can fix that by making the first argument a function too:

 

public static T CoalesceNulls<T>(Func<T> lhs, Func<T> rhs)
{
    T first = lhs();
    return first != null ? first : rhs();
}

 

One of the nice things you can do with the null-coalescing operator is use it for multiple expressions, e.g. a ?? b ?? c ?? d which will evaluate a, then (if it’s null) evaluate b, then (if that’s null) evaluate c etc. With the symmetry present we can now make this into a parameter array:

 

public static T CoalesceNulls<T>(params Func<T>[] functions)
{
    T current = default(T);
    foreach (Func<T> func in functions)
    {
        current = func();
        if (current != null)
        {
            break;
        }
    }
    return current;
}

 

(In some ways it’s still more elegant to specify a bare T as the first parameter, as that ensures that at least you’ve got one function to call. The change required is pretty obvious.)
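
For completeness, here’s roughly what that variant would look like:

public static T CoalesceNulls<T>(T first, params Func<T>[] functions)
{
    T current = first;
    foreach (Func<T> func in functions)
    {
        // Stop as soon as we have a non-null result; otherwise evaluate the
        // next function lazily, just as the ?? operator would.
        if (current != null)
        {
            break;
        }
        current = func();
    }
    return current;
}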

Now there are only two problems: invoking the method, and the performance characteristics. I’m going to ignore the latter – yes, you could end up with a significant performance hit creating all these closures all the time if you use this in performance critical code. There’s not a lot of options available there, really. But what about the syntax for invoking the method?

In its current form, we can write a ?? b ?? c ?? d as CoalesceNulls(() => a, () => b, () => c, () => d). That’s a wee bit ugly. Malcolm blogged about an alternative where parameters could be marked (on the declaration side) as calling for deferred execution, and automatically converted into closures by the compiler.

Just like Malcolm, I’m not terribly keen on this – for instance, I really like the fact that in C# it’s always very clear when a parameter is being passed by reference, because there’s the ref keyword on the caller side as well as the declaration side. Going against this would just feel wrong.

If we have to change the language and introduce new keywords or symbols, we’re then back where we started – the whole idea was to avoid having to introduce ?? into the language. The plus side is that it could be “usage agnostic” – just because it could be used for a null-coalescing operator replacement doesn’t mean it would have to be. However, this feels like using a sledgehammer to crack a nut – the number of times this would be useful is pretty minimal. The same argument could be applied to the null-coalescing operator itself, of course, but I think its utility – and the way it fits into the language pretty seamlessly – justifies the decisions made here.

Still, it was an interesting thought experiment…

Update: Since originally writing this (a few days ago, before the blog outage), Malcolm has been informed that Scala has precisely this capability. Just another reason for getting round to learning Scala at some point…

Guest post: Joe Albahari reviews C# in Depth

Joe Albahari, co-author of the excellent C# 3.0 in a
Nutshell
(previously reviewed here), kindly agreed to review C# in Depth. Not only has he provided the review below,
but he also supplied several pages of notes made while he was reading it. Many of those
notes have been incorporated into the C# in Depth notes page – it’s always good to include thoughtful feedback. (And I always welcome more, hint hint.)

Without further ado, here’s Joe’s review.

C# in Depth: Review

After having been invited to
review this book by two people at Manning—as well as Jon himself—I
figure it’s about time I came forward! Bear in mind that I’m not
a typical reader: I’m an author, and this makes me more critical than
most. This is especially true given that I wrote C# 3.0 in a Nutshell
with a coauthor (imagine two people constantly searching for ways to
improve each other’s work!). So I will do my best to compensate and
strive to be fair. Please post a comment if you feel I’ve missed the
mark!

Scope

While most other C# books cover
the language, the CLR and at least some aspects of the Framework, C#
in Depth concentrates largely on just the language. You won’t find discussions
on memory management, assemblies, streams and I/O, security, threading,
or any of the big APIs like WPF or ASP.NET. This is good in that it doesn’t
duplicate the books already out there, as well as giving more space
for the language.

You might expect that a book
focusing on the C# language itself would cover all of it. But interestingly,
the book covers only about a quarter of the C# language, namely the
features new to C# 2 and C# 3. This sets its target audience: programmers
who already know C# 1, but have not yet switched to C# 2 and 3. This
creates a tight focus, allowing it to devote serious space to topics
such as generics, nullable types, iterators and lambda expressions.
It’s no exaggeration to say that this book covers less than one tenth
of the ground of most other C# books, but gives that ground ten times
as much attention.

Organization and Style

The book is divided into three
parts:

  • Preliminaries (delegates
    and the type system)
  • Features new to
    C# 2.0
  • Features new to
    C# 3.0

I like this organization: it
presents topics in an order roughly similar to how I teach when giving
tutorials on LINQ—starting with the foundations of delegates and generics,
before moving on to iterators and higher-order functions, and then finally
LINQ. Sometimes the routes are a little circuitous and involve some
huffing and puffing, but the journey is worthwhile and helps to solidify
concepts.

C# in Depth is a tutorial that
gradually builds one concept upon another and is designed primarily
for sequential reading. The examples don’t drag on over multiple sections,
however, so you can jump in at any point (assuming you understand the
preceding topics). The examples are all fairly short, too, which is
very much in my taste. In fact, I would say Jon and I think very much
alike: when he expresses an opinion, I nearly always agree wholeheartedly.

A big trap in writing tutorials
is assuming knowledge of topics that you teach later. This book rarely
falls victim to this. The writer is also consistent in his use of terminology—and
sticks with the C# Language Specification which I think sets a good
example to all authors. Jon is not sloppy with concepts and is careful
in his wording to avoid misinterpretation. One thing that comes through
is that Jon really understands the material deeply himself.

If I were to classify this
book as beginner/intermediate/advanced, I’d say intermediate-to-advanced.
It’s quite a bit more advanced than, say, Jesse’s tutorial “Programming
C#”.

The layout of the book is pleasing—I
particularly like the annotations alongside the code listings.

Content

In the first section, “Preparing
for the Journey,” the book does cover a few C# 1 topics, namely delegates
and C#’s type system. Jon’s handling of these topics is excellent: his
discussion of static, explicit and safe typing is clear and helpful,
as is the section on value types versus reference types. I particularly
liked the section “Dispelling Myths”—this is likely to be
of use even to experienced developers. This chapter, in fact, leaves
the reader pining for more advanced C# 1 material.

The C# 2 features are very
well covered. The section on generics includes such topics as their
handling by the JIT compiler, the subtleties of type inference, a thorough
discussion on constraints, covariance/contravariance limitations, and
comparisons with Java’s generics and C++’s templates. Nullable types
are covered similarly well, with suggested patterns of use, as are anonymous
methods and iterators.

The C# 3 features are also
handled well. I like how Jon introduces expression trees—first building
them programmatically, and then showing how the compiler provides a
shortcut via lambda expressions. The book covers query expressions and
the basics of LINQ, and includes a brief explanation of each of the
standard query operators in an appendix. There’s also a chapter called
“LINQ Beyond Collections” which briefly introduces the LINQ to SQL,
LINQ to DataSet and LINQ to XML APIs.

Throughout the book, Jon goes
to some lengths to explain not just “what”, but “why”. This
book isn’t for people who want to get in and out quick so they can
get their job done and out of the way—it’s for people who enjoy
working elegantly with their tools, through a rich understanding of
the language’s background, subtleties and nuances.

Of course, digesting all this
is a bit of work (Chapter 3’s summary opens with the word “Phew!”).
Despite this, I think Jon does a good job at explaining difficult things
well. I don’t think I’ve seen any IL listings in the book, which is
a good sign in general. I’m always wary when an author, in explaining
a C# concept, says, “to understand XYZ, we must examine the IL”.
I take issue with this: rarely, if ever, does one need to look at IL
to understand C#, and doing so creates unnecessary complication by choosing
the wrong level of abstraction. That isn’t to say that looking at IL isn’t
useful for a deeper understanding of the CLR—but only after first
teaching C# concepts independently of IL.

What’s Missing

It was in Jon’s design criteria
not to build a tome—instead to write a small(ish) book that complements
rather than replaces books such as C# 3.0 in a Nutshell. Most things
missing from C# in Depth are consistent with its focus (such as the
CLR, threading, .NET Framework, etc.) The fact that C# in Depth excludes
the features of C# that were introduced prior to version 2 is a good
thing if you’re looking for a “delta” book, although, of course,
it makes it less useful as a language reference.

The book’s treatment of LINQ
centres largely on LINQ to Objects. If you’re planning on learning
C# 3.0 so that you can query databases through LINQ, the book’s focus
is not ideal, if read in isolation. I personally prefer the approach
of covering “remote” query architecture earlier and in more detail
(in conjunction with the canonical API, LINQ to SQL) – so that when
it comes time to teach query operators such as SelectMany, Group and
Join, they can be demonstrated in the context of both local and database
queries. I also strive, when writing on LINQ, to cover enough querying
ground that readers can “reproduce” their SQL queries in LINQ—even
though it means having to get sidetracked with API practicalities. Of
course, getting sidetracked with API practicalities is undesirable for
a language-centric book such as C# in Depth, and so the LINQ to Objects
focus is understandable. In any case, reading Chapters 8-10 of C# 3.0
in a Nutshell would certainly fill in the gaps. Another complementary
book would be Manning’s LINQ in Action (this book is well-reviewed
on Amazon, though I’ve not yet read it).

Summary

This book is well written,
accurate and insightful, and complements nearly every other book out
there. I would recommend it to anyone wanting a thorough “inside”
tutorial on the features new to C# 2 and 3.

Parallel Extensions June CTP

Either my timing is great or it’s lousy – you decide. Yesterday I posted about parallelising Conway’s Life – and today the new CTP for the Parallel Extensions library comes out! The bad news is that it meant I had to run all the tests again… the good news is that it means we can see whether or not the team’s work over the last 6 months has paid off.

Breaking change

The only breaking change I’ve seen is that AsParallel() no longer takes a ParallelQueryOptions parameter – instead, you call AsOrdered() on the value returned from AsParallel(). It was an easy change to make, and worked first time. There may well be plenty of other breaking API changes which are more significant, of course – I’m hardly using any of it.
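
Using the row-by-row query from the original Mandelbrot benchmark as an example, the change amounts to this:

// December CTP: .AsParallel(ParallelQueryOptions.PreserveOrdering)
// June CTP: ask for ordering with a separate AsOrdered() call instead
var query = from row in Enumerable.Range(0, Height).AsParallel().AsOrdered()
            select ComputeMandelbrotRow(row);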

Benchmark comparisons

One of the nice things about having the previous blog entries is that I can easily compare the results of how things were then with how they are now. Here are the test results for the areas of the previous blog posts which used Parallel Extensions. For the Game of Life, I haven’t included the results with the rendering for the fast version, as they’re still bound by rendering (unsurprisingly).

Mandelbrot set

(Original post)

Results are in milliseconds taken to plot the whole image, so less is better. (x;y) values mean “Width=x, MaxIterations=y.”

Description December CTP June CTP
ParallelForLoop (1200;200) 376 380
ParallelForLoop (3000;200) 2361 2394
ParallelForLoop (1200;800) 1292 1297
ParallelLinqRowByRowInPlace (1200;200) 378 393
ParallelLinqRowByRowInPlace (3000;200) 2347 2440
ParallelLinqRowByRowInPlace (1200;800) 1295 1939
ParallelLinqRowByRowWithCopy (1200;200) 382 411
ParallelLinqRowByRowWithCopy (3000;200) 2376 2484
ParallelLinqRowByRowWithCopy (1200;800) 1288 1401
ParallelLinqWithGenerator (1200;200) 4782 4868
ParallelLinqWithGenerator (3000;200) 29752 31366
ParallelLinqWithGenerator (1200;800) 16626 16855
ParallelLinqWithSequenceOfPoints (1200;200) 549 533
ParallelLinqWithSequenceOfPoints (3000;200) 3413 3290
ParallelLinqWithSequenceOfPoints (1200;800) 1462 1460
UnorderedParalleLinqInPlace (1200;200) 422 440
UnorderedParalleLinqInPlace (3000;200) 2586 2775
UnorderedParalleLinqInPlace (1200;800) 1317 1475
UnorderedParallelLinqInPlaceWithDelegate (1200;200) 509 514
UnorderedParallelLinqInPlaceWithDelegate (3000;200) 3093 3134
UnorderedParallelLinqInPlaceWithDelegate (1200;800) 1392 1571
UnorderedParallelLinqInPlaceWithGenerator (1200;200) 5046 5511
UnorderedParallelLinqInPlaceWithGenerator (3000;200) 31657 30258
UnorderedParallelLinqInPlaceWithGenerator (1200;800) 17026 19517
UnorderedParallelLinqSimple (1200;200) 556 595
UnorderedParallelLinqSimple (3000;200) 3449 3700
UnorderedParallelLinqSimple (1200;800) 1448 1506
UnorderedParalelLinqWithStruct (1200;200) 511 534
UnorderedParalelLinqWithStruct (3000;200) 3227 3154
UnorderedParalelLinqWithStruct (1200;800) 1427 1445

A mixed bag, but overall it looks to me like the June CTP was slightly worse than the older one. Of course, that’s assuming that everything else on my computer is the same as it was a couple of weeks ago, etc. I’m not going to claim it’s a perfect benchmark by any means. Anyway, can it do any better with the Game of Life?

Game of Life

(Original post)

Results are in frames per second, so more is better.

Description December CTP June CTP
ParallelFastInteriorBoard (rendered) 23 22
ParallelFastInteriorBoard (unrendered) 29 28
ParallelFastInteriorBoard 508 592

Yes, ParallelFastInteriorBoard really did get that much of a speed bump, apparently. I have no idea why… which leads me to the slightly disturbing conclusion of this post:

Conclusion

The numbers above don’t mean much. At least, they’re not particularly useful because I don’t understand them. Why would a few tests become “slightly significantly worse” and one particular test get markedly better? Do I have serious benchmarking issues in terms of my test rig? (I’m sure it could be a lot “cleaner” – I didn’t think it would make very much difference.)

I’ve always been aware that off-the-cuff benchmarking like this was of slightly dubious value, at least in terms of the details. The only facts which are really useful here are the big jumps due to algorithm etc. The rest is noise which may interest Joe and the team (and hey, if it proves to be useful, that’s great) but which is beyond my ability to reason about. Modern machines are complicated beasts, with complicated operating systems and complicated platforms above them.

So ignore the numbers. Maybe the new CTP is faster for your app, maybe it’s not. It is important to know it’s out there, and that if you’re interested in parallelisation but haven’t played with it yet, this is a good time to pick it up.

More parallelisation fun: Conway’s Game of Life

Okay, so I’ve probably exhausted the Mandelbrot set as a source of interest, at least for the moment. However, every so often someone mentions something or other which sounds like a fun exercise in parallelisation. The latest one is Conway’s Game of Life. Like the Mandelbrot set, this is something I used to play with when I was a teenager – and like the Mandelbrot set, it’s an embarrassingly parallel problem.

As before, I’ve written a few implementations but not worked excessively hard to optimise. All my tests were performed with a game board of 1000×500 cells, displaying at one pixel per cell (partly because that was the easiest way to draw it). I found that with the slower implementations the rendering side was irrelevant to the results, but the rendering became a bottleneck with the fast algorithms, so I’ve included results both with and without rendering.

The “benchmark” code (I use the word very lightly – normal caveats around methodology apply, this was only a weekend evenings project after all) records the “current” number of frames per second roughly once per second, and also an average over the whole run. (The fast algorithms jump around in fps pretty wildly depending on what’s going on – the more naïve ones don’t really change much between a busy board and a dead one.) The starting pattern in all cases was the R-pentomino (or F-pentomino, depending on whose terminology you use) in the middle of the board.

A base class (BytePerPixelBoard) is used by all the implementations – this represents the board with a single byte per pixel, which is really easy to work with but obviously very wasteful in terms of memory (and therefore cache). I haven’t investigated any other representations, but I wanted to get this post out fairly quickly as I have a lot of other stuff to do at the moment. (I also have a post on Java closures which I should tidy up soon, but hey…)

Implementation smells

All the source code is available for download, of course. There are no fancy options for verifying the results or even turning off rendering – just comment out the line in Program.BackgroundLoop which calls Render.

It would be fairly easy to make ParallelFastInteriorBoard derive from SimpleFastInteriorBoard, and ParallelActiveBlockBoard derive from ActiveBlockBoard, but I’ve kept the implementations completely separate for the moment. That means a huge amount of duplication, of course, including a nested class in each of the “active block” implementations – the nested classes are exactly the same. Oh, and they use internal fields instead of properties. I wanted the fields to be read-only, so I couldn’t use automatic properties – but I didn’t want to go to the hassle of having explicitly declared fields and separate properties. Trust me, I wouldn’t do this in production code. (There were some other internal fields, but they’ve mercifully been converted into properties now.)

SimpleBoard

This is about as simple as you can get. It fetches each cell’s current value “carefully” (i.e. coping with wrapping) regardless of where it is on the board. It works, but it’s almost painfully slow.

SimpleFastInteriorBoard

This is just an optimisation of SimpleBoard. We fetch the borders of the board “slowly” as per SimpleBoard, but once we’re on the interior of the board (any cell that’s not on an edge) we know we won’t need to worry about wrapping, so we can just use a fixed set of array offsets relative to the offset of the cell that’s being calculated.

This ends up being significantly faster, although it could still be optimised further. In the current code each cell is tested for whether it’s an interior cell or not. There’s a slight optimisation by remembering the results for “am I looking at the top or the bottom row” for a whole row, but by calculating just the edges and then the interior (or vice versa) a lot of tests could be avoided.
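
As a rough illustration of the interior case (with hypothetical names, and assuming live cells are stored as 1 and dead cells as 0), counting a cell’s neighbours boils down to something like this:

// No wrapping to worry about for an interior cell, so its eight neighbours
// sit at fixed offsets from the cell's own index in the one-byte-per-cell array.
private static int CountInteriorNeighbours(byte[] board, int width, int row, int col)
{
    int index = row * width + col;
    return board[index - width - 1] + board[index - width] + board[index - width + 1]
         + board[index - 1]                                 + board[index + 1]
         + board[index + width - 1] + board[index + width] + board[index + width + 1];
}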

ParallelFastInteriorBoard

Behold the awesome power of Parallel.For. Parallelising SimpleFastInteriorBoard literally took about a minute. I haven’t seen any other toolkit rival this in terms of ease of use. Admittedly I haven’t done much embarrassingly parallel work in many other toolkits, but even so it’s impressive. It’ll be interesting to see how easy it is to use for complex problems as well as simple ones.
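
To give an idea of just how little work it was, the parallel version of the main loop is essentially this (a sketch, with ComputeRow standing in for the per-row logic rather than being a real method in the downloadable code):

public override void Generate()
{
    // Sequential version: for (int row = 0; row < Height; row++) ComputeRow(row);
    // Parallel Extensions version: let the library spread the rows across the cores.
    Parallel.For(0, Height, row => ComputeRow(row));
}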

If you look at the results, you’ll see this is the first point at which there’s a difference between rendering and not. As the speed of the calculation improves the constant time taken for rendering becomes more significant.

ActiveBlockBoard

This is a departure in terms of implementation. We still use the same backing store (one big array of a byte per cell) but the board is logically broken up into blocks of 25×25 cells. (The size is hard-coded, but only as two constants – they can easily be changed. I haven’t investigated the effect of different block sizes thoroughly, but a few ad hoc tests suggests this is a reasonable sweet spot.) When considering a block in generation n, the changes (if any) in generation n-1 are considered. If the block itself changed, it may change again. If the relevant borders of any of its neighbours changed (including corners of diagonally neighbouring blocks), the block may change. No other changes in the board can affect the block, so if none of these conditions are met the contents of generation n-1 can be copied to generation n for that block. This is obviously a massive saving when there are large areas of stable board, either dead or with stable patterns (such as 2×2 live blocks). It doesn’t help with blinkers (3×1 blocks which oscillate between vertical and horizontal alignments) or anything similar, but it’s still a big optimisation. It does mean a bit more house-keeping in terms of remembering what changed (anywhere in the block, the four sides, and the four corners) each generation, but that pales into insignificance when you can remove the work from a lot of blocks.

Again, my implementation is relatively simple – but it still took a lot of work to get right! The code is much more complicated than SimpleFastInteriorBoard, and indeed pretty much all the code from SFIB is used in ActiveBlockBoard. Again, I’m sure there are optimisations that could be made to it, and alternative strategies for implementing it. For instance, I did toy with some code which didn’t keep track of what had changed in the block, but instead pushed changes to relevant neighbouring blocks, explicitly saying, “You need to recalculate next time!” There wasn’t much difference in speed, however, and it needed an extra setup stage in order to make the code understandable, so I’ve removed it.

The performance improvement of this is staggering – it makes it over 20 times faster than the single-threaded SimpleFastInteriorBoard. At first the improvement was much smaller – until I realised that actually I’d moved the bottleneck to the rendering, and re-benchmarked with rendering turned off.

ParallelActiveBlockBoard

Exactly the same implementation as ActiveBlockBoard, but using a Parallel.ForEach loop. Again, blissfully easy to implement, and very effective. It didn’t have the same near-doubling effect as going from SimpleFastInteriorBoard to ParallelFastInteriorBoard (with rendering turned off) but I suspect this is because the amount of housekeeping work required to parallelise things starts becoming more important at the rates shown in the results. It’s still impressive stuff though.

Results

Implementation FPS with rendering FPS without rendering
SimpleBoard 4 4
SimpleFastInteriorBoard 15 15
ParallelFastInteriorBoard 23 29
ActiveBlockBoard 78 326
ParallelActiveBlockBoard 72 508

Conclusions

  • The Parallel Extensions library rocks my world. Massive, massive kudos to Joe Duffy and the team.
  • I have no idea why ParallelActiveBlockBoard is slower than ActiveBlockBoard when rendering is enabled. This was repeatable (with slightly different results each time, but the same trend).
  • Always be aware that reducing one bottleneck may mean something else becomes your next bottleneck – at first I thought that ActiveBlockBoard was “only” 5 times faster than SimpleFastInteriorBoard; it was only when parallelisation showed no improvement (just the opposite) that I started turning rendering off.
  • Changing the approach can have more pronounced effects than adding cores – moving to the “active block” model gave a massive improvement over brute force.
  • Micro-optimisation has its place, when highly targeted and backed up with data – arguably the “fast interior” is a micro-optimisation, but again the improvement is quite dramatic.
  • Benchmarks are fun, but shouldn’t be taken to be indicative of anything else.
  • Did I mention that Parallel Extensions rocks?

Mandelbrot revisited – benchmark edition

I’ve had fun with the Mandelbrot set in this blog before, using it as an example of an embarrassingly parallelisable problem and demonstrating Parallel LINQ with it.

This morning, over breakfast, I described the problem to Christian Brunschen, a colleague of mine who has some parallelisation experience through implementing OpenMP in a portable manner. He immediately suggested a few possible changes to the way that I’ve approached things. Given the number of different attempts I’ve now had, I thought it would make sense to write a little benchmarking app which could easily be expanded as further implementations were considered. The benchmark is available for download so you can play around with it yourself. (As is often the case with this kind of thing, it’s not the nicest code in the world for various reasons – the duplication of the sequence generation method, for example… Please don’t use it as a tutorial on actual coding and organisation.)

Benchmark implementation details

The benchmark allows you to vary the size of the generated image and the maximum number of iterations per point. The images can be displayed after the test run, but only the time taken to populate a byte array is recorded. The byte arrays are all allocated before any tests are run, and the garbage collector is invoked (as far as it can be) between tests. The images themselves are only generated after the tests have all completed. Each implementation is given a tiny “dummy run” (creating an image 10 pixels across, with a maximum of 2 iterations per point) first to hopefully remove JITting from the benchmarking times. I won’t pretend that this puts everything on a completely level playing field (benchmarking is hard) but hopefully it’s a good start. We check the similarity between the results of each test and the first one – in some cases they could be “nearly all the same” without there being a bug, due to the subtleties of floating point operations and inlining.

The base class that all the generators use takes care of working out the height from the width, remembering the various configuration options, allocating the array, and also providing a simple imperative method to obtain the value of a single point, without using anything fancy.

I originally wanted to put the whole thing in a nice UI, but after wasting nearly an hour trying to get WPF to behave, I decided to go for a simple console app. One day I really must learn how to write GUIs quickly…

Okay, enough introduction – let’s look at the implementations we’re testing, and then the results. The idea is to get a look at how a number of areas influence the results, as well as seeing some nifty ways of expressing things functionally.

SingleThreadImperativeSimple

This is the most trivial code you could write, basically. As the base class provides the calculation method, we just go through each row and column, work out the value, and store it in the array. The only attempt at optimisation is keeping the “current index” into the array rather than calculating it for every point.

 

public override void Generate()
{
    int index = 0;
    for (int row = 0; row < Height; row++)
    {
        for (int col = 0; col < Width; col++)
        {
            Data[index++] = ComputeMandelbrotIndex(row, col);
        }
    }
}

 

where ComputeMandelbrotIndex is in the base class:

 

protected byte ComputeMandelbrotIndex(int row, int col)
{
    double x = (col * SampleWidth) / Width + OffsetX;
    double y = (row * SampleHeight) / Height + OffsetY;

    double y0 = y;
    double x0 = x;

    for (int i = 0; i < MaxIterations; i++)
    {
        if (x * x + y * y >= 4)
        {
            return (byte)((i % 255) + 1);
        }
        double xtemp = x * x - y * y + x0;
        y = 2 * x * y + y0;
        x = xtemp;
    }
    return 0;
}

 

SingleThreadImperativeInline

This is largely the same code as SingleThreadImperativeSimple, but with everything inlined. Within the main body, there’s no access to anything other than local variables, and no method calls. One further optimisation is available: points which will have a value of 0 aren’t stored (we assume we’ll always start with a cleared array).

 

public override void Generate()
{
    int width = Width;
    int height = Height;
    int maxIterations = MaxIterations;
    int index = 0;
    byte[] data = Data;

    for (int row = 0; row < height; row++)
    {
        for (int col = 0; col < width; col++)
        {
            double x = (col * SampleWidth) / width + OffsetX;
            double y = (row * SampleHeight) / height + OffsetY;

            double y0 = y;
            double x0 = x;

            for (int i = 0; i < maxIterations; i++)
            {
                if (x * x + y * y >= 4)
                {
                    data[index] = (byte)((i % 255) + 1);
                    break;
                }
                double xtemp = x * x - y * y + x0;
                y = 2 * x * y + y0;
                x = xtemp;
            }
            // Leave data[index] = 0 by default
            index++;
        }
    }
}

 

MultiThreadUpFrontSplitImperative

This should be pretty efficient – work out how many cores we’ve got, split the work equally between them (as chunks of rows, which helps in terms of locality and just incrementing an index in the byte array each time), and then run. Wait until all the threads have finished, and we’re done. Shouldn’t be a lot of context switching required normally, and no synchronization is required. However, if some cores are busy with another process, we’ll end up context switching for no gain.

 

public override void Generate()
{
    int cores = Environment.ProcessorCount;

    int rowsPerCore = Height / cores;

    List<Thread> threads = new List<Thread>();
    for (int i = 0; i < cores; i++)
    {
        int firstRow = rowsPerCore * i;
        int rowsToCompute = rowsPerCore;
        if (i == cores - 1)
        {
            rowsToCompute = Height-(rowsPerCore*(cores-1));
        }
        Thread thread = new Thread(() =>
            {
                int index = firstRow * Width;
                for (int row = firstRow; row < firstRow + rowsToCompute; row++)
                {
                    for (int col = 0; col < Width; col++)
                    {
                        Data[index++] = ComputeMandelbrotIndex(row, col);
                    }
                }
            });
        thread.Start();
        threads.Add(thread);
    }

    threads.ForEach(thread => thread.Join());
}

 

MultiThreadRowFetching

Again, we use a fixed number of threads – but this time we start off with a queue of work – one task per row. The queue has to be synchronized every time we use it, and we also have to check whether or not it’s empty. Plenty of scope for lock contention here, particularly as the number of cores increases.

 

public override void Generate()
{
    Queue<int> rowsLeft = new Queue<int>(Enumerable.Range(0, Height));

    List<Thread> threads = new List<Thread>();
    for (int i = 0; i < Environment.ProcessorCount; i++)
    {
        Thread thread = new Thread(() =>
            {
                while (true)
                {
                    int row;
                    lock (rowsLeft)
                    {
                        if (rowsLeft.Count == 0)
                        {
                            break;
                        }
                        row = rowsLeft.Dequeue();
                    }
                    int index = row * Width;
                    for (int col = 0; col < Width; col++)
                    {
                        Data[index++] = ComputeMandelbrotIndex(row, col);
                    }
                }
            });
        thread.Start();
        threads.Add(thread);
    }

    threads.ForEach(thread => thread.Join());
}

SingleThreadLinqSimple

This is the first version of Mandelbrot generation I wrote since thinking of using LINQ, tweaked a little to dump the output into an existing array instead of creating a new one. It’s essentially the same as SingleThreadImperativeSimple but with the for loops replaced with a query expression.

 

public override void Generate()
{
    var query = from row in Enumerable.Range(0, Height)
                from col in Enumerable.Range(0, Width)
                select ComputeMandelbrotIndex(row, col);

    int index=0;
    foreach (byte b in query)
    {
        Data[index++] = b;
    }
}

 

ParallelLinqRowByRowWithCopy

My first attempt using Parallel LINQ was somewhat disastrous due to the nature of PLINQ’s ordering. To combat this effect, I initially computed a row at a time, then copied each row into place afterwards:

 

private byte[] ComputeMandelbrotRow(int row)
{
    byte[] ret = new byte[Width];
    for (int col = 0; col < Width; col++)
    {
        ret[col] = ComputeMandelbrotIndex(row, col);
    }
    return ret;
}

public override void Generate()
{
    var query = from row in Enumerable.Range(0, Height).AsParallel(ParallelQueryOptions.PreserveOrdering)
                select ComputeMandelbrotRow(row);

    int rowStart = 0;
    foreach (byte[] row in query)
    {
        Array.Copy(row, 0, Data, rowStart, Width);
        rowStart += Width;
    }
}


ParallelLinqRowByRowInPlace

An obvious potential improvement is to write the data to the eventual target array as we go, to avoid the extra array creation and copying. At this point we have less of a purely functional solution, but it’s interesting anyway. Note that to use this idea in a query expression we have to return something even though it’s not useful. Likewise we have to make the query execute fully – so I use Max() to ensure that all the results will be computed. (I originally tried Count() but that didn’t work – presumably because the count of the results is known before the actual values are.) As we’re writing the data to the right place on each iteration, we no longer need to preserve ordering.

private int ComputeMandelbrotRow(int row)
{
    int index = row * Width;
    for (int col = 0; col < Width; col++)
    {
        Data[index++] = ComputeMandelbrotIndex(row, col);
    }
    return 0;
}

public override void Generate()
{
    var query = from row in Enumerable.Range(0, Height).AsParallel()
                select ComputeMandelbrotRow(row);

    query.Max();
}


ParallelLinqWithSequenceOfPoints

After this initial foray into Parallel LINQ, Nicholas Palladinos suggested that I could adapt the original idea in a more elegant manner by treating the sequence of points as a single input sequence, and asking Parallel LINQ to preserve the order of that whole sequence.


public override void Generate()
{
    var points = from row in Enumerable.Range(0, Height)
                 from col in Enumerable.Range(0, Width)
                 select new { row, col };

    var query = from point in points.AsParallel(ParallelQueryOptions.PreserveOrdering)
                select ComputeMandelbrotIndex(point.row, point.col);

    int index=0;
    foreach (byte b in query)
    {
        Data[index++] = b;
    }
}


SingleThreadLinqWithGenerator

My next step was to try to put the whole guts of the algorithm into the query expression, using a Complex value type and a generator to create a sequence of complex numbers generated from the starting point. This is quite a natural step, as the value is computed by just considering this infinite sequence and seeing how quickly it escapes from the circle of radius 2 on the complex plane. Aside from anything else, I think it’s a nice example of deferred execution and streaming – the sequence really would go on forever if we didn’t limit it with the Take and TakeWhile calls, but the generator itself doesn’t know it’s being limited. This version uses a sequence of points as per ParallelLinqWithSequenceOfPoints to make it easier to parallelise later.

It’s not exactly an efficient way of doing things though – there’s a huge amount of calling going on from one iterator to another. I’m actually quite surprised that the final results aren’t worse than they are.


public override void Generate()
{
    var points = from row in Enumerable.Range(0, Height)
                 from col in Enumerable.Range(0, Width)
                 select new { row, col };

    var query = from point in points
                // Work out the initial complex value from the row and column
                let c = new Complex((point.col * SampleWidth) / Width + OffsetX,
                                    (point.row * SampleHeight) / Height + OffsetY)
                // Work out the number of iterations
                select GenerateSequence(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
                                                          .Take(MaxIterations)
                                                          .Count() into count
                // Map that to an appropriate byte value
                select (byte)(count == MaxIterations ? 0 : (count % 255) + 1);

    int index = 0;
    foreach (byte b in query)
    {
        Data[index++] = b;
    }
}

public static IEnumerable<T> GenerateSequence<T>(T start, Func<T, T> step)
{
    T value = start;
    while (true)
    {
        yield return value;
        value = step(value);
    }
}
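
For reference, the queries above rely on a Complex value type with a SquareLength property and + and * operators. Here’s a minimal sketch of that shape (the actual struct used in the benchmark may well differ in detail):

struct Complex
{
    private readonly double re;
    private readonly double im;

    public Complex(double re, double im)
    {
        this.re = re;
        this.im = im;
    }

    // Squared magnitude: cheaper than the true length, and all we need
    // to test for escape from the radius-2 circle (SquareLength < 4)
    public double SquareLength
    {
        get { return re * re + im * im; }
    }

    public static Complex operator +(Complex left, Complex right)
    {
        return new Complex(left.re + right.re, left.im + right.im);
    }

    public static Complex operator *(Complex left, Complex right)
    {
        return new Complex(left.re * right.re - left.im * right.im,
                           left.re * right.im + left.im * right.re);
    }
}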


ParallelLinqWithGenerator

From the previous implementation, parallelisation is simple.


public override void Generate()
{
    var points = from row in Enumerable.Range(0, Height)
                 from col in Enumerable.Range(0, Width)
                 select new { row, col };

    var query = from point in points.AsParallel(ParallelQueryOptions.PreserveOrdering)
                // Work out the initial complex value from the row and column
                let c = new Complex((point.col * SampleWidth) / Width + OffsetX,
                                    (point.row * SampleHeight) / Height + OffsetY)
                // Work out the number of iterations
                select GenerateSequence(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
                                                          .Take(MaxIterations)
                                                          .Count() into count
                // Map that to an appropriate byte value
                select (byte)(count == MaxIterations ? 0 : (count % 255) + 1);

    int index = 0;
    foreach (byte b in query)
    {
        Data[index++] = b;
    }
}


SingleThreadImperativeWithComplex

Having already seen how slow the generator version was, I thought it would be worth checking whether this was due to the use of the Complex struct – so this is a version which uses the struct, but is otherwise just like the very first implementation (SingleThreadImperativeSimple).


public override void Generate()
{
    int index = 0;
    for (int row = 0; row < Height; row++)
    {
        for (int col = 0; col < Width; col++)
        {
            Data[index++] = ComputeMandelbrotIndexWithComplex(row, col);
        }
    }
}

byte ComputeMandelbrotIndexWithComplex(int row, int col)
{
    double x = (col * SampleWidth) / Width + OffsetX;
    double y = (row * SampleHeight) / Height + OffsetY;

    Complex start = new Complex(x, y);
    Complex current = start;

    for (int i = 0; i < MaxIterations; i++)
    {
        if (current.SquareLength >= 4)
        {
            return (byte)((i % 255) + 1);
        }
        current = current * current + start;
    }
    return 0;
}


UnorderedParallelLinqSimple

This is where my colleague, Christian, came in. He suggested that instead of ordering the results, we just return the point as well as its value. We can then populate the array directly, regardless of the order of the results. The first version just uses an anonymous type to represent the combination:


public override void Generate()
{
    var query = from row in Enumerable.Range(0, Height).AsParallel()
                from col in Enumerable.Range(0, Width)
                select new { row, col, value = ComputeMandelbrotIndex(row, col) };

    foreach (var x in query)
    {
        Data[x.row * Width + x.col] = x.value;
    }
}


UnorderedParallelLinqWithStruct

Creating a new garbage-collected object on the heap for each point isn’t the world’s greatest idea. A very simple transformation is to use a struct instead. Without knowing the PLINQ implementation, it seems likely that the values will still end up on the heap somehow (how else could they be communicated between threads?) but I’d still expect a certain saving from this simple step.


public override void Generate()
{
    var query = from row in Enumerable.Range(0, Height).AsParallel()
                from col in Enumerable.Range(0, Width)
                select new Result(row, col, ComputeMandelbrotIndex(row, col));

    foreach (var x in query)
    {
        Data[x.Row * Width + x.Column] = x.Value;
    }
}

struct Result
{
    internal int row, col;
    internal byte value;

    internal int Row { get { return row; } }
    internal int Column { get { return col; } }
    internal byte Value { get { return value; } }

    internal Result(int row, int col, byte value)
    {
        this.row = row;
        this.col = col;
        this.value = value;
    }
}


UnorderedParallelLinqInPlace

Why bother returning the data at all? We can just write the data in place, like we did earlier with ParallelLinqRowByRowInPlace but in a more aggressive fashion (with the disadvantage of computing the index for each point). Again, we have to return a dummy value and iterate through those dummy results to force all the computations to go through.

public override void Generate()
{

    var query = from row in Enumerable.Range(0, Height).AsParallel()
                from col in Enumerable.Range(0, Width)
                // Note side-effect!
                select Data[row*Width + col] = ComputeMandelbrotIndex(row, col);

    // Force iteration through all results
    query.Max();
}


UnorderedParallelLinqInPlaceWithDelegate

UnorderedParallelLinqInPlace feels slightly nasty – we’ve clearly got a side-effect within the loop, it’s just that we know the side-effects are all orthogonal. We can make ourselves feel slightly cleaner by separating the side-effect from the main algorithm, by way of a delegate. The loop can hold its hands up and say, “I’m side-effect free if you are.” It’s not hugely satisfactory, but it’ll be interesting to see the penalty we pay for this attempt to be purer.


IEnumerable<T> Generate<T>(Func<int, int, T> transformation)
{
    var query = from row in Enumerable.Range(0, Height).AsParallel()
                from col in Enumerable.Range(0, Width)
                // Side-effect only if transformation contains one…
                select transformation(row, col);

    return query;
}

public override void Generate()
{
    // Transformation with side-effect
    Func<int,int,byte> transformation = (row, col) => Data[row * Width + col] = ComputeMandelbrotIndex(row, col);

    Generate(transformation).Max();
}


UnorderedParallelLinqInPlaceWithGenerator

We can easily combine the earlier generator code with any of these new ways of processing the results. Here we use our “algorithm in the query” approach but process the results as we go to avoid ordering issues.


public override void Generate()
{
    var query = from row in Enumerable.Range(0, Height).AsParallel()
                from col in Enumerable.Range(0, Width)
                // Work out the initial complex value from the row and column
                let c = new Complex((col * SampleWidth) / Width + OffsetX,
                                    (row * SampleHeight) / Height + OffsetY)
                // Work out the number of iterations
                let count = GenerateSequence(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
                                                               .Take(MaxIterations)
                                                               .Count()
                // Map that to an appropriate byte value – and write it in place
                select Data[row * Width + col]  = (byte)(count == MaxIterations ? 0 : (count % 255) + 1);

    // Force execution
    query.Max();
}


ParallelForLoop

The Parallel Extensions library doesn’t just contain Parallel LINQ. It also has various other building blocks for parallelism, including a parallel for loop. This allows very simple parallelisation of our very first implementation – we just turn the outer loop into a parallel for loop, with its body (the previous inner loop) becoming a delegate. We need to move the index variable into that body so there’ll be one per row (otherwise they’d trample on each other) but that’s about it:


public override void Generate()
{
    Parallel.For(0, Height, row =>
    {
        int index = row * Width;
        for (int col = 0; col < Width; col++)
        {
            Data[index++] = ComputeMandelbrotIndex(row, col);
        }
    });
}


Results

A benchmark is nothing without results, so here they are on my dual core laptop, from three test runs. The first is the “default” settings I used to develop the benchmark – nothing hugely strenuous, but enough to see differences. I then tried a larger image with the same maximum number of iterations, then the original size with a larger number of iterations. The results are in alphabetical order because that’s how the test prints them :) Times are in milliseconds.

Implementation                               Width=1200         Width=3000         Width=1200
                                             MaxIterations=200  MaxIterations=200  MaxIterations=800

MultiThreadRowFetching                       380                2479               1311
MultiThreadUpFrontSplitImperative            384                2545               2088
ParallelForLoop                              376                2361               1292
ParallelLinqRowByRowInPlace                  378                2347               1295
ParallelLinqRowByRowWithCopy                 382                2376               1288
ParallelLinqWithGenerator                    4782               29752              16626
ParallelLinqWithSequenceOfPoints             549                3413               1462
SingleThreadImperativeInline                 684                4352               2455
SingleThreadImperativeSimple                 704                4353               2372
SingleThreadImperativeWithComplex            2795               16720              9800
SingleThreadLinqSimple                       726                4522               2438
SingleThreadLinqWithGenerator                8902               52586              30075
UnorderedParallelLinqInPlace                 422                2586               1317
UnorderedParallelLinqInPlaceWithDelegate     509                3093               1392
UnorderedParallelLinqInPlaceWithGenerator    5046               31657              17026
UnorderedParallelLinqSimple                  556                3449               1448
UnorderedParallelLinqWithStruct              511                3227               1427

Conclusions

So, what have we learned? Well… bearing in mind that benchmarks like this are often misleading compared with real applications, it’s still interesting that:

  • Parallel Extensions rocks. If I were trying to include a Mandelbrot generation implementation in a production setting, I’d definitely go for the parallel for loop – it’s simple, but works just as well as anything else.
  • The micro-optimisation of SingleThreadImperativeInline really doesn’t help much, but makes the code harder to understand – just like so many micro-optimisations.
  • The “generator” form of LINQ really doesn’t perform well at all. It does parallelise pretty well, as you’d expect, but it’s just plain slow.
  • Part of the slowness is almost certainly due to the use of the Complex struct, given the results of SingleThreadImperativeWithComplex. Not sure what’s going on there.
  • The extra abstraction step from UnorderedParallelLinqInPlace to UnorderedParallelLinqInPlaceWithDelegate does have a significant impact, at least when the maximum number of iterations per point isn’t the dominant force. That doesn’t mean it would be a bad idea in production, of course – just somewhere to look when running up against performance issues.

I suspect I’ve missed some notable points though – comments?

The Beauty of Closures

Fairly soon I’m going to write a blog post comparing the different proposals under consideration for Java 7 when it comes to closures. I thought it would be worth writing some background material on it first though, so I’ve put an article on the C# in Depth site.

I’m not entirely comfortable that I’ve captured why they’re important, but the article can mature over time like a good wine. The idea is to counter the Blub Paradox (which I hadn’t come across before, but I agree with completely – it’s part of the effect I believe Steve McConnell was fighting against when talking about programming in a language).
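
As a small illustration of the kind of thing the article covers (this isn’t an excerpt from it), here’s a C# closure capturing a local variable – the variable outlives the method that declared it:

using System;

static Func<int> CreateCounter()
{
    int count = 0;
    // The lambda captures the variable itself, not a copy of its current value
    return () => ++count;
}

static void Main()
{
    Func<int> counter = CreateCounter();
    Console.WriteLine(counter()); // 1
    Console.WriteLine(counter()); // 2 - count lives on between calls
}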

Programming “in” a language vs programming “into” a language

I’m currently reading Steve McConnell’s Code Complete (for the first time – yes, I know that’s somewhat worrying) and there was one section which disturbed me a little. For those of you with a copy to hand, it’s in section 4.3, discussing the difference between programming in a language and programming into a language:

Programmers who program “in” a language limit their thoughts to constructs that the language directly supports. If the language tools are primitive, the programmer’s thoughts will also be primitive.

Programmers who program “into” a language first decide what thoughts they want to express, and then they determine how to express those thoughts using the tools provided by their specific language.

Now don’t get me wrong – I can see where he’s coming from, and the example he then provides (Visual Basic – keeping the forms simple and separating them from business logic) is fine, but he only seems to give one side of the coin. Here’s a different – and equally one-sided – way of expressing the same terms:

Programmers who program “in” a language understand that language’s conventions and idioms. They write code which integrates well with other libraries, and which can be easily understood and maintained by other developers who are familiar with the language. They benefit from tools which have been specifically designed to aid coding in the supported idioms.

Programmers who program “into” a language will use the same ideas regardless of their target language. If their style does not mesh well with the language, they will find themselves fighting against it every step of the way. It will be harder to find libraries supporting their way of working, and tools may well prove annoying. Other developers who come onto the project later and who have experience in the language but not the codebase will find it hard to navigate and may well accidentally break the code when changing it.

There is a happy medium to be achieved, clearly. You certainly shouldn’t restrict your thinking to techniques which are entirely idiomatic, but if you find yourself wanting to code in a radically different style to that encouraged by the language, consider changing language if possible!

If I were attacking the same problem in C# 1 and C# 3, I could easily end up with radically different solutions. Some data extraction using LINQ in a fairly functional way in C# 3 would probably be better solved in C# 1 by losing some of the functional goodness than by trying to back-port LINQ and then use it without the benefit of lambda expressions or even anonymous methods.
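
To make that concrete, here’s a hypothetical bit of data extraction (the Customer type and customers sequence are invented purely for illustration). In C# 3 I’d naturally reach for LINQ; in C# 1 a plain loop is simpler than trying to recreate Where and Select by hand:

// C# 3: extract the names of adult customers with LINQ
// (customers is assumed to be an IEnumerable<Customer>)
List<string> adultNames = customers.Where(c => c.Age >= 18)
                                   .Select(c => c.Name)
                                   .ToList();

// C# 1: no generics, no lambdas - just loop
ArrayList names = new ArrayList();
foreach (Customer c in customers)
{
    if (c.Age >= 18)
    {
        names.Add(c.Name);
    }
}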

Accents and Conventions

That’s just between different versions of the same language. Between different actual languages, it can get much worse. If you’ve ever seen Java code written in a C++ style or vice versa, you’ll know what I mean. I’ve previously referred to this in terms of speaking a language with an accent – you can speak C# with a Java accent just as you can speak French with an English accent. Neither is pleasant.

At the lowest level, this is likely to be about conventions – and I’m pretty sure that when Steve writes “Invent your own coding conventions, standards, class libraries, and other augmentations” he doesn’t actually mean us to do it in a gratuitous fashion. It can be worth deviating from the “platform favoured” conventions sometimes, particularly if those differences are invisible to clients, but it should always be done with careful consideration. In a Java project I worked on a few years ago, we took the .NET naming conventions for interfaces (an I prefix) and constants (CamelCasing instead of SHOUTY_CAPS). Both of these made the codebase feel slightly odd, particularly where Java constants were used near our constants – but I personally found the benefits to be worth the costs. Importantly, the whole team discussed it before making any decisions.

Design Patterns

At a slightly higher level, many design patterns are just supported much, much better by some languages than others. The iterator pattern is a classic example. Compare the support for it from Java 6 and C# 2. On the “client” side, both languages have specific syntax: the enhanced for loop in Java and the foreach loop in C#. However, there is one important difference: if the iterator returned by GetEnumerator implements IDisposable (which the generic form demands, in fact) C# will call Dispose at the end of the loop, no matter how that occurs (reaching the end of the sequence, breaking early, an exception being thrown, etc). Java has no equivalent of this. Imagine that you want to write a class to iterate over the lines in a file. In Java, there’s just no safe way of representing it: you can make your iterator implement Closeable but then callers can’t (safely) use the enhanced for loop. You can make your code close the file handle when it reaches the end, but there’s no guarantee that will happen.
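
To make that difference concrete, here’s roughly what the C# compiler turns a foreach loop over an IEnumerable<string> into (simplified – the real expansion also handles non-generic and pattern-based cases):

// foreach (string line in lines) { ... } compiles to roughly:
IEnumerator<string> iterator = lines.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        string line = iterator.Current;
        // ... loop body ...
    }
}
finally
{
    // Called however the loop exits: end of sequence, break, return or exception
    iterator.Dispose();
}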

Then consider the “server” side of the iterator – the code actually providing the data. Java is like C# 1 – there’s no specific support for implementing an iterator. In C# 2 and above, iterator blocks (i.e. methods with yield statements) make life much, much easier. Writing iterators by hand can be a real pain. Reading a file line by line isn’t too bad, leaving aside the resource lifetime issue – but the complexity can balloon very quickly. Off-by-one errors are really easy to introduce.

So, if I were tackling a project which required reading text files line by line in various places, what would I do? In Java, I would take the reasonably small hit of a while loop in each place I needed it. In C# I’d write a LineReader class (if I didn’t already have one!) and use a more readable foreach loop. The contortions involved in introducing that idea into Java just wouldn’t be worth the effort.
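
Just to show how little work is involved, here’s a simplified sketch of the sort of LineReader I mean (the real thing would want argument validation, encoding options and so on):

using System.Collections;
using System.Collections.Generic;
using System.IO;

public sealed class LineReader : IEnumerable<string>
{
    private readonly string fileName;

    public LineReader(string fileName)
    {
        this.fileName = fileName;
    }

    public IEnumerator<string> GetEnumerator()
    {
        // The file is only opened when iteration begins; the using statement
        // (via the compiler-generated iterator) closes it however the loop ends.
        using (TextReader reader = File.OpenText(fileName))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Client code then just writes foreach (string line in new LineReader("input.txt")) and gets both the readable loop and the guaranteed cleanup.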

At a much higher level, we get into whole programming styles and paradigms. If your natural inclination is to write imperative code, you’re likely to create a mess (or get very frustrated) in a functional language. If the problem really does call for a functional language, find someone else to help you think in a more functional way. If the problem suits imperative programming just as well as it does functional programming, see if you can change the environment to something more familiar.

Conclusion

I’m not suggesting that Steve’s point isn’t valid – but he’s done his readers a disservice by only presenting one side of the matter. Fortunately, the rest of the book (so far) is excellent and humbling – to such a degree that this minor quibble stuck out like a sore thumb. In a book which had more problems, I would probably barely have even noticed this one.

There’s another possibility, of course – I could be completely wrong; maybe I’ve been approaching problems from a restrictive viewpoint all this time. How about you?

We’ve shipped! C# in Depth is out now (ebook)

I’m hugely pleased to announce that C# in Depth is now available in finished form as an ebook. The hard copy will ship in about three weeks. Thanks to everyone who’s been involved, particularly the folks from Manning, Eric Lippert (for both the tech review and the very kind comments!) and all the peer reviewers. Oh, and Holly for putting up with my lack of spare time over the last year :)

The work isn’t over yet, of course… I’ve still got to write up the specification map on the book’s web site, and I’ll probably end up writing various articles for magazines etc for marketing purposes. Still, a very significant milestone!

I really hope to write another book at some point – but I think I’ll be taking a few months off first…