Manning book draw

I was recently contacted about a cool way of possibly getting a free .NET ebook (legally!) – or, if you’re really lucky, the complete Manning .NET library. Basically click on the advert below to enter: one person will be drawn each day until July 17th, and the final prize will be the complete .NET library from Manning. Have at it. If it happens to draw more attention to C# in Depth at the same time, that’s fine by me…

Blog update

Apologies for the recent blog outage. The msmvps.com server had some issues, and has been updated with recent patches etc at the same time.

There’s one downside with this – my blog URL included "jon.skeet" and the server now doesn’t like blog names with dots in, so links to old articles are likely to be broken I’m afraid. Susan Bradley has been wonderful at looking at this issue for me – many thanks to her – and we’ll see if we can end up with a slightly more useful solution. (Susan has already redirected http://msmvps.com/blogs/jon.skeet to http://msmvps.com/blogs/jon_skeet which should help a lot.)

I’ve updated my feedburner URL, so hopefully at least many readers will now be sorted. Please let me know if you run into any issues – anything I can deal with, I will. (I’ll be updating links within my own web sites and this blog of course – but it could take a little while to catch all of them.)

Automatic lambda expressions

This morning I happened to show a colleague (Malcolm Rowe) the neat trick of using nullable types and the null-coalescing operator (??) to implement compound comparisons in C#. He asked whether it wouldn’t have been nicer to make this a library feature rather than a language feature. I’m all for putting features into libraries where possible, but there’s a problem in this case: the ?? operator doesn’t evaluate its right operand unless the left operand evaluates to null. This can’t be replicated in a library. Or can it?
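
(For reference, the trick in question looks something like the hedged sketch below – NullIfZero and the Person type are purely illustrative, not from any library. Each comparison becomes an int? which is null when the keys are equal, so ?? only moves on to the next key when it needs to.)

static int? NullIfZero(int value)
{
    // Turn "equal" (0) into null so that ?? falls through to the next comparison
    return value == 0 ? (int?) null : value;
}

public int Compare(Person x, Person y)
{
    // Compare by last name, then first name, then age - later comparisons
    // are only evaluated when the earlier ones come out as 0 (i.e. null here)
    return NullIfZero(string.Compare(x.LastName, y.LastName))
        ?? NullIfZero(string.Compare(x.FirstName, y.FirstName))
        ?? x.Age.CompareTo(y.Age);
}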

The obvious way to lazily evaluate an expression is to turn it into a closure. So, we can write our coalescing method as:

public static T CoalesceNulls<T>(T lhs, Func<T> rhs)
{
    return lhs != null ? lhs : rhs();
}

That’s quite an efficient way of doing it, but the asymmetry isn’t ideal. We can fix that by making the first argument a function too:

public static T CoalesceNulls<T>(Func<T> lhs, Func<T> rhs)
{
    T first = lhs();
    return first != null ? first : rhs();
}

One of the nice things you can do with the null-coalescing operator is use it for multiple expressions, e.g. a ?? b ?? c ?? d which will evaluate a, then (if it’s null) evaluate b, then (if that’s null) evaluate c etc. With the symmetry present we can now make this into a parameter array:

public static T CoalesceNulls<T>(params Func<T>[] functions)
{
    T current = default(T);
    foreach (Func<T> func in functions)
    {
        current = func();
        if (current != null)
        {
            break;
        }
    }
    return current;
}

(In some ways it’s still more elegant to specify a bare T as the first parameter, as that ensures you’ve always got at least one value to start with. The change required is pretty obvious – there’s a sketch below.)
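
Here’s a hedged sketch of that variant – a plain first value followed by the lazily evaluated alternatives:

public static T CoalesceNulls<T>(T first, params Func<T>[] functions)
{
    T current = first;
    foreach (Func<T> func in functions)
    {
        // Stop calling functions as soon as we've got a non-null value
        if (current != null)
        {
            break;
        }
        current = func();
    }
    return current;
}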

Now there are only two problems: invoking the method, and the performance characteristics. I’m going to ignore the latter – yes, you could end up with a significant performance hit from creating all these closures all the time if you use this in performance-critical code. There aren’t many options available there, really. But what about the syntax for invoking the method?

In its current form, we can write the current a ?? b ?? c ?? d as CoalesceNulls(() => a, () => b, () => c, () => d). That’s a wee bit ugly. Malcolm blogged about an alternative where the parameters could be marked (on the declaration side) as calling for deferred execution, and automatically converted into closures by the compiler.

Just like Malcolm, I’m not terribly keen on this – for instance, I really like the fact that in C# it’s always very clear when a parameter is being passed by reference, because there’s the ref keyword on the caller side as well as the declaration side. Going against this would just feel wrong.

If we have to change the language and introduce new keywords or symbols, we’re then back where we started – the whole idea was to avoid having to introduce ?? into the language. The plus side is that it could be “usage agnostic” – just because it could be used for a null-coalescing operator replacement doesn’t mean it would have to be. However, this feels like using a sledgehammer to crack a nut – the number of times this would be useful is pretty minimal. The same argument could be applied to the null-coalescing operator itself, of course, but I think its utility – and the way it fits into the language pretty seamlessly – justifies the decisions made here.

Still, it was an interesting thought experiment…

Update: Since originally writing this (a few days ago, before the blog outage), Malcolm has been informed that Scala has precisely this capability. Just another reason for getting round to learning Scala at some point…

I’m Sorry, I Haven’t a Cluestick

Unless you’ve listened to I’m Sorry, I Haven’t a Clue, the BBC Radio 4 “antidote to panel games” which was chaired by the recently departed (and much missed) Humphrey Lyttelton, this post may not make a lot of sense to you. That’s not intended to guarantee that it’ll make any sense to you even if you have listened to it, mind you.

ISIHAC was full of very silly improvisation games. I was listening to an episode last night, and thought a developer version could be equally silly. Games played might include the following: (links show the Wikipedia description of the original)

  • One language to the API of another: Not so much a joke as a strategy now at both Microsoft and Sun, but I’m sure it could be taken to extremes. Ever fancied using the ZX Spectrum BASIC functions from C#? How about Smalltalk from Java?
  • Singleton Crescent: Players name design patterns, seemingly at random to the unenlightened observer, until one of them declares “Singleton”. Multiple variations exist, such as Gamma’s Folly: “After a diagonal move ending on a creational pattern, a behavioural pattern is not permitted.”
  • Fluent Gorge: Players start with a single method call and add additional expressions. The player who inadvertently completes the statement is the loser. This game is already played at lunchtime in many ISVs, some of whom accidentally check in the tortuous code afterwards.
  • Pick-up algorithm: A player begins copying code out of a book, which is then removed after a few lines. They then have to continue coding as accurately as possible until the book is brought back. Points are awarded if they’re within a gnat’s semi-colon of the original. Bonus points used to be awarded if the resulting code had fewer bugs in than the printed original, but this practice has recently been discontinued as being a near-trivial way of achieving a high score.
  • Call my bluff: One player is given a single-line project goal, and then several sets of requirements purporting to support that goal. The player has to guess which set of requirements is the real one for the project. The twist is that none of the sets of requirements is even slightly realistic. Again, this game is in widespread use, played by Business Analysts on unsuspecting developers.

Right, that’s more than enough silliness. Something more serious next time, I promise.

Guest post: Joe Albahari reviews C# in Depth

Joe Albahari, co-author of the excellent C# 3.0 in a Nutshell (previously reviewed here), kindly agreed to review C# in Depth. Not only has he provided the review below, but he also supplied several pages of notes made while he was reading it. Many of those notes have been incorporated into the C# in Depth notes page – it’s always good to include thoughtful feedback. (And I always welcome more, hint hint.)

Without further ado, here’s Joe’s review.

C# in Depth: Review

After having been invited to review this book by two people at Manning—as well as Jon himself—I figure it’s about time I came forward! Bear in mind that I’m not a typical reader: I’m an author, and this makes me more critical than most. This is especially true given that I wrote C# 3.0 in a Nutshell with a coauthor (imagine two people constantly searching for ways to improve each other’s work!). So I will do my best to compensate and strive to be fair. Please post a comment if you feel I’ve missed the mark!

Scope

While most other C# books cover the language, the CLR and at least some aspects of the Framework, C# in Depth concentrates largely on just the language. You won’t find discussions on memory management, assemblies, streams and I/O, security, threading, or any of the big APIs like WPF or ASP.NET. This is good in that it doesn’t duplicate the books already out there, as well as giving more space to the language.

You might expect that a book focusing on the C# language itself would cover all of it. But interestingly, the book covers only about a quarter of the C# language, namely the features new to C# 2 and C# 3. This sets its target audience: programmers who already know C# 1, but have not yet switched to C# 2 and 3. This creates a tight focus, allowing it to devote serious space to topics such as generics, nullable types, iterators and lambda expressions. It’s no exaggeration to say that this book covers less than one tenth of the ground of most other C# books, but gives that ground ten times as much attention.

Organization and Style

The book is divided into three parts:

  • Preliminaries (delegates and the type system)
  • Features new to C# 2.0
  • Features new to C# 3.0

I like this organization: it presents topics in an order roughly similar to how I teach when giving tutorials on LINQ—starting with the foundations of delegates and generics, before moving on to iterators and higher-order functions, and then finally LINQ. Sometimes the routes are a little circuitous and involve some huffing and puffing, but the journey is worthwhile and helps to solidify concepts.

C# in Depth is a tutorial that gradually builds one concept upon another and is designed primarily for sequential reading. The examples don’t drag on over multiple sections, however, so you can jump in at any point (assuming you understand the preceding topics). The examples are all fairly short, too, which is very much to my taste. In fact, I would say Jon and I think very much alike: when he expresses an opinion, I nearly always agree wholeheartedly.

A big trap in writing tutorials is assuming knowledge of topics that you teach later. This book rarely falls victim to this. The writer is also consistent in his use of terminology—and sticks with the C# Language Specification, which I think sets a good example to all authors. Jon is not sloppy with concepts and is careful in his wording to avoid misinterpretation. One thing that comes through is that Jon really understands the material deeply himself.

If I were to classify this book as beginner/intermediate/advanced, I’d say intermediate-to-advanced. It’s quite a bit more advanced than, say, Jesse’s tutorial “Programming C#”.

The layout of the book is pleasing—I particularly like the annotations alongside the code listings.

Content

In the first section, “Preparing for the Journey,” the book does cover a few C# 1 topics, namely delegates and C#’s type system. Jon’s handling of these topics is excellent: his discussion of static, explicit and safe typing is clear and helpful, as is the section on value types versus reference types. I particularly liked the section “Dispelling Myths”—this is likely to be of use even to experienced developers. This chapter, in fact, leaves the reader pining for more advanced C# 1 material.

The C# 2 features are very well covered. The section on generics includes such topics as their handling by the JIT compiler, the subtleties of type inference, a thorough discussion on constraints, covariance/contravariance limitations, and comparisons with Java’s generics and C++’s templates. Nullable types are covered similarly well, with suggested patterns of use, as are anonymous methods and iterators.

The C# 3 features are also handled well. I like how Jon introduces expression trees—first building them programmatically, and then showing how the compiler provides a shortcut via lambda expressions. The book covers query expressions and the basics of LINQ, and includes a brief explanation of each of the standard query operators in an appendix. There’s also a chapter called “LINQ Beyond Collections” which briefly introduces the LINQ to SQL, LINQ to DataSet and LINQ to XML APIs.

Throughout the book, Jon goes to some lengths to explain not just “what”, but “why”. This book isn’t for people who want to get in and out quickly so they can get their job done and out of the way—it’s for people who enjoy working elegantly with their tools, through a rich understanding of the language’s background, subtleties and nuances.

Of course, digesting all this is a bit of work (Chapter 3’s summary opens with the word “Phew!”). Despite this, I think Jon does a good job of explaining difficult things well. I don’t think I’ve seen any IL listings in the book, which is a good sign in general. I’m always wary when an author, in explaining a C# concept, says, “to understand XYZ, we must examine the IL”. I take issue with this: rarely, if ever, does one need to look at IL to understand C#, and doing so creates unnecessary complication by choosing the wrong level of abstraction. That isn’t to say that looking at IL isn’t useful for a deeper understanding of the CLR—but only after first teaching C# concepts independently of IL.

What’s Missing

It was in Jon’s design criteria not to build a tome—instead to write a small(ish) book that complements rather than replaces books such as C# 3.0 in a Nutshell. Most things missing from C# in Depth are consistent with its focus (such as the CLR, threading, the .NET Framework, etc.). The fact that C# in Depth excludes the features of C# that were introduced prior to version 2 is a good thing if you’re looking for a “delta” book, although, of course, it makes it less useful as a language reference.

The book’s treatment of LINQ centres largely on LINQ to Objects. If you’re planning on learning C# 3.0 so that you can query databases through LINQ, the book’s focus is not ideal, if read in isolation. I personally prefer the approach of covering “remote” query architecture earlier and in more detail (in conjunction with the canonical API, LINQ to SQL) – so that when it comes time to teach query operators such as SelectMany, Group and Join, they can be demonstrated in the context of both local and database queries. I also strive, when writing on LINQ, to cover enough querying ground that readers can “reproduce” their SQL queries in LINQ—even though it means having to get sidetracked with API practicalities. Of course, getting sidetracked with API practicalities is undesirable for a language-centric book such as C# in Depth, and so the LINQ to Objects focus is understandable. In any case, reading Chapters 8-10 of C# 3.0 in a Nutshell would certainly fill in the gaps. Another complementary book would be Manning’s LINQ in Action (this book is well-reviewed on Amazon, though I’ve not yet read it).

Summary

This book is well written, accurate and insightful, and complements nearly every other book out there. I would recommend it to anyone wanting a thorough “inside” tutorial on the features new to C# 2 and 3.

Parallel Extensions June CTP

Either my timing is great or it’s lousy – you decide. Yesterday I posted about parallelising Conway’s Life – and today the new CTP for the Parallel Extensions library comes out! The bad news is that it meant I had to run all the tests again… the good news is that it means we can see whether or not the team’s work over the last 6 months has paid off.

Breaking change

The only breaking change I’ve seen is that AsParallel() no longer takes a ParallelQueryOptions parameter – instead, you call AsOrdered() on the value returned from AsParallel(). It was an easy change to make, and worked first time. There may well be plenty of other breaking API changes which are more significant, of course – I’m hardly using any of it.
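
Here’s a hedged sketch of what the change looks like in practice (the query itself is an arbitrary example, not taken from my benchmark code):

// June CTP: request ordering on the value returned from AsParallel()
var squares = Enumerable.Range(0, 10)
                        .AsParallel()
                        .AsOrdered()
                        .Select(x => x * x);

// December CTP equivalent (no longer compiles - AsParallel took a
// ParallelQueryOptions argument to request ordering):
// var squares = Enumerable.Range(0, 10)
//                         .AsParallel(/* ParallelQueryOptions requesting ordering */)
//                         .Select(x => x * x);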

Benchmark comparisons

One of the nice things about having the previous blog entries is that I can easily compare the results of how things were then with how they are now. Here are the test results for the areas of the previous blog posts which used Parallel Extensions. For the Game of Life, I haven’t included the results with the rendering for the fast version, as they’re still bound by rendering (unsurprisingly).

Mandelbrot set

(Original post)

Results are in milliseconds taken to plot the whole image, so less is better. (x;y) values mean “Width=x, MaxIterations=y.”

| Description | December CTP | June CTP |
| --- | --- | --- |
| ParallelForLoop (1200;200) | 376 | 380 |
| ParallelForLoop (3000;200) | 2361 | 2394 |
| ParallelForLoop (1200;800) | 1292 | 1297 |
| ParallelLinqRowByRowInPlace (1200;200) | 378 | 393 |
| ParallelLinqRowByRowInPlace (3000;200) | 2347 | 2440 |
| ParallelLinqRowByRowInPlace (1200;800) | 1295 | 1939 |
| ParallelLinqRowByRowWithCopy (1200;200) | 382 | 411 |
| ParallelLinqRowByRowWithCopy (3000;200) | 2376 | 2484 |
| ParallelLinqRowByRowWithCopy (1200;800) | 1288 | 1401 |
| ParallelLinqWithGenerator (1200;200) | 4782 | 4868 |
| ParallelLinqWithGenerator (3000;200) | 29752 | 31366 |
| ParallelLinqWithGenerator (1200;800) | 16626 | 16855 |
| ParallelLinqWithSequenceOfPoints (1200;200) | 549 | 533 |
| ParallelLinqWithSequenceOfPoints (3000;200) | 3413 | 3290 |
| ParallelLinqWithSequenceOfPoints (1200;800) | 1462 | 1460 |
| UnorderedParalleLinqInPlace (1200;200) | 422 | 440 |
| UnorderedParalleLinqInPlace (3000;200) | 2586 | 2775 |
| UnorderedParalleLinqInPlace (1200;800) | 1317 | 1475 |
| UnorderedParallelLinqInPlaceWithDelegate (1200;200) | 509 | 514 |
| UnorderedParallelLinqInPlaceWithDelegate (3000;200) | 3093 | 3134 |
| UnorderedParallelLinqInPlaceWithDelegate (1200;800) | 1392 | 1571 |
| UnorderedParallelLinqInPlaceWithGenerator (1200;200) | 5046 | 5511 |
| UnorderedParallelLinqInPlaceWithGenerator (3000;200) | 31657 | 30258 |
| UnorderedParallelLinqInPlaceWithGenerator (1200;800) | 17026 | 19517 |
| UnorderedParallelLinqSimple (1200;200) | 556 | 595 |
| UnorderedParallelLinqSimple (3000;200) | 3449 | 3700 |
| UnorderedParallelLinqSimple (1200;800) | 1448 | 1506 |
| UnorderedParalelLinqWithStruct (1200;200) | 511 | 534 |
| UnorderedParalelLinqWithStruct (3000;200) | 3227 | 3154 |
| UnorderedParalelLinqWithStruct (1200;800) | 1427 | 1445 |

A mixed bag, but overall it looks to me like the June CTP was slightly worse than the older one. Of course, that’s assuming that everything else on my computer is the same as it was a couple of weeks ago, etc. I’m not going to claim it’s a perfect benchmark by any means. Anyway, can it do any better with the Game of Life?

Game of Life

(Original post)

Results are in frames per second, so more is better.

| Description | December CTP | June CTP |
| --- | --- | --- |
| ParallelFastInteriorBoard (rendered) | 23 | 22 |
| ParallelFastInteriorBoard (unrendered) | 29 | 28 |
| ParallelFastInteriorBoard | 508 | 592 |

Yes, ParallelFastInteriorBoard really did get that much of a speed bump, apparently. I have no idea why… which leads me to the slightly disturbing conclusion of this post:

Conclusion

The numbers above don’t mean much. At least, they’re not particularly useful because I don’t understand them. Why would a few tests become slightly but significantly worse, and one particular test get markedly better? Do I have serious benchmarking issues in terms of my test rig? (I’m sure it could be a lot “cleaner” – I didn’t think it would make very much difference.)

I’ve always been aware that off-the-cuff benchmarking like this was of slightly dubious value, at least in terms of the details. The only facts which are really useful here are the big jumps due to algorithm etc. The rest is noise which may interest Joe and the team (and hey, if it proves to be useful, that’s great) but which is beyond my ability to reason about. Modern machines are complicated beasts, with complicated operating systems and complicated platforms above them.

So ignore the numbers. Maybe the new CTP is faster for your app, maybe it’s not. It is important to know it’s out there, and that if you’re interested in parallelisation but haven’t played with it yet, this is a good time to pick it up.

More parallelisation fun: Conway’s Game of Life

Okay, so I’ve probably exhausted the Mandelbrot set as a source of interest, at least for the moment. However, every so often someone mentions something or other which sounds like a fun exercise in parallelisation. The latest one is Conway’s Game of Life. Like the Mandelbrot set, this is something I used to play with when I was a teenager – and like the Mandelbrot set, it’s an embarrassingly parallel problem.

As before, I’ve written a few implementations but not worked excessively hard to optimise. All my tests were performed with a game board of 1000×500 cells, displaying at one pixel per cell (partly because that was the easiest way to draw it). I found that with the slower implementations the rendering side was irrelevant to the results, but the rendering became a bottleneck with the fast algorithms, so I’ve included results both with and without rendering.

The “benchmark” code (I use the word very lightly – normal caveats around methodology apply; this was only a weekend-evenings project, after all) records the “current” number of frames per second roughly once per second, and also an average over the whole run. (The fast algorithms jump around in fps pretty wildly depending on what’s going on – the more naïve ones don’t really change much between a busy board and a dead one.) The starting pattern in all cases was the R-pentomino (or F-pentomino, depending on whose terminology you use) in the middle of the board.

A base class (BytePerPixelBoard) is used by all the implementations – this represents the board with a single byte per pixel, which is really easy to work with but obviously very wasteful in terms of memory (and therefore cache). I haven’t investigated any other representations, but I wanted to get this post out fairly quickly as I have a lot of other stuff to do at the moment. (I also have a post on Java closures which I should tidy up soon, but hey…)

Implementation smells

All the source code is available for download, of course. There are no fancy options for verifying the results or even turning off rendering – just comment out the line in Program.BackgroundLoop which calls Render.

It would be fairly easy to make ParallelFastInteriorBoard derive from SimpleFastInteriorBoard, and ParallelActiveBlockBoard derive from ActiveBlockBoard, but I’ve kept the implementations completely separate for the moment. That means a huge amount of duplication, of course, including a nested class in each of the “active block” implementations – the nested classes are exactly the same. Oh, and they use internal fields instead of properties. I wanted the fields to be read-only, so I couldn’t use automatic properties – but I didn’t want to go to the hassle of having explicitly declared fields and separate properties. Trust me, I wouldn’t do this in production code. (There were some other internal fields, but they’ve mercifully been converted into properties now.)

SimpleBoard

This is about as simple as you can get. It fetches each cell’s current value “carefully” (i.e. coping with wrapping) regardless of where it is on the board. It works, but it’s almost painfully slow.
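
As a hedged illustration (not the actual SimpleBoard source), the “careful” fetch is essentially a wrapped array lookup:

static byte GetCellWrapped(byte[] board, int width, int height, int x, int y)
{
    // Wrap both coordinates so that cells just off one edge are read
    // from the opposite edge of the board
    int wrappedX = (x + width) % width;
    int wrappedY = (y + height) % height;
    return board[wrappedY * width + wrappedX];
}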

SimpleFastInteriorBoard

This is just an optimisation of SimpleBoard. We fetch the borders of the board “slowly” as per SimpleBoard, but once we’re on the interior of the board (any cell that’s not on an edge) we know we won’t need to worry about wrapping, so we can just use a fixed set of array offsets relative to the offset of the cell that’s being calculated.
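
For interior cells, the neighbour count can use fixed offsets relative to the cell’s own index – something along the lines of this hedged sketch (again, illustrative rather than the actual code):

static int CountInteriorNeighbours(byte[] board, int width, int offset)
{
    // offset is the index of an interior cell, so all eight neighbours
    // exist at fixed relative positions - no wrapping checks required
    return board[offset - width - 1] + board[offset - width] + board[offset - width + 1]
         + board[offset - 1]                                 + board[offset + 1]
         + board[offset + width - 1] + board[offset + width] + board[offset + width + 1];
}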

This ends up being significantly faster, although it could still be optimised further. In the current code each cell is tested for whether it’s an interior cell or not. There’s a slight optimisation by remembering the results for “am I looking at the top or the bottom row” for a whole row, but by calculating just the edges and then the interior (or vice versa) a lot of tests could be avoided.

ParallelFastInteriorBoard

Behold the awesome power of Parallel.For. Parallelising SimpleFastInteriorBoard literally took about a minute. I haven’t seen any other toolkit rival this in terms of ease of use. Admittedly I haven’t done much embarrassingly parallel work in many other toolkits, but even so it’s impressive. It’ll be interesting to see how easy it is to use for complex problems as well as simple ones.
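
The parallelisation really is about as simple as the following hedged sketch suggests (the member names here – Width, Height, nextGeneration, CalculateCell – are illustrative, not the actual source):

// Each row of the next generation can be computed independently,
// so the outer loop simply becomes a Parallel.For call
Parallel.For(0, Height, y =>
{
    for (int x = 0; x < Width; x++)
    {
        nextGeneration[y * Width + x] = CalculateCell(x, y);
    }
});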

If you look at the results, you’ll see this is the first point at which there’s a difference between rendering and not. As the speed of the calculation improves the constant time taken for rendering becomes more significant.

ActiveBlockBoard

This is a departure in terms of implementation. We still use the same backing store (one big array of a byte per cell) but the board is logically broken up into blocks of 25×25 cells. (The size is hard-coded, but only as two constants – they can easily be changed. I haven’t investigated the effect of different block sizes thoroughly, but a few ad hoc tests suggest this is a reasonable sweet spot.) When considering a block in generation n, the changes (if any) in generation n-1 are considered. If the block itself changed, it may change again. If the relevant borders of any of its neighbours changed (including corners of diagonally neighbouring blocks), the block may change. No other changes in the board can affect the block, so if none of these conditions are met the contents of generation n-1 can be copied to generation n for that block. This is obviously a massive saving when there are large areas of stable board, either dead or with stable patterns (such as 2×2 live blocks). It doesn’t help with blinkers (3×1 blocks which oscillate between vertical and horizontal alignments) or anything similar, but it’s still a big optimisation. It does mean a bit more house-keeping in terms of remembering what changed (anywhere in the block, the four sides, and the four corners) each generation, but that pales into insignificance when you can remove the work from a lot of blocks.
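
A hedged sketch of that decision (the real housekeeping is more involved, and the Block type and its members here are purely illustrative):

bool BlockNeedsRecalculating(Block block)
{
    // If the block itself changed in generation n-1, it may change again
    if (block.ChangedLastGeneration)
    {
        return true;
    }
    // If a relevant border (or corner, for diagonal neighbours) of any
    // neighbouring block changed, this block may change too
    foreach (Block neighbour in block.Neighbours)
    {
        if (neighbour.BorderFacing(block).ChangedLastGeneration)
        {
            return true;
        }
    }
    // Otherwise generation n-1 can simply be copied to generation n
    return false;
}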

Again, my implementation is relatively simple – but it still took a lot of work to get right! The code is much more complicated than SimpleFastInteriorBoard, and indeed pretty much all the code from SFIB is used in ActiveBlockBoard. Again, I’m sure there are optimisations that could be made to it, and alternative strategies for implementing it. For instance, I did toy with some code which didn’t keep track of what had changed in the block, but instead pushed changes to relevant neighbouring blocks, explicitly saying, “You need to recalculate next time!” There wasn’t much difference in speed, however, and it needed an extra setup stage in order to make the code understandable, so I’ve removed it.

The performance improvement of this is staggering – it makes it over 20 times faster than the single-threaded SimpleFastInteriorBoard. At first the improvement was much smaller – until I realised that actually I’d moved the bottleneck to the rendering, and re-benchmarked with rendering turned off.

ParallelActiveBlockBoard

Exactly the same implementation as ActiveBlockBoard, but using a Parallel.ForEach loop. Again, blissfully easy to implement, and very effective. It didn’t have the same near-doubling effect seen in going from SimpleFastInteriorBoard to ParallelFastInteriorBoard (with rendering turned off), but I suspect this is because the amount of housekeeping work required to parallelise things starts becoming more important at the rates shown in the results. It’s still impressive stuff though.
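
Again, a hedged sketch – the per-block loop just becomes a Parallel.ForEach call (blocks and ProcessBlock are illustrative names, not the actual members):

// Each block of the next generation can be processed independently
Parallel.ForEach(blocks, block => ProcessBlock(block));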

Results

| Implementation | FPS with rendering | FPS without rendering |
| --- | --- | --- |
| SimpleBoard | 4 | 4 |
| SimpleFastInteriorBoard | 15 | 15 |
| ParallelFastInteriorBoard | 23 | 29 |
| ActiveBlockBoard | 78 | 326 |
| ParallelActiveBlockBoard | 72 | 508 |

Conclusions

  • The Parallel Extensions library rocks my world. Massive, massive kudos to Joe Duffy and the team.
  • I have no idea why ParallelActiveBlockBoard is slower than ActiveBlockBoard when rendering is enabled. This was repeatable (with slightly different results each time, but the same trend).
  • Always be aware that reducing one bottleneck may mean something else becomes your next bottleneck – at first I thought that ActiveBlockBoard was “only” 5 times faster than SimpleFastInteriorBoard; it was only when parallelisation showed no improvement (just the opposite) that I started turning rendering off.
  • Changing the approach can have more pronounced effects than adding cores – moving to the “active block” model gave a massive improvement over brute force.
  • Micro-optimisation has its place, when highly targeted and backed up with data – arguably the “fast interior” is a micro-optimisation, but again the improvement is quite dramatic.
  • Benchmarks are fun, but shouldn’t be taken to be indicative of anything else.
  • Did I mention that Parallel Extensions rocks?