Boxing day

Here in the UK (and possibly elsewhere – I’m too lazy to check Wikipedia), December 26th is called Boxing Day. I want to know why only this one aspect of .NET is given its own public holiday. Extending the opportunity to language features as well as those of the runtime, I’d like to see:

  • Null coalescing day (“Take 3 nulls into the shower? Not me. I just coalesce and go!”)
  • Type inference day (not popular with HR)
  • Garbage collection day (as a proper holiday, not just when the bins are collected)
  • Just-In-Time day (arguably every day for procrastinators such as myself)
  • Lambda expression day (get those lambdas out of the closet)

I could go on, but it’s a pretty thin joke and dinner is ready.

C# in Depth: All chapters available in MEAP!

Rather excitingly, all the chapters of C# in Depth are now available for early access. The following chapters have recently been added:

10: Extension methods

Without extension methods, LINQ just couldn’t work in an elegant form. Extension methods are basically a way of faking instance methods by providing static methods with an implicit “this” parameter. Importantly, they can work on interfaces, which means you can make an interface appear to have many more methods than implementations actually have to provide. Although they’re primarily provided for the sake of LINQ, extension methods can improve code readability in other spheres too – when used cautiously. In this chapter, we look at extension methods in the non-LINQ world, and get our first glance at some of the methods in System.Linq.Enumerable.
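
As a quick sketch of the idea (my own example, not one from the chapter – the names are invented), here’s an extension method declared against IEnumerable&lt;T&gt;, so any sequence appears to gain an extra method without the interface itself having to provide it:

using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // The "this" modifier on the first parameter is what makes this an
    // extension method rather than a plain static method.
    public static int CountItems<T>(this IEnumerable<T> source)
    {
        int count = 0;
        foreach (T item in source)
        {
            count++;
        }
        return count;
    }
}

A call such as sequence.CountItems() looks like an instance method call, but the compiler turns it into SequenceExtensions.CountItems(sequence).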

11: Query expressions and LINQ to Objects

If you ask someone who has just read a page or two about C# 3 what the new features are, they’ll almost certainly mention query expressions. This is the “from x in y where z select foo” type of expression which looks almost like SQL but isn’t. The amazing thing about query expressions is how little they impact the rest of the language: they are pure syntactic sugar, being effectively translated into other source code before being compiled. That allows some really neat tricks, and is the basis for how LINQ handles multiple data sources.

In this chapter we look at query expressions and the standard query operators which support them, via LINQ to Objects. This is “in-process” LINQ, often used with in-memory collections but more generally available for anything implementing IEnumerable or IEnumerable<T>.
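
To show what that translation looks like in practice, here’s a tiny LINQ to Objects example of my own (not taken from the chapter):

using System;
using System.Linq;

class QueryTranslationExample
{
    static void Main()
    {
        string[] words = { "the", "quick", "brown", "fox" };

        // Query expression syntax...
        var query = from word in words
                    where word.Length > 3
                    select word.ToUpper();

        // ...which the compiler translates into plain method calls before
        // doing anything else:
        var translated = words.Where(word => word.Length > 3)
                              .Select(word => word.ToUpper());

        Console.WriteLine(string.Join(", ", query.ToArray()));      // QUICK, BROWN
        Console.WriteLine(string.Join(", ", translated.ToArray())); // QUICK, BROWN
    }
}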

12: LINQ beyond collections

We’ve all seen LINQ to SQL demos, and gone “ooh” and “ahh” (or possibly “I could do that in Rails too, y’know”) at the appropriate time. In this chapter I emphatically don’t try to teach you LINQ to SQL, but instead take you on a whistle-stop tour of lots of different LINQ providers:

  • LINQ to SQL
  • LINQ to DataSet
  • LINQ to XML
  • LINQ to NHibernate
  • LINQ to ActiveDirectory
  • LINQ to Entities
  • Parallel LINQ

I also give a bit of insight into how “true” LINQ providers like LINQ to SQL work, using IQueryable. (I don’t really count LINQ to XML or LINQ to DataSet as true providers – they just supply IEnumerable&lt;T&gt; sequences for LINQ to Objects to work with.)
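
The one-sentence version of that insight, with a sketch of my own rather than anything from the chapter: where LINQ to Objects compiles a lambda into a delegate and just runs it, an IQueryable-based provider receives the same lambda as an expression tree, a data structure it can inspect and translate into SQL, an LDAP filter, or whatever else it targets.

using System;
using System.Linq.Expressions;

class ExpressionTreeExample
{
    static void Main()
    {
        // LINQ to Objects: the lambda becomes compiled code, executed in-process.
        Func<int, bool> asDelegate = x => x > 5;

        // IQueryable providers: the same lambda becomes data describing the code.
        Expression<Func<int, bool>> asTree = x => x > 5;

        Console.WriteLine(asDelegate(10)); // True
        Console.WriteLine(asTree);         // x => (x > 5)
    }
}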

As you can tell from the scope of the chapter, I don’t try to go into many details – just enough to give the flavour, and hopefully show the “big picture”. I believe Microsoft is really trying something very ambitious with a mostly-unified query framework, and with any luck this chapter leaves that as the lasting impression.

13: Elegant code in the new era

I’ve taken a look at a lot of my technical books to see how they end – and really they don’t, properly. The last line of the last chapter could often have been set anywhere else. My final chapter is very short, but tries to give an impression of where I think software development is going, particularly in terms of C#.

Appendix A: LINQ standard query operators

Although some of the standard query operators are covered in chapter 11, there are plenty which aren’t. This appendix is really just a grouped list of the operators with some brief examples of what they do. Handy as a reference guide – one reviewer apparently said to another, “Holy crap! I want this on my wall!”

What next?

So, now that everything is available in MEAP, it’s all done, right? Well, not quite. I’m currently indexing and putting together final revisions – where the word “final” is pretty loose. It will then be passed to my technical reviewer (whose name I shouldn’t reveal just yet, but who I’m proud to have on board – even if I’m dreading the errors they’ll find) and the copy editor, who I believe will work effectively in parallel. After that (and final approval) it will go into production, then to press. The due date is still late March or April at the moment, I believe.

My current indexing/revising task is a real slog (which is why I’m taking a break by writing this blog entry) but I think it’s the last big push – it should get easier when I’m done on this bit. Right, back to chapter 5…

Covariance and void return types

Reading through chapter 2 (see, I’m being good) a new thought about return type covariance occurred to me. This is odd in itself, because I thought I’d exhausted my own supply of ideas around variance (which isn’t the same as knowing everything about it, of course – it just means I don’t expect to have anything new to say).

Just as a reminder, in C# 1 delegates were completely invariant – to create a delegate instance from a method group, the signature had to match exactly. For instance, suppose we had a delegate type declared as:

delegate object ObjectFactory();

We couldn’t (in C# 1) use the following method to create an instance of ObjectFactory:

public StringBuilder CreateStringBuilder()
{
    return new StringBuilder();
}

Even though a StringBuilder is always an object, the signatures didn’t match exactly, so it was prohibited.

C# 2 allows this, however, so it’s legal to write:

ObjectFactory factory = CreateStringBuilder;

So far, so good. (Yes, a generic delegate type would be nicer, but I’m avoiding the variance issues with generics for this post. Let’s not make things more complicated than they need to be.) Just to be absolutely clear about this, it’s valid and legal because there’s nothing you can do with factory now which will break normal type safety. You can call factory() and be absolutely sure that it will return an object of some kind (or null) so it’s as safe as anything else.

Now, what about a delegate type which has a void return type:

delegate void Action();

You can’t use CreateStringBuilder as the target of an instance of Action – C# 2 and even C# 3 completely disallow it, and my guess is that the CLR disallows it internally too. Why is this? Again, any use of an Action delegate can’t possibly care about what the target returns, because it’s not declared to return anything.

I strongly suspect that the answer lies in the implementation of the CLR rather than in any deep semantic reason – the CLR probably needs to know whether or not there’s going to be a return value, in order to do appropriate things with the stack. Even so, it seems a bit of a pity, in terms of elegance. I can’t say I’ve ever felt the need for this in real life, and it would be reasonably easy to fake (for up to four parameters) in .NET 3.5 just by writing a converter from Func&lt;TResult&gt; to Action, from Func&lt;T,TResult&gt; to Action&lt;T&gt;, and so on. It niggles a bit though :)
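
Just to show what I mean by faking it, here’s a quick sketch (my own code, with made-up helper names) covering the zero-parameter and one-parameter cases:

using System;
using System.Text;

static class VoidCovarianceDemo
{
    // Wrap a Func in an Action, discarding the return value.
    static Action DiscardResult<TResult>(Func<TResult> func)
    {
        return () => func();
    }

    static Action<T> DiscardResult<T, TResult>(Func<T, TResult> func)
    {
        return arg => func(arg);
    }

    static StringBuilder CreateStringBuilder()
    {
        return new StringBuilder();
    }

    static void Main()
    {
        // Action action = CreateStringBuilder;   // Disallowed: void vs StringBuilder
        Action action = DiscardResult<StringBuilder>(CreateStringBuilder);
        action();
    }
}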

Refactris

This post is in lieu of writing a proper one, either on the generic maths operators which Marc Gravell has been hard at work on, or on C# 4 which I have a number of opinions about (no surprise there). I will write about both of those topics, but I really ought to do some more work on the manuscript for chapter 2 of the book before I go to bed. Posting a blog entry is a reward for finishing indexing chapters 2 and 13, but both of the serious posts will take longer than I really have time for right now.

So, Refactris. This is a silly idea born a couple of weeks ago, at work. You see, several months ago, I had run out of work on a Friday afternoon at 4pm. My then-team-leader, Rohan, foolishly challenged me to write console-mode Tetris in an hour. I had great fun, and had a demonstrable game of Tetris working precisely one hour later. Now think of refactoring, one of the ideas of which is that you can remove lines of code by spotting duplication and the like. Put the two together, and you get Refactris.

Letters, digits and symbols would fall from the top, and whenever they landed (in a normal Tetris manner) the game would try to compile the code in the bucket, finding as much to compile as possible. It would try the top line, then the top two lines, then the top three lines, etc, until it reached the bottom – then try the second line, the second and third lines together, etc. Code which compiled would be removed.

Clearly it’s a stupid idea, and I haven’t actually tried to implement it or anything silly like that. Funny enough to share though, and if any of you wish to give it a go, I’d love to see the results.

A cautionary parallel tale: ordering isn’t simple

A little while ago, I wrote about my silly project to test Parallel LINQ – a Mandelbrot generator. In the last week, two things have happened to make this more of a reality. Firstly, the December CTP of Parallel FX has been released. Secondly, my old laptop died, “forcing” me to get a new laptop, which just happens to have a dual core processor.

So, it should just be a case of running it, right? Well, not quite. First let’s have a look at the query expression again, in its serial form:

var query = from row in Enumerable.Range(0, ImageHeight)
            from col in Enumerable.Range(0, ImageWidth)
            select ComputeMandelbrotIndex(row, col);

And here’s what should be generated.

That’s the result of running without any parallelism, in other words. Now, I realised from the start that we would need ordering, but my first experiment was just to call AsParallel() to see what would happen:

var query = from row in Enumerable.Range(0, ImageHeight).AsParallel()
            from col in Enumerable.Range(0, ImageWidth)
            select ComputeMandelbrotIndex(row, col);

As expected, that didn’t produce quite the result we wanted:

Well, that’s okay. I wanted to prove that ordering was necessary, and indeed that’s fairly obvious from the result. There are horizontal blocks, returned out of order. Easily fixable, right? After all, I posted what I thought would be the solution with the original post. We just need to give the appropriate option as a method parameter:

var query = from row in Enumerable.Range(0, ImageHeight).AsParallel(ParallelQueryOptions.PreserveOrdering)
            from col in Enumerable.Range(0, ImageWidth)
            select ComputeMandelbrotIndex(row, col);

Not so fast. It certainly changed things, but not quite as hoped:

I haven’t yet analysed quite why we now have the rows in order but the columns out of order. However, I haven’t quite managed to fix it with the code in its original form. I have managed to fix it by reducing it from two (implicit) loops to one:

var query = from row in Enumerable.Range(0, ImageHeight).AsParallel(ParallelQueryOptions.PreserveOrdering)
            select ComputeMandelbrotRow(row);

byte[] data = new byte[ImageHeight * ImageWidth];

// Reassemble the rows (now arriving in order) into a single result buffer
int rowStart = 0;
foreach (byte[] row in query)
{
    Array.Copy(row, 0, data, rowStart, ImageWidth);
    rowStart += ImageWidth;
}

Instead of getting all the results in one go (with a call to ToArray()), we now have to reassemble the data into a single block. Still, it achieves the desired result. I should point out that PFX has better ways of doing this – Parallel.For being an obvious starting point, from the little I know of it. At some point I’ll get round to reading about them, but at the moment life’s too short. I should also say that I don’t expect any of the pictures indicate a bug in Parallel LINQ, merely a gap in my own understanding of it.
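
For what it’s worth, here’s roughly what I’d expect the Parallel.For version to look like – purely a sketch, reusing the same hypothetical ComputeMandelbrotRow, ImageWidth and ImageHeight members as above, and bearing in mind that the Parallel class has lived in different namespaces across the CTP and later releases:

byte[] data = new byte[ImageHeight * ImageWidth];

// Each row is computed independently and written to its own slice of the
// buffer, so no locking is needed and the ordering problem disappears.
Parallel.For(0, ImageHeight, row =>
{
    byte[] rowData = ComputeMandelbrotRow(row);
    Array.Copy(rowData, 0, data, row * ImageWidth, ImageWidth);
});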

Update: Explanation and a workaround

I asked about this on the Connect forum for PFX, and Igor Ostrovsky has explained that this is by design – currently only outer ordering is preserved when requested. The issue is still open, however – it’s possible that it will change before release.

In the meantime, Nicholas Palladinos has come up with an alternative solution which I rather like. I’ve refactored it a bit, but the basic idea is to turn the two sequences into one before parallelisation:

var points = from row in Enumerable.Range(0, ImageHeight)
             from col in Enumerable.Range(0, ImageWidth)
             select new { row, col };

var query = from point in points.AsParallel(ParallelQueryOptions.PreserveOrdering)
            select ComputeMandelbrotIndex(point.row, point.col);

That works really well – in fact, more than twice as fast as the serial version, on my 2-core box!