All posts by jonskeet

Mad props to @arcaderage for the "Princess Rescue" image - see https://toggl.com/programming-princess for the full original

Eduasync part 4: Filling in AsyncTaskMethodBuilder

In part 2, I introduced AsyncVoidMethodBuilder, AsyncTaskMethodBuilder and AsyncTaskMethodBuilder<T>. I showed all the signatures, but the implementations were basically trivial and non-functional. Today, I’m going to change all of that… at least a bit.

In particular, by the end of this post we’ll be able to make the following code actually print out 10 as we’d expect:

private static void Main(string[] args)
{
    Task<int> task = Return10Async();
    Console.WriteLine(task.Result);
}

private static async Task<int> Return10Async()
{
    return 10;
}

Note that there are no await statements – remember that AsyncTaskMethodBuilder<T> is about the boundary between the caller and the async method. We won’t be actually doing anything genuinely asynchronous today – just filling in a piece of the infrastructure.

I’m not going to implement AsyncVoidMethodBuilder or AsyncTaskMethodBuilder, because once you understand how AsyncTaskMethodBuilder<T> works, the others are basically trivial to implement. (Certainly the non-generic AsyncTaskMethodBuilder is too similar to warrant going into; I may potentially revisit AsyncVoidMethodBuilder to investigate its behaviour on exceptions.)

Just as a reminder, this is what we need our AsyncTaskMethodBuilder<T> struct to look like in terms of its interface:

public struct AsyncTaskMethodBuilder<T>
{
    public static AsyncTaskMethodBuilder<T> Create();
    public void SetException(Exception e);
    public void SetResult(T result);
    public Task<T> Task { get; }
}

Frankly, this would be a bit of a pain to implement ourselves… especially if we wanted to do it in a neat way which didn’t introduce a new thread unnecessarily. Fortunately, Parallel Extensions makes it pretty easy.

TaskCompletionSource to the rescue!

The TaskCompletionSource<TResult> type in the framework makes it almost trivial to implement AsyncTaskMethodBuilder<T>. It provides us everything we need – a task, and the ability to specify how it completes, either with an exception or a value. Our initial implementation is really just going to wrap TaskCompletionSource. I suspect the real CTP does slightly more work, but you can get a remarkably long way with simple wrapping:

public struct AsyncTaskMethodBuilder<T>
{
    private readonly TaskCompletionSource<T> source;

    private AsyncTaskMethodBuilder(TaskCompletionSource<T> source)
    {
        this.source = source;
    }

    public static AsyncTaskMethodBuilder<T> Create()
    {
        return new AsyncTaskMethodBuilder<T>(new TaskCompletionSource<T>());
    }

    public void SetException(Exception e)
    {
        source.SetException(e);
    }

    public void SetResult(T result)
    {
        source.SetResult(result);
    }

    public Task<T> Task { get { return source.Task; } }
}

We’ll see how far that takes us – we may need to tweak it a bit, probably around the exception handling. For the moment though, it certainly does everything we need it to. The program runs, and prints out 10 as we’d expect.

This part of the code is now ready for asynchrony – even though nothing we’ve done so far actually creates any different threads. If we call SetResult from one thread while another thread is waiting on the task’s result, the waiting thread will be unblocked as we’d expect.

Conclusion

In some ways I hope you’re disappointed by this, in the same way that looking at the LINQ to Objects implementations can be slightly disappointing. It feels like so much is being given to us for free – the framework already had all this power, so what’s C# 5 really adding to the mix. Well, we’re nearly at the point where I can show you the details of what the compiler’s doing under the hood. That’s where the real power lies, and is why I’m so excited by the feature.

First though, we need to implement the awaiter pattern for Task<T>. That’s the task of our next post, and then we can complete the picture by decompiling the generated code for some programs using genuine asynchrony.

Eduasync part 3: the shape of the async method / awaitable boundary

Last time we looked into the boundary between the caller of an async method and the method itself.

This time I’m going to show the same sort of "skeleton API" but for awaitable types. This is at the heart of C# 5’s async feature: within an async method, you can include "await" expressions. You have to await a value – which could be the return value from a method call, or a property, or the value of a variable. The important thing is that the compile-time type of that expression has the right shape.

Definition time

Consider this code snippet from an async method (separated into two statements for clarity):

Task<string> t = new WebClient().DownloadStringTaskAsync(url);
string text = await t;

I wish to define the following terms:

  • The task of the await expression is the expression which comes after "await"; in this case it’s just "t".
  • The awaitable type is the compile-time type of the task. In this case it’s Task<string>.
  • The awaiter type is the compile-time type of the result of calling GetAwaiter() on the task.
  • The result type is the compile-time return type of the awaiter type’s GetResult method. Here it would be string, when using the CTP or any sensible implementation.

(Of these, "task" and "awaiter type" are part of the draft specification. "Awaitable type" and "result type" aren’t.)

Now, the first two are fairly clear from the code above, but the awaiter and result types aren’t. The compiler validates that calling (foo).GetAwaiter() for a task expression "foo" is valid. It doesn’t have to be an instance method – it can be an extension method, and indeed that’s how it’s implemented in the CTP. Unlike the weird things you can do with LINQ query expressions, GetAwaiter() really does have to be a method call, and the expression can’t just be a type name. Unless you’re trying to do weird stuff, this is very unlikely to be an issue for you. It’s only oddballs like me who try to push the envelope.

With the current CTP the awaitable type can’t be "dynamic", but the aim is for C# 5 to support awaiting dynamic values. I imagine this won’t affect the execution flow significantly though.

The awaitable type only has to support the GetAwaiter() method, as we’ve said. The awaiter type has to have the following accessible members (which can’t be extension methods):

  • bool IsCompleted { get; }
  • void OnCompleted(Action continuation)
  • X GetResult() where X defines the result type for the awaiter, and may be void

The overall type of the await expression is the result type – and it has to be appropriate for any surrounding context. For example, the assignment to the "text" variable will only work in our sample if the result type is "string" or a type with an implicit conversion to string.

A simple (non-functional) awaiter

The sample code for this post comes from project 5 in the source solution (SimpleInt32Awaitable); project 4 is similar but with a void result type. Neither project makes any attempt to do any real awaiting:

// Within some other class
static async void SimpleWaitAsync()
{
    SimpleInt32Awaitable awaitable = new SimpleInt32Awaitable();
    int result = await awaitable;
}
 

public struct SimpleInt32Awaitable
{
    public SimpleInt32Awaiter GetAwaiter()
    {
        return new SimpleInt32Awaiter();
    }
}

public struct SimpleInt32Awaiter
{
    public bool IsCompleted { get { return true; } }

    public void OnCompleted(Action continuation)
    {
    }

    public int GetResult()
    {
        return 5;
    }
}

Note that here both the awaitable type and the awaiter type are structs rather than classes; this is not a requirement by any means.

What does it all mean?

I don’t want to leave this post without any clue of what all of this is used for.

Fairly obviously, GetAwaiter() is called to get an awaiter (clever name, huh?) on which some members are called:

  • IsCompleted is called to determine whether the task has already completed. If it has, we can skip over the next step.
  • If the task hasn’t already completed, we want to return very soon rather than waiting – so OnCompleted is calling with a delegate representing the continuation of the method. The awaiter promises to ensure that the continuation is called when (and only when) the task has completed, possibly with an exception.
  • When we know the task has completed (either synchronously or via the continuation), GetResult is called to retrieve the result. Note that even if the method has a void return type, it still has to be called, as the "result" may be that an exception is thrown.

Conclusion

Awaiters are pretty flexible – which means they can be abused in evil, horrible ways, as we’ll see later on.

So far we’ve seen the basic shape of both boundaries, and now we can start actually implementing them with real, useful code.

Eduasync part 2: the shape of the caller / async method boundary

For the first few parts of this blog series, we’re not going to write code which actually does anything useful. We’re looking at the API more than the implementation – the bare necessities to get the code to compile. Remember that we’re working without AsyncCtpLibrary.dll – which of course supplies all the necessary types normally.

In this part, we’ll look at the boundary between the caller and the async method itself. Our test code will have a method which does nothing but return – but which is marked with the “async” modifier. The fact that we don’t have an “await” expression in the method causes a compiler warning – which is entirely reasonable, but irrelevant to our particular situation.

Three return types, one basic idea

Async methods are limited to three different return types:

  • void
  • Task (non-generic)
  • Task<T> (generic for any type T)

Each of these return types needs a very slightly different form of compiler support… but the general principle is the same. I’ll show each of the three different types, but the test code itself remains exactly the same except for the return type of the method and the return statement. (If you’re going to return Task<int>, you need to return an int, etc.)

namespace Eduasync
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            DoNothingAsync();
        }

// Warning CS1998 is about a method with no awaits in… exactly what we’re trying to
// achieve!
#pragma warning disable 1998 
        // Return type of void, Task or Task<int>
        private static async void DoNothingAsync()
        {
            // For Task<int> insert return 0; here
        }
#pragma warning restore 1998
    }
}

If you try to compile this code without anything else, the compiler gives an error like this:

Test.cs(14,30): error CS0656: Missing compiler required member ‘System.Runtime.CompilerServices.AsyncVoidMethodBuilder.Create’
Test.cs(14,30): error CS1993: Cannot find Task-related types. Are you missing a reference to ‘AsyncCtpLibrary.dll’ ?

While you’d get a similar error for the async method / awaitable boundary, this one is more demanding: you have to have exactly the right types available. It isn’t just a matter of any library providing support for your current situation (in the same way that Edulinq can live in its own namespace and be an alternative for LINQ to Objects without the compiler caring). These are exact types that the compiler relies on. They’re an implementation detail: they can vary between compilers (so Mono could have a different set of types for example, although I doubt that it will, for the sake of binary compatibility) and they’re not part of the language specification… at least at the moment.

There are three types required: AsyncVoidMethodBuilder, AsyncTaskMethodBuilder and AsyncTaskMethodBuilder<T>. They correspond (fairly obviously) to the return types of void, Task, and Task<T> respectively.

They all look largely the same, but here are my initial "non-implementations":

using System.Threading.Tasks;

namespace System.Runtime.CompilerServices
{
    public struct AsyncVoidMethodBuilder
    {
        public static AsyncVoidMethodBuilder Create()
        {
            return new AsyncVoidMethodBuilder();
        }

        public void SetException(Exception e) {}
        public void SetResult() {}
    }

    public struct AsyncTaskMethodBuilder
    {
        public static AsyncTaskMethodBuilder Create()
        {
            return new AsyncTaskMethodBuilder();
        }

        public void SetException(Exception e) {}
        public void SetResult() {}
        public Task Task { get { return null; } }
    }

    public struct AsyncTaskMethodBuilder<T>
    {
        public static AsyncTaskMethodBuilder<T> Create()
        {
            return new AsyncTaskMethodBuilder<T>();
        }

        public void SetException(Exception e) {}
        public void SetResult(T result) {}
        public Task<T> Task { get { return null; } }
    }
}

(In the Eduasync project these are in separate files and have diagnostic statements in the SetException/SetResult methods. I didn’t want to take up too much space here.)

Note that these must be structs. If they’re not, the compiler will complain. All the methods have to have the exact signatures specified, as far as I can tell – except that everything can be internal if you want it to be (so long as your async methods are in the same assembly of course). In practice these are going to be public, and they are public everywhere in Eduasync.

With these implementations, you can compile and even run async methods – but you will, of course, end up with a null reference returned from any async method with a return type of Task or Task<T>.

In case you’re wondering, the code below shows a little taste of how these types are used. It’s from the version of DoNothingAsync which returns a non-generic Task. This is what the compiler replaces our code with:

private static Task DoNothingAsync()

    <DoNothingAsync>d__0 d__ = new <DoNothingAsync>d__0(0);
    d__.<>t__MoveNextDelegate = new Action(d__.MoveNext);
    d__.$builder = AsyncTaskMethodBuilder.Create();
    d__.MoveNext(); 
    return d__.$builder.Task; 
}

Obviously <DoNothingAsync>d__0 is a compiler-generated type (hence the unspeakable name). I’m not going into the details just yet – that will the topic of several posts in a little while. The main point of this post was just to show you the shape of what the compiler requires. The Create() method and the Task property are used within the rewritten method; the SetResult() and SetException() methods are used within the generated type (the state machine) to indicate the async method completing.

Before too long we’ll implement these types properly (which is reasonably straightforward, given help from the BCL).

Conclusion

The caller / async method boundary is relatively inflexible. The compiler relies on particular types, and you simply can’t make an async method return a different kind of value, such as your own Future<T> type. The guts of how it works are all hidden from you, unless you try to compile without the right libraries around: while the valid return types are part of the specification, the types used by the compiler aren’t. While this is perhaps a little unfortunate in a purist sense, it’s not really a big deal. The "consuming" part of the async method (the boundary between the async method and whatever it’s awaiting) is much more flexible, and more interesting. That’s what we’re going to look at next.

Eduasync part 1: introduction

I’ve been waiting to start this blog series for a couple of months. It’s nice to finally get cracking.

Hopefully some of you have already read some of my thoughts around C# 5’s async feature, mostly written last year. Since that initial flurry of posts, I’ve been pretty quiet, but I’m still really excited about it. Really, really excited. I’ve given a few talks on it, and I have a few more still to give – and this blog series will partly be complementary to those talks. In particular, there’s a DevExpress webcast which covers most of the same ground, with similar code. (It was before the CTP refresh, and also before my laptop was stolen in a burglary, so the code here is a rewrite.)

Async from a compiler’s point of view

Most of this blog series (at least the bits I anticipate at the moment) will deal with what the compiler does with async methods. (I haven’t used async delegates much at all, but I can’t imagine that the machinery is particularly different.)

As far as I’ve seen, most of the coverage on the web so far has dealt with using async. That’s natural, logical and entirely proper. Oh, and a bit boring after a while. I like knowing how a feature works before I go too far using it. This is a personal idiosyncrasy, and if you’re happy just using async with no “under the hood” details, that’s absolutely fine. It’s probably worth unsubscribing from my blog for a little while, that’s all.

This can all be seen as pretty similar to my Edulinq series of posts, which is why I’ve called it Eduasync this time.

My plan is to walk you through what the C# compiler relies on – the types which are currently part of AsyncCtpLibrary.dll, and how it interacts with Task / Task from .NET 4. We’ll then look at the code generated by the compiler – essentially a state machine – and some of the less obvious aspects of it. I’ll give examples of any bugs I’ve found in the CTP, just for the heck of it – and as a way of checking whether they’re fixed in later versions. (Obviously I’ve let the C#/VB team know about these as I’ve come across them.)

I’ll assume that you know the basics of using async – so if you don’t, now would be a good time to look at the numerous resources on the Visual Studio Async home page. There are loads of videos, specs (including the C# spec changes, most importantly from my point of view)

Get the source now

There’s already quite a bit of source code (everything I’m currently planning on writing about, which is almost inevitably less than I’ll actually end up writing about) on the Google Code Eduasync project. This takes a different approach from Edulinq – instead of just a couple of projects (production and tests, basically) I’ve got a separate project for each topic I want to talk about, with pretty minimal code for that topic. The reason for this is to show the evolution of the code – starting off with almost nothing, and progressing until we’ve got an implementation which achieves at least the bare bones important bits of an async system.

I’ve numbered the projects within the solution, although the assemblies themselves don’t have the same numbers. They all use a default namespace of just Eduasync, and they don’t refer to each other. Each is meant to be self-contained – oh, and there are no references to AsyncCtpLibrary.dll. The whole point is to reimplement that library :) Of course, you’ll still need the CTP installed to get the compiler changes.

The Google Code repository will also contain the blog posts eventually, including any diagrams I need to create (such as the one in a minute).

The three blocks and two boundaries

One of the things I’ve found important to think about in async is the various parts involved. I’ve ended up with a mental model like this:

The bits in blue and red are the ones we’re focusing on here: the contents of the async method, and the boundaries between that and the code that calls it, and the tasks (or other awaitable types) that it awaits.

For most of this series we’re not really going to care much about what the caller does with the result, or how the awaitable object behaves other than in terms of the methods and properties used by the C# 5 compiler. I’ll discuss the flexibility afforded though – and how it doesn’t extend to the “caller/async” boundary, only the “async/awaitable” boundary.

Just to give an explicit example of all of this, here’s a simple little program to asynchronously determine the size of the Stack Overflow home page:

using System;
using System.Net;
using System.Threading.Tasks;

class Program
{
    // Caller (block 1)
    static void Main()
    {
        Task<int> sizeTask = DownloadSizeAsync(http://stackoverflow.com&#8221;);
        Console.WriteLine(“In Main, after async method call…”);        
        Console.WriteLine(“Size: {0}”, sizeTask.Result);
    }
    
    // Async method (block 2)
    static async Task<int> DownloadSizeAsync(string url)
    {
        var client = new WebClient();
        // Awaitable (block 3)
        var awaitable = client.DownloadDataTaskAsync(url);
        
        Console.WriteLine(“Starting await…”);
        byte[] data = await awaitable;
        Console.WriteLine(“Finished awaiting…”);
        
        return data.Length;
    }
}

The comments should make it reasonably clear what the blocks in the diagram mean. It’s not ideal in that the first two blocks are basically methods, whereas the third block is an object – but I’ve found that it still makes sense when we’re thinking about the interactions involved at the boundaries. Notably:

  • How does the async method create an appropriate value to return to the caller?
  • How does the async method interact with the awaitable when it hits an “await” expression?

We can (and we’re going to) look at these boundaries very separately. We’ll start off with the first bullet, in part two, which will hopefully follow in the next few days.

Of memory and strings

This post was provoked by a recent Stack Overflow question which asked whether there was an efficient representation of ASCII strings in .NET.

In particular, the questioner wanted to store hundreds of thousands – possibly millions – of strings in memory, and knowing (or assuming) that they all consisted of ASCII characters, he wanted to avoid the waste of space that comes from storing each character in a .NET string as a UTF-16 code unit.

My answer to the question mostly consisted of saying that I didn’t think it would be worth the effort, and giving some reasons. But the more reasons I gave, the more I thought about the subtleties involved, and that it’s actually quite an interesting case study into memory use in .NET.

How are we going to measure object size?

If we’re going to work out any sort of benefit from a more compact string representation, we’ll need to be able to calculate how much memory our objects are taking to start with. Rather than work this out in a purely theoretical way, I’ve been running tests using code like this:

using System;

class Program
{
    static void Main(string[] args)
    {
        int size = 10000000;
        object[] array = new object[size];

        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < size; i++)
        {
            array[i] = new object();
        }
        long after = GC.GetTotalMemory(true);

        double diff = after – before;

        Console.WriteLine(“Per object: “ + diff / size);

        // Stop the GC from messing up our measurements
        GC.KeepAlive(array);
    }
}

Obviously that doesn’t take into account factors such as memory used by JITting, or anything that could be going on in other threads, but by using a suitably large number of objects, and by performing the division in floating point arithmetic (to avoid a slight variation making an 11.99999 come out as an 11, when it’s really just a “12 with something else going on”, we can work out the size of objects pretty clearly. The sample above measures the size of a vanilla object, but the code can be adapted very easily.

The first important thing to point out is that C# doesn’t guarantee the results of this – it isn’t responsible for determining how all of an object is laid out in memory; that’s the runtime’s job. While there are attributes to affect the layout and padding of the data members of a type in memory, there are other aspects that are out of your control. In this post I won’t use any of the layout attributes – we’ll just use the defaults.

Not all runtimes are created equal, either. On my laptop I’ve got Mono 2.8, .NET 3.5 and .NET 4, with the two versions of .NET each having the 32-bit (x86) and 64-bit (x64) CLRs. For the sake of simplicity, I’m going to stick with .NET 4 for this post, but I’ll give results for both the x64 and x86 CLRs. To test each of them, I’m compiling with “/platform:x64” or “/platform:x86”.

Some simple starter measurements

Before I start creating my own types, let’s try a few built-in types, including strings:

Type x86 size x64 size
object 12 24
object[] 16 + length * 4 32 + length * 8
int[] 12 + length * 4 28 + length * 4
byte[] 12 + length 24 + length
string 14 + length * 2 26 + length * 2

Note that all the x86 sizes are rounded up to the nearest 4 bytes, and all x64 sizes are rounded up to the nearest 8 bytes.

The string numbers are interesting, because strings are the only non-array types in .NET which vary in size. A long string consists of a single large object in memory. Compare this with Java, where a String is a “normal” type in terms of memory consumption, containing an offset and length into a char array – so a long string consists of a small object referring to a large char array. This distinction will be very important when we come to build an AsciiString type. Another point about measuring string sizes is that it’s relatively tricky to measure the size of an empty string – because even if you use the “new string(‘x’, 0)” constructor, the result is still cached – the same reference is returned each time.

You might be forgiven for looking at the numbers above and thinking that the “overhead” of an object is 12 bytes in x86 and 24 in x64… but that’s not quite right. Let’s build our own straightforward classes and measure them…

Custom classes containing primitives

Here are six classes, all of which are measured with the same simple test code:

class Empty {}
class OneInt32 { int x; }
class TwoInt32 { int x, y; }
class ThreeInt32 { int x, y, z; }

class Mixed1
{
    int x;
    byte b1, b2, b3, b4;
    int y, z;
}

class Mixed2
{
    int x;
    byte b1;
    int y, z;
    byte b2, b3, b4;
}

The last case is mostly to check how the CLR handles an “awkward” class declaration, where the int variables won’t naturally be aligned on 4-byte boundaries. The results look odd at first, but we’ll make sense of them in a minute:

Type x86 size x64 size
Empty 12 24
OneInt32 12 24
TwoInt32s 16 24
ThreeInt32s 20 32
Mixed1 24 32
Mixed2 24 32

A few interesting things to note here:

  • There’s a “base” overhead of 8 bytes per object in x86 and 16 per object in x64… given that we can store an Int32 of “real” data in x86 and still have an object size of 12, and likewise we can store two Int32s of real data in x64 and still have an object of x64.
  • There’s a “minimum” size of 12 bytes and 24 bytes respectively. In other words, you can’t have a type which is just the overhead. Note how the “Empty” class takes up the same size as creating instances of Object… there’s effectively some spare room, because the CLR doesn’t like operating on an object with no data. (Note that a struct with no fields takes up space too, even for local variables.)
  • The x86 objects are padded to 4 byte boundaries; on x64 it’s 8 bytes (just as before)
  • By default, the CLR is happy to pack fields pretty densely – Mixed2 only took as much space as ThreeInt32. My guess is that it reorganized the in-memory representation so that the bytes all came after the ints… and that’s what a quick bit of playing around with unsafe pointers suggests too… but I’m not sufficiently comfortable with this sort of thing to say for sure. Frankly, I don’t care… so long as it all works, what we’re interested in is the overall size, not the precise layout.

So what does an ASCII string look like?

In this blog post I’m not actually going to implement an ASCII string at all (well, not much). I’m merely pointing out what the data structures would look like. However, it’s worth working out what desirable qualities it should have. As far as possible, it should feel like System.String. In particular:

  • It should be immutable.
  • It should have fast access to individual characters, and the length.
  • It should mostly “feel” like an immutable reference type, in that passing a value of type AsciiString around should be cheap, like copying a reference.
  • It should use as little memory as possible… less than the equivalent string, or it’s pointless.
    • One caveat to this: in theory that could mean storing 8 characters in every 7 bytes, as ASCII really only uses 7 bits per character. I’m not going to those extremes, but you can think about them if you want.

We’re going to store the characters as a byte array. We have three options as to exactly how we handle that byte array:

  • We could go the Java way, where several strings share references to the same array. Each string then has an offset and a length to say which bit of the array they’re interested in.
    • Pros: Substring becomes really cheap
    • Cons: You can end up having just a tiny substring responsible for keeping a huge character array alive
  • We could go the .NET way, where each string has its own character data, but the buffer may be longer than necessary… so it stores the length too. (A bit like a List<T>.)
    • Pros: Can potentially make building strings cheap, if you just keep whatever potentially oversized buffer you’ve already got.
    • Cons: Wasted space for the unused part of the array, and a field for the length.
  • We could just have a byte array of exactly the right size – and it already knows its size.

I’m going to assume the third option here. So all the data our type needs is a byte array. That’s going to be pretty cheap… we hope. Let’s look at what we can build.

Option 1: A simple class

To give a flavour of the implementation, I’ve decided to implement four members for each option:

  • A way of creating an AsciiString from a regular string
  • The Substring overload with both a start and length
  • The Length property
  • The indexer returning a char

Hopefully that will give enough of an idea of what’s going on to be useful. Note that these aren’t production-quality implementations at all… none of the code has ever been run at all. I have made sure it compiles, so just be grateful for that :)

using System;
using System.Text;

public sealed class AsciiString
{
    private readonly byte[] data;

    public AsciiString(string text)
    {
        // TODO: Rather more data validation etc!
        data = Encoding.ASCII.GetBytes(text);
    }

    private AsciiString(byte[] data)
    {
        // This constructor is private: we can trust that it’s been called
        // by code which isn’t going to modify the contents of the array
        // afterwards.
        this.data = data;
    }

    public AsciiString Substring(int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > data.Length)
        {
            throw new ArgumentOutOfRangeException(“startIndex”);
        }
        if (startIndex + length > data.Length)
        {
            throw new ArgumentOutOfRangeException(“length”);
        }
        byte[] newData = new byte[length];
        Buffer.BlockCopy(data, startIndex, newData, 0, length);
        return new AsciiString(newData);
    }

    public int Length { get { return data.Length; } }

    public char this[int position] { get { return (char) data[position]; } }
    // etc…
}

Hopefully this is pretty straightforward – it’s meant to be the most “obvious” solution. Note that we’ve not got the nice locality of reference which the real String class has – it’s possible that the an AsciiString could end up with its backing array a long way away in memory, so a indexer operation for a single character could end up with three cache misses – one for the AsciiString object, one for part of the data array storing the length (for argument validation) and one for the part of the data array containing the character we’re looking for. That may be unlikely, and it’s not the kind of thing I normally think about – but it’s probably the kind of thing the BCL team pay a lot of attention to.

We get the same “immutable reference type” behaviour present in the normal string type, however – you can have a null AsciiString reference just as normal, any assignments will just be reference assignments, etc.

What about the size though? There are two objects to consider:

  • The array, of size 12 + length or 24 + length (x86 and x64 respectively; rounded up to 4 or 8 bytes as well)
  • The object itself, of size 12 or 24

So we’ve got a total size of 24 + length or 48 + length, depending on architecture. To show how the break-even point works, here’s a little table showing the sizes of string and AsciiString for various sizes on both architectures:

Length string-x86 string-x64 AsciiString-x86 AsciiString-x64
0 16 32 24 48
1 16 32 28 56
2 20 32 28 56
3 20 32 28 56
4 24 40 28 56
5 24 40 32 56
6 28 40 32 56
7 28 40 32 56
8 32 48 32 56
9 32 48 36 64
10 36 48 36 64
..
16 48 64 40 64
24 64 80 48 72
32 80 96 56 80

As you can see, the break-even point in x86 is at length 10; in x64 it’s at length 16. After that, we start winning – as we’d expect. The penalty for very small strings is quite hefty though – you’d really better hope you didn’t have lots of single-character strings, taking 56 bytes each in x64.

Let’s see if we can do better…

Option 2: A simple struct

A lot of the overhead here has come from the fact that we’ve got an object which only has a single field. The field is all we’re interested in… why are we bothering with all the overhead of the object? Let’s make it a struct instead, effectively inlining that field wherever we use the type. Assignment, passing arguments to methods etc will still only be copying a reference – it’s just the reference will be the byte array rather than a wrapper object.

It all sounds good, but there are two snags:

  • The value can never be null; that at least diverges from the familiar string behaviour
  • We won’t be able to prevent code from creating an instance of our struct with new AsciiString() – and that won’t be good.

We can actually pit these two downsides against each other by making the “default” value a pseudo-null value… we can even throw NullReferenceException just as if it were a reference type. We don’t even need to do any work in order to get that NullReferenceException – every member is going to use the data array anyway, and dereferencing that will automatically throw an exception. We might want to change things around a bit to make that the very first thing that can throw an exception, but that’s all.

It’s nasty, but it appeals very slightly. In an evil kind of way. It makes things slightly more familiar, but at the cost of being generally weird in other ways.

We still need to be able to check whether an AsciiString value is the logical null value. I’ll add an IsNull property for that purpose. (An alternative would be HasValue, but that would be confusing with Nullable<T>.)

Most of the code remains untouched – it looks like this:

public struct AsciiString
{
    private readonly byte[] data;
    public bool IsNull { get { return data == null; } }

    // Remainder of code as before
}

Now let’s look at the sizes, which should be a lot more favourable than before. Note that I had to change the size-checking code to create an array of type AsciiStruct[] instead of object[] to avoid boxing. Should we take the size of the array itself into consideration when computing the size of the AsciiString? We haven’t when working out the size of string… in each case the size of any individual element will be the size of a reference. For the table below, I haven’t included it… but bear in mind that this form of measurement would count the size of most value types (int etc) as 0. It just goes to show that when you talk about the size of a data type, you really need to be very precise in what you mean.

Length string-x86 string-x64 AsciiString-x86 AsciiString-x64
0 16 32 12 24
1 16 32 16 32
2 20 32 16 32
3 20 32 16 32
4 24 40 16 32
5 24 40 20 32
6 28 40 20 32
7 28 40 20 32
8 32 48 20 32
9 32 48 24 40
10 36 48 24 40
..
16 48 64 28 40
24 64 80 36 48
32 80 96 44 56

This time, unsurprisingly, AsciiString is always more space-efficient than the normal string. It just takes a certain amount of holding our noses. Speaking of which…

Option 3: Extension methods on byte[]

Suppose we really, really want to have “proper” null references. We don’t really need the struct. We could treat any byte array as an array of ASCII characters, with extension methods like this:

public static class ByteArrayExtensions
{
    public static byte[] Substring(this byte[] data, int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > data.Length)
        {
            throw new ArgumentOutOfRangeException(“startIndex”);
        }
        if (startIndex + length > data.Length)
        {
            throw new ArgumentOutOfRangeException(“length”);
        }
        byte[] newData = new byte[length];
        Buffer.BlockCopy(data, startIndex, newData, 0, length);
        return newData;
    }
}

The size is the same as with option 2 – in both cases there’s just the byte array, basically. This option is truly horrible in many ways though:

  • You can no longer tell in your code (just through typing) what’s meant to be an AsciiString and what isn’t
  • Kiss immutability goodbye
  • We can’t guarantee that all the characters will be valid ASCII any more
  • We can’t add extension properties or extension indexers
  • We can’t make it implement the interfaces we want it to

Obviously, we’d never take this route. I just thought I’d include it for a regular dose of evil-ness.

Conclusion

Implementing an ASCII-only string representation sounds like it should be an obvious win in terms of memory, at the cost of doing a lot of work that’s already been done for us in String. However, the most obvious implementation takes a while to break even in memory usage, compared with the normal string type, due to the “special” nature of string within the CLR. We can’t mimic the “stretchiness” of string ourselves. The BCL/CLR teams could, of course, if they really wanted to. I’m not holding my breath though.

If we’re dead keen on saving space at the cost of some design wonkiness, we can use a value type instead of a reference type. Other than nullity, it works pretty well… but you have all the disadvantages which go with value types, such as the unavoidable parameterless constructor and the need to watch out for boxing.

Aside from anything else, I hope this was useful as a delve into how much space objects actually take up in .NET – and as a way of highlighting the extra memory used when running in the x64 CLR.

Edulinq – the e-book

I’m pleased to announce that I’ve made a first pass at converting the blog posts in the Edulinq series into e-books.

I’m using Calibre to convert to PDF and e-book format. I still have a way to go, but they’re at least readable. The Kindle version (MOBI format) is working somewhat better than the PDF version at the moment, which surprises me. In particular, although hyperlinks are displaying in the PDF, they don’t seem to be working – whereas at least the internal links in the Kindle format are working.

I’ll no doubt try to improve things over time, but I’ve put these early attempts up on the root page of the Edulinq project site. I’ve also put all the blog posts up as HTML in the site’s source control; that means you can browse the latest version directly. It also means if you sync the source control, download Calibre yourself and add “index.html” as a new e-book in the Calibre library, you can play with the conversion yourself and help me improve things. Feedback about problems is welcome; feedback including the fix is even better :)

Reimplementing LINQ to Objects: Part 45 – Conclusion and List of Posts

Table of Contents

You may consider it a little odd to have a list of posts as the final part in the series, but it makes sense when you consider that visiting the Edulinq tag page shows results in reverse chronological order. At that point, a newcomer will hopefully hit this post first, and then find it easier to navigate to the first post. Anyway…

  1. Introduction
  2. Where
  3. Select
  4. Range
  5. Empty
  6. Repeat
  7. Count and LongCount
  8. Concat
  9. SelectMany
  10. Any and All
  11. First, Single, Last and the …OrDefault versions
  12. DefaultIfEmpty
  13. Aggregate
  14. Distinct
  15. Union
  16. Intersect
  17. Except
  18. ToLookup
  19. Join
  20. ToList
  21. GroupBy
  22. GroupJoin
  23. Take, Skip, TakeWhile, SkipWhile
  24. ToArray
  25. ToDictionary
  26. Ordering:
  27. Reverse
  28. Sum
  29. Min and Max
  30. Average
  31. ElementAt and ElementAtOrDefault
  32. Contains
  33. Cast and OfType
  34. SequenceEqual
  35. Zip
  36. AsEnumerable
  37. Guiding principles
  38. What’s missing?
  39. Comparing implementations
  40. Optimization
  41. How query expressions work
  42. More optimization
  43. Out-of-process queries with IQueryable
  44. Aspects of design
  45. Conclusion and List of Posts

Thank you, and good night

When I started this series, I hadn’t realised quite how much there would be to write about. The main thrust was going to be that the implementation of LINQ is simple, and it’s the design that’s clever. As it happened, pretty much every operator ended up raising some interesting issue or other. However, hopefully the series has still "immersed" you in LINQ to Objects to some extent, and clarified how it all hangs together. It would be gratifying to think that the description at the start of each post may end up being used as a sort of "unofficial alternative documentation" with some more details than MSDN provides, but we’ll see whether than happens over time.

A number of people have asked me whether there’ll be an ebook version of this series, and the answer is currently "I don’t know." I have a few plans afoot, but I can’t tell where they’ll lead yet. Suffice to say I like the idea, and I’m looking at some options.

Anyway, thank you for reading as much of the series as you have, and I hope you’ve enjoyed it as much as I have.

Just in case you’re wondering, I’ll probably go back to posting about C# 5’s async support pretty soon…

Reimplementing LINQ to Objects: Part 44 – Aspects of Design

I promised a post on some questions of design that are raised by LINQ to Objects. I suspect that most of these have already been covered in other posts, but it may well be helpful to talk about them here too. This time I’ve thought about it particularly from the point of view of how other APIs can be built on some of the same design principles, and the awkward choices that LINQ has thrown up.

The power of composability and immutability

Perhaps the most important aspect of LINQ which I’d love other API designers to take on board is that of how complicated queries are constructed from lots of little building blocks. What makes it particularly elegant is that the result of applying each building block is unchanged by anything else you do to it afterwards.

LINQ doesn’t enforce immutability of course – you can start off with a mutable list and change its content at any time, for example, or change the properties of one of the objects referenced within it, or pass in a delegate with side-effects – but LINQ itself won’t introduce side-effects.

The Task-based Asynchronous Pattern takes a similar approach, allowing composable building blocks of tasks. I’ve seen this pattern in various guises over the years – if you find yourself thinking in terms of a pipeline of some kind, it may well be appropriate, especially if each state in the pipeline emits the same type as it consumes.

General immutability is a somewhat different design trait of course, but one which can make such a difference. The java.util.{Date,Calendar} classes are horrible, not least because they’re mutable – you can never stash a value away without being concerned that it may get changed by something else. Joda Time has some mutable implementations, but typically the immutable classes are used in a fluent way. Of course, .NET uses value types for various core types to start with, but also makes TimeZoneInfo immutable. For genuine "values" I would highly encourage API designers to at least strongly consider immutable types. They’re not always appropriate by any means, but they can be hugely useful where they fit nicely.

Extension methods on interfaces

It’s no surprise that extension methods are heavily used in LINQ, given that they were effectively introduced into the language in order to enable LINQ in the first place. However, they do work particularly well with interfaces as a way of adding common behaviour.

It also plays very nicely with the pipeline pattern above for creating pipelines in a fluent manner. Even if you just create extension methods which call a constructor to wrap/compose the previous stage in the pipeline, you can still end up with more readable code.

One problem with this is that you can’t "override" behaviour in particular implementations or interfaces which extend the original one – which is why Enumerable.ElementAt() has to detect that a sequence is actually a list, for example. If interfaces allowed method implementations, this wouldn’t be as much of a problem in the situation where you’re in control of the interface – I wouldn’t be at all surprised to find that as a feature of C#’s successor.

The lack of extension properties is also a bit of a handicap in some places, although not as many as one might expect at first glance. For example, even if we could have made Enumerable.Count() a property, would it have been a good idea to do so? Properties give a natural expectation of speed, and Count() is usually an O(n) operation.

Delegates for custom behaviour

In .NET 1.0 and 1.1, most developers used delegates for two purposes:

  • Handling events in UIs
  • Passing around behaviour to be executed in a different thread (either via Control.Invoke, or new Thread(ThreadStart), or ThreadPool.QueueUserWorkItem).

.NET 2.0 increased the range of uses of delegates somewhat, particularly with List.ConvertAll and the ability to create delegates relatively easily using anonymous methods.

However, LINQ really brought them into the mainstream. If you’re building an API which benefits from small pieces of custom behaviour, delegates can be a real boon. More complicated behaviour is still often best represented via an interface, and sometimes it’s worth having both interface and delegate representations, like Comparison<T> and IComparable<T>. It’s generally easy to convert between the two – especially if you use a method group conversion from an interface implementation’s method to the delegate type.

Laziness

One aspect of LINQ which is both a blessing and a curse is its laziness, both in terms of deferred execution (not reading from the input sequence at all until the result sequence is read) and in terms of streaming the data (only reading as much information from the input sequence as is required to answer the immediate needs of the caller).

This is great in various ways, particularly as it means you can build a complex query and use it multiple times, sometimes as a basis for other queries, knowing that it won’t actually do anything until you ask for real results. It also means that you can iterate over huge data sets, so long as you’re careful.

On the other hand, it leads to subtle issues over when code is actually executing, makes debugging harder to understand, makes it easier to accidentally change the values of captured variables between the point at which you create the query and the point at which you execute it, and basically messes with your head. This is probably the aspect of LINQ which confuses newbies more than any other.

I’m not saying it was the wrong decision for LINQ – but I would caution API designers to think carefully before introducing laziness, and to document it really thoroughly. Likewise if your API might end up returning a result which is "gradually evaluated" (streaming data etc), this should be made clear.

When names collide: options for consistency

Just in case you’ve forgotten, this is irritation with the meaning of source.Contains(element). In order to check whether a sequence contains an element or not, there has to be some idea of equality – for example, if you’re trying to find one string in a sequence of strings, are you trying to match in a case sensitive manner or not?

There’s an overload for Enumerable.Contains which allows you to specify the equality comparer to use, but the question is what should happen when you let the implementation pick the comparer.

For every other method in Enumerable, the default equality comparer for the sequence type – i.e. EqualityComparer<TSource>.Default – is picked. That sounds like source.Contains(element) should use the element type’s default comparer too, right? Well, in some cases that’s what will happen… but not if the source implements ICollection<T>, which has its own Contains method which doesn’t take an equality comparer. If that’s the case, LINQ to Objects delegates to the collection’s Contains method.

So, we have three kinds of consistency here:

  • Consistency of compile-time type: it would be nice if the behaviour of source.Contains(element) was the same whether "source" is of type IEnumerable<T> and ICollection<T>
  • Consistency of API: it would be nice if Contains behaved the same way as other methods which have overloads with and without equality comparers
  • Consistency of model: if you consider "source" to be just a sequence of elements, it shouldn’t affect the result (not just the speed) if the object actually implements ICollection<T>

I should point out that this will only be a problem if the collection uses a different notion of equality to the default equality comparer for the type. The most obvious example of this is if you have a HashSet<string> which uses a case-insensitive equality comparer. But it’s still a valid concern.

So what should the API designer do in this kind of case? Admittedly LINQ to Objects is already in a slightly unusual position as it’s based on an existing interface with known and very common interfaces extending that core one… it’s less likely to come up with other APIs. However, I think it might be enough of a smell to suggest that changing the name of the method to "ContainsElement" or something similar would be worthwhile. It’s unfortunate that "Contains" really is the obvious choice…

This issue raises another aspect of API decision I’ve considered in the past… if there’s a common way of doing something in the framework you’re building on top of, but you consider it to be broken, should you abide by that breakage for the sake of familiarity and consistency, or should you strive to be as "clean" as possible? I think it needs to be considered on a case-by-case basis, but I suspect I would usually come down on the side of cleanliness.

Documentation details

Almost all APIs are badly documented – it’s a fact of life, even with some of the best APIs I’ve worked with. I doubt that Noda Time will be a shining example either. However, at the risk of being hypocritical I’ll say that documentation is worth spending significant thought on. Not just the time taken to document your code – but the time taken to consider what you want to guarantee, what should be left unstated, and what should be explicitly left open.

For example, there’s no indication in the documentation of Cast that it will sometimes return the original source value, nor in its companion OfType method that that will never return the original source reference. This might be important to someone – why not state it? It’s possible to state the possibility without saying what cases it applies to of course, leaving some wiggle room in the future. You might consider some of the optimizations in the same way – when should an optimization be documented and when should it be implicit? Sometimes it can make a difference beyond just performance, even if only in "odd" situations (such as a predicate throwing an exception).

If you’re used to defensive coding with Code Contracts, it’s much the same type of decision – and again, it’s similar to deciding whether a method should return IEnumerable<T>, IList<T> or List<T>. There’s a balance between caller convenience, design cleanliness (where you only want to emphasize one interface aspect, even if it also happens to always return a particular type), and room for the implementation to change in the future.

Another example of considering the level of detail to document is when it comes to how input sequences are used in LINQ to Objects. What does it mean to say "this method uses deferred execution" exactly? If I call GetEnumerator() eagerly but defer the call to MoveNext(), is that still "deferred execution"? Should the documentation state when a sequence is buffered and when it’s streamed? Should it guarantee the order of the result sequence when the natural implementation makes that order easy to describe (e.g. for Distinct)? In this series I’ve tried to be as clear as possible about what actually happens – but that’s not to say that in some cases, the documentation wasn’t left deliberately ambiguous.

Conclusion

There are many other design considerations that I haven’t gone into here – particularly optimization, which I’ve already covered twice, probably saying everything I wanted to say here anyway.

I may add a few more bits to this post over time… but aside from that, I think I’m fundamentally done. I’ll write one more conclusion post, then declare Edulinq closed…

Reimplementing LINQ to Objects: Part 43 – Out-of-process queries with IQueryable

I’ve been putting off writing about this for a while now, mostly because it’s such a huge topic. I’m not going to try to give more than a brief introduction to it here – don’t expect to be able to whip up your own LINQ to SQL implementation afterwards – but it’s worth at least having an idea of what happens when you use something like LINQ to SQL, NHibernate or the Entity Framework.

Just as LINQ to Objects is primarily interested in IEnumerable<T> and the static Enumerable class, so out-of-process LINQ is primarily interested in IQueryable<T> and the static Queryable class… but before we get to them, we need to talk about expression trees.

Expression Trees

To put it in a nutshell, expression trees encapsulate logic in data instead of code. While you can introspect .NET code via MethodBase.GetMethodBody and then MethodBody.GetILAsByteArray, that’s not really a practical approach. The types in the System.Linq.Expressions define expressions in an easier-to-process manner. When expression trees were introduced in .NET 3.5, they were strictly for expressions, but the Dynamic Language Runtime uses expression trees to represent operations, and the range of logic represented had to expand accordingly, to include things like blocks.

While you certainly can build expression trees yourself (usually via the factory methods on the nongeneric Expression class), and it’s fun to do so at times, the most common way of creating them is to use the C# compiler’s support for them via lambda expressions. So far we’ve always seen a lambda expression being converted to a delegate, but it can also convert lambdas to instances of Expression<TDelegate>, where TDelegate is a delegate type which is compatible with the lambda expression. A concrete example will help here. The statement:

Expression<Func<int, int>> addOne = x => x + 1;

will be compiled into code which is effectively something like this:

var parameter = Expression.Parameter(typeof(int), "x");
var one = Expression.Constant(1, typeof(int));
var addition = Expression.Add(parameter, one);
var addOne = Expression.Lambda<Func<int, int>>(addition,  new ParameterExpression[] { parameter });

The compiler has some tricks up its sleeves which allow it to refer to methods, events and the like in a simpler way than we can from code, but largely you can regard the transformation as just a way of making life a lot simpler than if you had to build the expression trees yourself every time.

IQueryable, IQueryable<T> and IQueryProvider

Now that we’ve got the idea of being able to inspect logic relatively easily at execution time, let’s see how it applies to LINQ.

There are three interfaces to introduce, and it’s probably easiest to start with how they appear in a class diagram:

Most of the time, queries are represented using the generic IQueryable<T> interface, but this doesn’t actually add much over the nongeneric IQueryable interface it extends, other than also extending IEnumerable<T> – so you can iterate over the contents of an IQueryable<T> just as with any other sequence.

IQueryable contains the interesting bits, in the form of three properties: ElementType which indicates the type of the elements within the query (in other words, a dynamic form of the T from IQueryable<T>), Expression returns the expression tree for the query so far, and Provider returns the query provider which is responsible for creating new queries and executing the existing one. We won’t need to use the ElementType property ourselves, but we’ll need both the Provider and Expression properties.

The static Queryable class

We’re not going to implement any of the interfaces ourselves, but I’ve got a small sample program to demonstrate how they all work, imagining we were implementing most of Queryable ourselves. This static class contains extension methods for IQueryable<T> just as Enumerable does for IEnumerable<T>. Most of the query operators from LINQ to Objects appear in Queryable as well, but there are a few notable omissions, such as the To{Lookup, Array, List, Dictionary} methods. If you call one of those on an IQueryable<T>, the Enumerable implementations will be used instead. (IQueryable<T> extends IEnumerable<T>, so the extension methods in Enumerable are applicable to IQueryable<T> sequences as well.)

The big difference between the Queryable and Enumerable methods in terms of their declarations is in the parameters:

  • The "source" parameter in Queryable is always of type IQueryable<TSource> instead of IEnumerable<TSource>. (Other sequence parameters such as the sequence to concatenate for Queryable.Concat are expressed as IEnumerable<T>, interestingly enough. This allows you to express a SQL query using "local" data as well; the query methods work out whether the sequence is actually an IQueryable<T> and act accordingly.)
  • Any parameters which were delegates in Enumerable are expression trees in Queryable; so while the selector parameter in Enumerable.Select is of type Func<TSource, TResult>, the equivalent in Queryable.Select is of type Expression<Func<TSource, TResult>>

The big difference between the methods in terms of what they do is that whereas the Enumerable methods actually do the work (eventually – possibly after deferred execution of course), the Queryable methods themselves really don’t do any work: they just ask the query provider to build up a query indicating that they’ve been called.

Let’s have a look at Where for example. If we wanted to implement Queryable.Where, we would have to:

  • Perform argument checking
  • Get the "current" query’s Expression
  • Build a new expression representing a call to Queryable.Where using the current expression as the source, and the predicate expression as the predicate
  • Ask the current query’s provider to build a new IQueryable<T> based on that call expression, and return it.

It all sounds a bit recursive, I realize – the Where call needs to record that a Where call has happened… but that’s all. You may very well wonder where all the work is happening. We’ll come to that.

Now building a call expression is slightly tedious because you need to have the right MethodInfo – and as Where is overloaded, that means distinguishing between the two Where methods, which is easier said than done. I’ve actually used a LINQ query to find the right overload – the one where the predicate parameter Expression<Func<T, bool>> rather than Expression<Func<T, int, bool>>. In the .NET implementation, methods can use MethodBase.GetCurrentMethod() instead… although equally they could have created a bunch of static variables computed at class initialization time. We can’t use GetCurrentMethod() for experimentation purposes, because the query provider is likely to expect the exact correct method from System.Linq.Queryable in the System.Core assembly.

Here’s our sample implementation, broken up quite a lot to make it easier to understand:

public static IQueryable<TSource> Where<TSource>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, bool>> predicate)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (predicate == null)
    {
        throw new ArgumentNullException("predicate");
    }
        
    Expression sourceExpression = source.Expression;
    Expression quotedPredicate = Expression.Quote(predicate);
        
    // This gets the "open" method, without specific type arguments. The second parameter
    // of the method we want is of type Expression<Func<TSource, bool>>, so the sole generic
    // type argument to Expression<T> itself has two generic type arguments.
    // Let’s face it, reflection on generic methods is a mess.
    MethodInfo method = typeof(Queryable).GetMethods()
                                         .Where(m => m.Name == "Where")
                                         .Where(m => m.GetParameters()[1]
                                                      .ParameterType
                                                      .GetGenericArguments()[0]
                                                      .GetGenericArguments().Length == 2)
                                         .First();
        
    // This gets the method with the same type arguments as ours
    MethodInfo closedMethod = method.MakeGenericMethod(new Type[] { typeof(TSource) });
        
    // Now we can create a *representation* of this exact method call
    Expression methodCall = Expression.Call(closedMethod, sourceExpression, quotedPredicate);
        
    // … and ask our query provider to create a query for it
    return source.Provider.CreateQuery<TSource>(methodCall);
}

There’s only one part of this code that I don’t really understand the need for, and that’s the call to Expression.Quote on the predicate expression tree. I’m sure there’s a good reason for it, but this particular example would work without it, as far as I can see. The real implementation uses it though, so dare say it’s required in some way.

EDIT: Daniel’s comment has made this somewhat clearer to me. Each of the arguments to Expression.Call after the MethodInfo itself is meant to be an expression which represents the argument to the method call. In our example we need an expression which represents an argument of type Expression<Func<TSource, bool>>. We already have the value, but we need to provide the layer of wrapping… just as we did with Expression.Constant in the very first expression tree I showed at the top. To wrap the expression value we’ve got, we use Expression.Quote. It’s still not clear to me exactly why we can use Expression.Quote but not Expression.Constant, but at least it’s clearer why we need something

EDIT: I’m gradually getting there. This Stack Overflow answer from Eric Lippert has much to say on the topic. I’m still trying to get my head round it, but I’m sure when I’ve read Eric’s answer several times, I’ll get there.

We can even test that this works, by using the Queryable.AsQueryable method from the real .NET implementation. This creates an IQuerable<T> from any IEnumerable<T> using a built-in query provider. Here’s the test program, where FakeQueryable is a static class containing the extension method above:

using System;
using System.Collections.Generic;
using System.Linq;

class Test
{
    static void Main()
    {        
        List<int> list = new List<int> { 3, 5, 1 };
        IQueryable<int> source = list.AsQueryable();
        IQueryable<int> query = FakeQueryable.Where(source, x => x > 2);
        
        foreach (int value in query)
        {
            Console.WriteLine(value);
        }
    }
}

This works, printing just 3 and 5, filtering out the 1. Yay! (I’m explicitly calling FakeQueryable.Where rather than letting extension method resolution find it, just to make things clearer.)

Um, but what’s doing the actual work? We’ve implemented the Where clause without providing any filtering ourselves. It’s really the query provider which has built an appropriate IQueryable<T> implementation. When we call GetEnumerator() implicitly in the foreach loop, the query can examine everything that’s built up in the expression tree (which could contain multiple operators – it’s nesting queries within queries, essentially) and work out what to do. In the case of our IQueryable<T> built from a list, it just does the filtering in-process… but if we were using LINQ to SQL, that’s when the SQL would be generated. The provider recognizes the specific methods from Queryable, and applies filters, projections etc. That’s why it was important that our demo Where method pretended that the real Queryable.Where had been called – otherwise the query provider wouldn’t know what the call expression

Just to hammer the point home even further… Queryable itself neither knows nor cares what kind of data source you’re using. Its job is not to perform any query operations itself; its job is to record the requested query operations in a source-agnostic manner, and let the source provider handle them when it needs to.

Immediate execution with IQueryProvider.Execute

All the operators using deferred execution in Queryable are implemented in much the same way as our demo Where method. However, that doesn’t cover the situation where we need to execute the query now, because it has to return a value directly instead of another query.

This time I’m going to use ElementAt as the sample, simply because it’s only got one overload, which makes it very easy to grab the relevant MethodInfo. The general procedure is exactly the same as building a new query, except that this time we call the provider’s Execute method instead of CreateQuery.

public static TSource ElementAt<TSource>(this IQueryable<TSource> source, int index)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
        
    Expression sourceExpression = source.Expression;
    Expression indexExpression = Expression.Constant(index);
        
    MethodInfo method = typeof(Queryable).GetMethod("ElementAt");        
    MethodInfo closedMethod = method.MakeGenericMethod(new Type[] { typeof(TSource) });
        
    // Now we can create a *representation* of this exact method call
    Expression methodCall = Expression.Call(closedMethod, sourceExpression, indexExpression);
        
    // … and ask our query provider to execute it
    return source.Provider.Execute<TSource>(methodCall);
}

The type argument we provide to Execute is the desired return type – so for Count, we’d call Execute<int> for example. Again, it’s up to the query provider to work out what the call actually means.

It’s worth mentioning that both CreateQuery and Execute have generic and non-generic overloads. I haven’t personally encountered a use for the non-generic ones, but I gather they’re useful for various situations in generated code, particularly if you really don’t know the element type – or at least only know it dynamically, and don’t want to have to use reflection to generate an appropriate generic method call.

Transparent support in source code

One of the aspects of LINQ which raises it to the "genius" status (and "slightly scary" at the same time) is that most of the time, most developers don’t need to make any changes to their source code in order to use Enumerable or Queryable. Take this query expression and its translation:

var query = from person in family
            where person.LastName == "Skeet"
            select person.FirstName;

// Translation
var query = family.Where(person => person.LastName == "Skeet")
                  .Select(person => person.FirstName);

Which set of query methods will that use? It entirely depends on the compile-time type of the "family" variable. If that’s a type which implements IQueryable<T>, it will use the extension methods in Queryable, the lambda expression will be converted into expression trees, and the type of "query" will be IQueryable<string>. Otherwise (and assuming the type implements IEnumerable<T> isn’t some other interesting type such as ParallelEnumerable) it will use the extension methods in Enumerable, the lambda expressions will be converted into delgeates, and the type of "query" will be IEnumerable<string>.

The query expression translation part of the specification has no need to care about this, because it’s simply translating into a form which uses lambda expressions – the rest of overload resolution and lambda expression conversion deals with the details.

Genius… although it does mean you need to be careful that really you know where your query evaluation is going to take place – you don’t want to accidentally end up performing your whole query in-process having shipped the entire contents of a database across a network connection…

Conclusion

This was really a whistlestop tour of the "other" side of LINQ – and without going into any of the details of the real providers such as LINQ to SQL. However, I hope it’s given you enough of a flavour for what’s going on to appreciate the general design. Highlights:

  • Expression trees are used to capture logic in a data structure which can be examined relatively easily at execution time
  • Lambda expressions can be converted into expression trees as well as delegates
  • IQueryable<T> and IQueryable form a sort of parallel interface hierarchy to IEnumerable<T> and IEnumerable – although the queryable forms extend the enumerable forms
  • IQueryProvider enables one query to be built based on another, or executed immediately where appropriate
  • Queryable provides equivalent extension methods to most of the Enumerable LINQ operators, except that it uses IQueryable<T> sources and expression trees instead of delegates
  • Queryable doesn’t handle the queries itself at all; it simply records what’s been called and delegates the real processing to the query provider

I think I’ve now covered most of the topics I wanted to mention after finishing the actual Edulinq implementation. Next up I’ll talk about some of the thorny design issues (most of which I’ve already mentioned, but which bear repeating) and then I’ll write a brief "series conclusion" post with a list of links to all the other parts.

C# in Depth 2nd edition: now available in mobi/epub (Kindle) format

I’m not quite sure why this hasn’t been emailed to all existing owners, but the ebook of C# in Depth 2nd edition is now available in mobi and epub form, as well as PDF.

You can download it from the Manning user account site. You need to have the existing ebook first, but if you have the hard copy there should be a voucher in the front which will let you get the ebook for free. (This should work wherever you bought the hard copy from; it doesn’t matter whether you originally ordered it from Manning or not.) If you don’t already have a login for the user account site, just register using the same email address that the ebook was sent to. That way the system automatically credits you with all the ebooks you’ve bought. If you have had ebooks delivered to multiple email addresses, you can add those in the settings page.

Anyway, click on the link to C# in Depth, and you can download the book in any of the listed formats – if you want to use it on a Kindle, just download the mobi file, copy it to the Kindle and you should be well away.

Enjoy!