Currying vs partial function application

This is a slightly odd post, and before you read it you should probably put yourself into one of three buckets:

  • Someone who doesn’t care too much about functional programming, and finds higher order functions tricky: feel free to skip this post entirely.
  • Someone who knows all about functional programming, and already knows the difference between currying and partial function application: please read this post carefully and post comments about any inaccuracies you find. (Yes, the CAPTCHA is broken on Chrome; sorry.)
  • Someone who doesn’t know much about functional programming yet, but is interested to learn more: please take this post with a pinch of salt, and read the comments carefully. Read other articles by more experienced developers for more information.

Basically, I’ve been aware for a while that some people use the terms currying and partial function application somewhat interchangably, when they shouldn’t. It’s one of those topics (like monads) which I feel I understand to some extent, and I’ve decided that the best way of making sure I understand it is to try to write about it. If it helps the topic become more accessible to other developers, so much the better.

This post contains no Haskell

Almost every explanation I’ve ever seen of either topic has given examples in a "proper" functional language, typically Haskell. I have absolutely nothing against Haskell, but I typically find it easier to understand examples in a programming language I understand. I also find it much easier to write examples in a program language I understand, so all the examples in this post are going to be in C#. In fact, it’s all available in a single file – that includes all of the examples, admittedly with a few variables renamed. Just compile and run.

C# isn’t really a functional language – I know just about enough to understand that delegates aren’t really a proper substitute for first class functions. However, they’re good enough to demonstrate the principles involved.

While it’s possible to demonstrate currying and partial function application using a function (method) taking a very small number of parameters, I’ve chosen to use 3 for clarity. Although my methods to perform the currying and partial function application will be generic (so all the types of parameters and return value are arbitrary) I’m using a simple function for demonstration purposes:

static string SampleFunction(int a, int b, int c) 

    return string.Format("a={0}; b={1}; c={2}", a, b, c); 
}

So far, so simple. There’s nothing tricky about that method, so don’t look for anything surprising.

What’s it all about?

Both currying and partial function application are about converting one sort of function to another. We’ll use delegates as an approximation to functions, so if we want to treat the method SampleFunction as a value, we can write:

Func<int, int, int, string> function = SampleFunction;

This single line is useful for two reasons:

  • Assigning the value to a variable hammers home the point that it really is a value. A delegate instance is an object much like any other, and the value of the function variable is a reference just like any other.
  • Method group conversions (using just the name of the method as a way of creating a delegate) doesn’t work terribly nicely with type inference when calling a generic method.

We can already call the function using three arguments with no further work:

string result = function(1, 2, 3);

Or equivalently:

string result = function.Invoke(1, 2, 3);

(The C# compiler just converts the shorthand of the first form to the second. The IL emitted will be the same.)

That’s fine if we’ve got all the arguments available at the same time, but what if we haven’t? To give a concrete (if somewhat contrived) example, suppose we have a logging function with three parameters (source, severity, message) and within a single class (which I’ll call BusinessLogic for the moment) we always want to use the same value for the "source" parameter, and we’d like to be able to log easily everywhere in the class specifying just the severity and message. We have a few options:

  • Create an adapter class which takes the log function (or more generally a logging object) and the "source" value in its constructor, stashes both in instance variables, and exposes a method with two parameters. That method just delegates to the stashed logger, using the source it’s remembered to supply the first argument to the three-parameter method. In BusinessLogic we create an instance of the adapter class, and stash a reference in an instance variable – then just call the two-parameter method everywhere we need to. This is probably overkill if we only need the adapter from BusinessLogic, but it’s reusable… so long as we’re trying to adapt the same logging function.
  • Store the original logger in our BusinessLogic class, but create a helper method with two parameters. This can hard-code the value used for the "source" parameter in one place (the helper method). If we need to do this in several places, it gets annoying.
  • Use a more general functional programming approach – probably partial function application in this case.

I’m deliberately ignoring the discrepancy between holding a reference to "the logger" and holding a reference to "the logging function". Obviously there’s a significant difference if we need to use more than one function from the logging class, but for the purposes of thinking about currying and partial function application, we’ll just think of "a logger" as "a function taking three parameters" (like our sample function).

Now that I’ve given the slightly-real-world concrete example for a bit of motivation, I’m going to ignore it for the rest of the post, and just talk about our sample function. I don’t want to write a whole BusinessLogic class which pretends to do something useful; I’m sure you can perform the appropriate mental conversion from "the sample function" to "something I might actually want to do".

Partial Function Application

The purpose of partial function application is to take a function with N parameters and a value for one of those parameters, and return a function with N-1 parameters, such that calling the result will assemble all the required values appropriately (the 1 argument given to the partial application operation itself, and the N-1 arguments given to the returned function). So in code form, these two calls should be equivalent for our 3-parameter method:

// Normal call
string result1 = function(1, 2, 3);

// Call via partial application
Func<int, int, string> partialFunction = ApplyPartial(function, 1); 
string result2 = partialFunction(2, 3);

In this case I’ve implemented partial application with a single parameter, and chosen the first one – you could write an ApplyPartial method which took more arguments to apply, or which used them somewhere else in the final function execution. I believe that picking off parameters one at a time, from the start, is the most conventional approach.

Thanks to anonymous functions (a lambda expression in this case, but an anonymous method wouldn’t be much more verbose), the implementation of ApplyPartial is simple:

static Func<T2, T3, TResult> ApplyPartial<T1, T2, T3, TResult>
    (Func<T1, T2, T3, TResult> function, T1 arg1) 

    return (b, c) => function(arg1, b, c); 
}

The generics make the method look more complicated than it really is. Note that the lack of higher order types in C# means that you’d need a method like this for every delegate you wanted to use – if you wanted a version for a function which started with four parameters, you’d need an ApplyPartial<T1, T2, T3, T4, TResult> method etc. You’d probably also want a parallel set of methods for the Action delegate family.

The final thing to note is that assuming we had all of these methods, we could perform partial function application again – even potentially down to a parameterless function if we wanted, like this:

Func<int, int, string> partial1 = ApplyPartial(function, 1); 
Func<int, string> partial2 = ApplyPartial(partial1, 2); 
Func<string> partial3 = ApplyPartial(partial2, 3); 
string result = partial3();

Again, only the final line would actually invoke the original function.

Okay, so that’s partial function application. That’s relatively straightforward. Currying is slightly harder to get your head round, in my view.

Currying

Whereas partial function application converts a function with N parameters into a function with N-1 parameters by applying one argument, currying effectively decomposes the function into functions taking a single parameter. We don’t pass any arguments into the Curry method itself:

  • Curry(f) returns a function f1 such that…
  • f1(a) returns a function f2 such that…
  • f2(b) returns a function f3 such that…
  • f3(c) invokes f(a, b, c)

(Again, note that this is specific to our three-parameter function – but hopefully it’s obvious how it would extend to other signatures.)

To give our "equivalence" example again, we can write:

// Normal call
string result1 = function(1, 2, 3);

// Call via currying
Func<int, Func<int, Func<int, string>>> f1 = Curry(function); 
Func<int, Func<int, string>> f2 = f1(1); 
Func<int, string> f3 = f2(2); 
string result2 = f3(3);

// Or to do make all the calls together…
var curried = Curry(function); 
string result3 = curried(1)(2)(3);

The difference between the latter examples shows one reason why functional languages often have good type inference and compact representations of function types: that declaration of f1 is pretty fearsome.

Now that we know what the Curry method is meant to do, it’s actually surprisingly simple to implement. Indeed, all we need to do is translate the bullet points above into lambda expressions. It’s a thing of beauty:

static Func<T1, Func<T2, Func<T3, TResult>>> Curry<T1, T2, T3, TResult> 
    (Func<T1, T2, T3, TResult> function) 

    return a => b => c => function(a, b, c); 
}

If you want to add some brackets to make it clearer for you, feel free – personally I think it just adds clutter. Either way, we get what we want. (It’s worth thinking about how annoying it would be to write that without lambda expressions or anonymous methods. Not hard, just annoying.)

So that’s currying. I think. Maybe.

Conclusion

I can’t say I’ve ever knowingly used currying, whereas I suspect some bits of Noda Time‘s text parsing effectively use partial functional application. (If anyone really wants me to dig in and check, I’ll do so.)

I really hope I’ve got the difference between them right here – it feels right, in that the two are clearly related, but also quite distinct. Now that we’ve reached the end, think about how that difference manifests itself when there are only two parameters, and hopefully you’ll see why I chose to use three :)

My gut feeling is that currying is a more useful concept in an academic context, whereas partial functional application is more useful in practice. However, that’s the gut feeling of someone who hasn’t really used a functional language in anger. If I ever really get round to using F#, maybe I’ll do a follow-up post. For now, I’m hoping that my trusty smart readers can provide useful thoughts for others.

Eduasync part 19: ordering by completion, ahead of time…

Today’s post involves the MagicOrdering project in source control (project 28).

When I wrote part 16 of Eduasync, showing composition in the form of majority voting, one reader mailed me a really interesting suggestion. We don’t really need to wait for any of the tasks to complete on each iteration of the loop – we only need to wait for the next task to complete. Now that sounds impossible – sure, it’s great if we know the completion order of the tasks, but half the point of asynchrony is that many things can be happening at once, and we don’t know when they’ll complete. However, it’s not as silly as it sounds.

If you give me a collection of tasks, I’ll give you back another collection of tasks which will return the same results – but I’ll order them so that the first returned task will have the same result as whichever of your original tasks completes first, and the second returned task will have the same result as whichever of your original tasks completes second, and so on. They won’t be the same tasks as you gave me, reordered – but they’ll be tasks with the same results. I’ll propagate cancellation, exceptions and so on.

It still sounds impossible… until you realize that I don’t have to associate one of my returned tasks with one of your original tasks until it has completed. Before anything has completed, all the tasks look the same. The trick is that as soon as I see one of your tasks complete, I can fetch the result and propagate it to the first of the tasks I’ve returned to you, using TaskCompletionSource<T>. When the second of your tasks completes, I propagate the result to the second of the returned tasks, etc. This is all quite easy using Task<T>.ContinueWith – barring a few caveats I’ll mention later on.

Once we’ve built a method to do this, we can then really easily build a method which is the async equivalent of Parallel.ForEach (and indeed you could write multiple methods for the various overloads). This will execute a specific action on each task in turn, as it completes… it’s like repeatedly calling Task.WhenAny, but we only actually need to wait for one task at a time, because we know that the first task in our "completion ordered" collection will be the first one to complete (duh).

Show me the code!

Enough description – let’s look at how we’ll demonstrate both methods, and then how we implement them.

private static async Task PrintDelayedRandomTasksAsync()
{
    Random rng = new Random();
    var values = Enumerable.Range(0, 10).Select(_ => rng.Next(3000)).ToList();
    Console.WriteLine("Initial order: {0}", string.Join(" ", values));

    var tasks = values.Select(DelayAsync);

    var ordered = OrderByCompletion(tasks);

    Console.WriteLine("In order of completion:");
    await ForEach(ordered, Console.WriteLine);
}

/// <summary>
/// Returns a task which delays (asynchronously) by the given number of milliseconds,
/// then return that same number back.
/// </summary>
private static async Task<int> DelayAsync(int delayMillis)
{
    await TaskEx.Delay(delayMillis);
    return delayMillis;
}

The idea is that we’re going to create 10 tasks which each just wait for some random period of time, and return the same time period back. We’ll create them in any old order – but obviously they should complete in (at least roughly) the same order as the returned numbers.

Once we’ve created the collection of tasks, we’ll call OrderByCompletion to create a second collection of tasks, returning the same results but this time in completion order – so ordered.ElementAt(0) will be the first task to complete, for example.

Finally, we call ForEach and pass in the ordered task collection, along with Console.WriteLine as the action to take with each value. We await the resulting Task to mimic blocking until the foreach loop has finished. Note that we could make this a non-async method and just return the task returned by ForEach, given that that’s our only await expression and it’s right at the end of the method. This would be marginally faster, too – there’s no need to build an extra state machine. See Stephen Toub’s article about async performance for more information.

ForEach

I’d like to get ForEach out of the way first, as it’s so simple: it’s literally just iterating over the tasks, awaiting them and propagating the result to the action. We get the "return a task which will wait until we’ve finished" for free by virtue of making it an async method.

/// <summary>
/// Executes the given action on each of the tasks in turn, in the order of
/// the sequence. The action is passed the result of each task.
/// </summary>
private static async Task ForEach<T>(IEnumerable<Task<T>> tasks, Action<T> action)
{
    foreach (var task in tasks)
    {
        T value = await task;
        action(value);
    }
}

Simple, right? Let’s get onto the meat…

OrderByCompletion

This is the tricky bit, and I’ve actually split it into two methods to make it slightly easier to comprehend. The PropagateResult method feels like it could be useful in other composition methods, too.

The basic plan is:

  • Copy the input tasks to a list: we need to work out how many there are and iterate over them, so let’s make sure we only iterate once
  • Create a collection of TaskCompletionSource<T> references, one for each input task. Note that we’re not associating any particular input task with any particular completion source – we just need the same number of them
  • Declare an integer to keep track of "the next available completion source"
  • Attach a continuation to each input task which will be increment the counter we’ve just declared, and propagate the just-completed task’s status
  • Return a view onto the collection of TaskCompletionSource<T> values, projecting each one to its Task property

Once you’re happy with the idea, the implementation isn’t too surprising (although it is quite long):

/// <summary>
/// Returns a sequence of tasks which will be observed to complete with the same set
/// of results as the given input tasks, but in the order in which the original tasks complete.
/// </summary>
private static IEnumerable<Task<T>> OrderByCompletion<T>(IEnumerable<Task<T>> inputTasks)
{
    // Copy the input so we know it’ll be stable, and we don’t evaluate it twice
    var inputTaskList = inputTasks.ToList();

    // Could use Enumerable.Range here, if we wanted…
    var completionSourceList = new List<TaskCompletionSource<T>>(inputTaskList.Count);
    for (int i = 0; i < inputTaskList.Count; i++)
    {
        completionSourceList.Add(new TaskCompletionSource<T>());
    }

    // At any one time, this is "the index of the box we’ve just filled".
    // It would be nice to make it nextIndex and start with 0, but Interlocked.Increment
    // returns the incremented value…
    int prevIndex = -1;

    // We don’t have to create this outside the loop, but it makes it clearer
    // that the continuation is the same for all tasks.
    Action<Task<T>> continuation = completedTask =>
    {
        int index = Interlocked.Increment(ref prevIndex);
        var source = completionSourceList[index];
        PropagateResult(completedTask, source);
    };

    foreach (var inputTask in inputTaskList)
    {
        // TODO: Work out whether TaskScheduler.Default is really the right one to use.
        inputTask.ContinueWith(continuation,
                               CancellationToken.None,
                               TaskContinuationOptions.ExecuteSynchronously,
                               TaskScheduler.Default);
    }

    return completionSourceList.Select(source => source.Task);
}

/// <summary>
/// Propagates the status of the given task (which must be completed) to a task completion source
/// (which should not be).
/// </summary>
private static void PropagateResult<T>(Task<T> completedTask,
    TaskCompletionSource<T> completionSource)
{
    switch (completedTask.Status)
    {
        case TaskStatus.Canceled:
            completionSource.TrySetCanceled();
            break;
        case TaskStatus.Faulted:
            completionSource.TrySetException(completedTask.Exception.InnerExceptions);
            break;
        case TaskStatus.RanToCompletion:
            completionSource.TrySetResult(completedTask.Result);
            break;
        default:
            // TODO: Work out whether this is really appropriate. Could set
            // an exception in the completion source, of course…
            throw new ArgumentException("Task was not completed");
    }
}

You’ll notice there are a couple of TODO comments there. The exception in PropagateResult really shouldn’t happen – the continuation shouldn’t be called when the task hasn’t completed. I still need to think carefully about how tasks should propagate exceptions though.

The arguments to ContinueWith are more tricky: working through my TimeMachine class and some unit tests with Bill Wagner last week showed just how little I know about how SynchronizationContext, the task awaiters, task schedulers, and TaskContinuationOptions.ExecuteSynchronously all interact. I would definitely need to look into that more deeply before TimeMachine was really ready for heavy use… which means you should probably be looking at the TPL in more depth too.

Conclusion

Sure enough, when you run the code, the results appear in order, as the tasks complete. Here’s one sample of the output:

Initial order: 335 468 1842 1991 2512 2603 270 2854 1972 1327
In order of completion:
270
335
468
1327
1842
1972
1991
2512
2603
2854

TODOs aside, the code in this post is remarkable (which I can say with modesty, as I’ve only refactored it from the code sent to me by another reader and Stephen Toub). It makes me smile every time I think about the seemingly-impossible job it accomplishes. I suspect this approach could be useful in any number of composition blocks – it’s definitely one to remember.

Coding in the style of Glee

As previously mentioned, at CodeMash 2012 I gave a very silly Pecha Kucha talk entitled "Coding in the style of Glee". The video is on YouTube, or can be seen embedded below:

(There’s also another YouTube video from a different angle.)

This post gives the 20 slides (which were just text; no fancy pictures unlike my competitors) and what I meant to say about them. (Edited very slightly to remove a couple of CodeMash-specific in-jokes.) Don’t forget that each slide was only up for 20 seconds.

Coding in the style of Glee

As you may know, I’m from the UK, and it’s wonderful to be here. This is my first US conference, so it’s great to be in the country which has shared with the world its most marvellous cultural output: the Fox show, Glee.

At first I watched it just for surface story – but now I know better – I know that really, the songs are all about the culture and practice of coding.

(It isn’t easy) Bein’ Green

When I started coding, it was on a ZX Spectrum, in Basic. It was hard, but the computer came with a great manual. I later learned C from a ringbinder of some course or other – and entirely failed to understand half the language. Of course, this was before Stack Overflow, when it was really hard being a newbie – where could you turn for information?

Getting to know you

Over time I became semi-competent in C, with the help of friends. But learning is a constant process, of course – getting to know new languages and platforms is just part of a good dev’s life every day.

Learning itself is a skill – how similar it is to getting to know small children, I leave to your imagination.

Man in the Mirror

Glee doesn’t just talk about the coding experience, of course – it talks about specific technologies. This Michael Jackson song is talking about reflection, of course. Although the idea wasn’t new in Java, it was new to me – and now it would be almost unthinkable to come up with a new platform which didn’t let you find out about what was in the code.

Bridge over Troubled Water

Another technology covered beautifully by Glee is the interop. We’re in a big world, and we always need to talk to other systems. Whether it’s over JNI (heaven help you), P/Invoke, SOAP, REST – whatever, I hope next time you connect to another system, you’ll hear this haunting Simon and Garfunkel melody in the background.

I will survive

And who could forget persistence frameworks. I’m not sure whether Gloria Gaynor had Hibernate and the Entity Framework in mind when she sang this, but I’m utterly convinced that the Glee writers did. When you submit your data, it’s just got to survive – what else would you want?

You can’t always get what you want

We’d all like perfect specifications, reliable libraries, ideal languages, etc – but that’s just not going to happen. It’s possible that of course you won’t get what you need – even if you try real hard. But hey, you might just.

Lean on me (or Agile on me)

(I didn’t actually have notes written for this one. Copied from the video.)

Glee sympathizes with you – but it also have a bit of an answer: lean on me. Lean and agile development, so we can adapt to constantly changing specifications, and eventually we will have something that is useful. Maybe nothing like what we initially envisaged, but it will be something useful.

Losing my Religion

Of course, we don’t always stay in love with a platform. I’d like to dedicate this slight to Enterprise Java. Fortunately I never had to deal with Enterprise Java Beans, but I “enjoyed” enough other J2EE APIs to make me yearn for a world without BeanProcessorFactoryFactories.

Anything Goes

Now I’m pretty conservative – only in terms of coding, mind you. I’m a statically typed language guy. But Glee celebrates dynamic languages too – languages where really, anything goes until you try to execute it. Even though I haven’t gone down the dynamic route myself much, it’s important that we all welcome the diversity of languages and platforms we have.

Get Happy

Along with the rise of dynamic languages, we seem to have seen a rise of happy developers. We’ve always had enthusiastic developers, but there’s a certain air about your typical Ruby on Rails developer which feels new to me. Again, I’m not a Ruby fan myself – but it’s always nice to see other happy people, and maybe one day I’ll see the light.

Bust your Windows

I don’t know what I can say about this song. Do the Glee writers have it in for Microsoft? I don’t remember “Bust your OSX” or “Bust your Linux” for example. Only Windows is targeted here.

The Safety Dance

One big change for me since joining Google is increased awareness of the need for redundancy – the intricate dance we need to perform to create a service which will stay up no matter what. Where redundancy is a dirty word in most of industry, as developers we celebrate it – and will do anything we can to avoid…

The Only Exception

… a single point of failure.

(Yes, that really is all I’d prepared for that slide. Hence the need for improvisation.)

Telephone

(From video.) Glee celebrates the rise of phone apps. Who these days could be unaware of the importance of the development of mobile applications? And obviously, we can credit the iPhone for that, but since the iPhone, and just smart phone apps, we’ve also started…

U Can’t Touch This

(From video.) Tablets! And touch screen devices of all kinds. So Windows 8 – very touch-based, and sooner or later we’re all going to have to get with it. I don’t do UIs, I’ve never done a touch UI in my life, I have no idea how it works. But clearly it’s one of the ways forward.

Forget You

As smart phones and tablets become more ubiquitous and more bound to us as people, privacy has become more important. Glee gave us a timely reminder of the reverse of the persistence early on: we need to be able to forget about users, as well as remember them.

(A)waiting for a girl like you

(From video.) I’d like to leave on an up-note, so: I’m clearly very, very excited (really, really excited) about C# 5 and its await keyword so I ask you – I beg you – be excited about development. And always bear in mind your users.

My life would suck without you

Users rule our world. Can’t live without them, can’t shoot ‘em.

Celebrate – we do stuff to make users really happy! This is awesome! We should be thrilled!

(Even for enterprise apps, we’re doing useful stuff. Honest.)

Don’t stop believing

(From video.) So to sum up: have fun, keep learning, really, really enjoy what you’re doing, and… don’t stop!

CodeMash 2012 report

I’m nearly home – on a bus back from Heathrow airport to Reading – returning from CodeMash 2012. This was my first US conference, and I had a wonderful time. It was pretty densely packed in terms of presenting / recording for me:

  • I presented two back-to-back sessions jointly with Bill Wagner, on async. These went down really well (particularly Bill’s genius idea of using the Doctor Who quote about time being a "big ball of wibbly-wobbly, timey-wimey stuff") and were great fun to give. Bill’s a class act, and I think we got the balance between use and underpinnings about right.
  • I recorded a podcast with Scott Hanselman (we were going to record two, but the first one ended up being longer than expected)
  • I presented a talk on "C#’s Greatest Mistakes" which ended up being somewhere between a discussion on language design, and a demonstration of surprising "features" of C#. It overran by 15 minutes without me coming close to running out of things to say, but hopefully it was useful. It was a somewhat rambly session, but at least I warned folks of that up-front. It would be nice to be able to present the same sort of material in a really "tight" way, but I’m just not sure how to.
  • I gave a 20×20 "Pecha Kucha" talk called "Coding in the style of Glee" as the silliest topic I could come up with on short notice. This was absolutely terrifying and extremely silly. I only came third in the contest (and the winner, Leon, was simply phenomenal) but I was happy that I’d only embarrassed myself about as much as I’d expected to. The YouTube video of this is already up, and I’ll write a blog post with the slide titles and what I was trying to say in them :)

Unfortunately due to last minute async prep and desperately trying to cobble together slides for the Glee talk, I didn’t have time to actually attend as many talks as I’d have liked. Even though I was present for the whole of the Scala Koans session in the PreCompilr on Wednesday, I found myself next to Bruce Eckel, and ended up chatting with him for most of the time. It would have been a bit of a waste not to, really. (And at least some of that talking was Scala-related…) I also watched the whole of the SignalR presentation by Brady Gaster – where I was apparently the only person in the room with an aversion to "dynamic" in C# 4. That made me the butt of a few jokes, but not too many for comfort.

In terms of C#-related talks, I went to the first half of Dustin Campbell’s Roslyn session, but was somewhat distracted by putting together Glee slides and had to leave half way through to hand them in. My final session of the conference was Bill Wagner’s "Stunt coding in C# – I dare you to try this at home" which was excellent, and a very fitting end to the conference for me.

Highlights of the conference for me:

  • Messing with Bill Wagner’s code at the end of not just our joint async talk but also his Stunt Coding talk. I’ve never before asked a presented whether they mind me just stealing the keyboard, but I was confident that Bill (and the attendees) would get a kick out of it – and the code was nicer afterwards :)
  • Meeting so many people… some that I’ve met before (I hadn’t seen Ted Neward since I gate-crashed a party at his house after the MVP 2005 Summit), some I’d met virtually but not physically before (like Bill) and there loads of other folks I’d never known at all before – including Cori Drew. Cori simply seemed to pop up everywhere – I swear she had about 20 clones at CodeMash. (She also recorded the video of the Glee talk, and it’s her laughter you can hear – thanks very much, Cori!) Everyone at the conference was incredibly friendly, and I was really touched by how many people said on the last day that they’d appreciated me making the long trip.
  • Confounding Dustin Campbell and Kevin Pilch-Bisson with my evil generic overloading puzzle. Just to be clear, these are two seriously smart guys and this was a friendly over-lunch challenge. It’s always a privilege to meet more of the team responsible for C# and Visual Studio.
  • The number of families who came – this is something I’ve never seen at other conferences, and it really made a difference in terms of the atmosphere of the non-dev bits. It was fabulous to see the kids in the water park, for example. Even out of just attendees, there was a greater proportion of women at CodeMash than at other conferences I’ve been to – obviously still vastly outnumbered by the men, but it was nice to see some improvement on that front.

This will probably be my only international conference for 2012, so it’s a good job that it was so wonderfully organized. I really hope I have the chance to attend next year too. Many thanks to everyone who helped make it such a special conference.

Eduasync part 18: Changes between the Async CTP and the Visual Studio 11 Preview

In preparation for CodeMash, I’ve been writing some more async code and decompiling it with Reflector. This time I’m using the Visual Studio 11 Developer Preview – the version which installs alongside Visual Studio 2010 under Windows 7. (Don’t ask me about any other features of Visual Studio 11 – I haven’t explored it thoroughly; I’ve really only used it for the C# 5 bits.)

There have been quite a few changes since the CTP – they’re not visible changes in terms of code that you’d normally write, but the state machine generated by the C# compiler is reasonably different. In this post I’ll describe the differences, as best I understand them. There are still a couple of things I don’t understand (which I’ll highlight within the post) but overall, I think I’ve got a pretty good handle on why the changes have been made.

I’m going to assume you already have a reasonable grasp of the basic idea of async and how it works – the way that the compiler generates a state machine to represent an async method or anonymous function, with originally-local variables being promoted to instance variables within the state machine, etc. If the last sentence was a complete mystery to you, see Eduasync part 7 for more information. I don’t expect you to remember the exact details of what was in the previous CTP though :)

Removal of iterator block leftovers

In the CTP, the code for async methods was based on the iterator block implementation. I suspect that’s still the case, but possibly sharing just a little less code. There used to be a few methods and fields which weren’t used in async methods, but now they’re gone:

  • There’s no now constructor, so no need for the "skeleton" method which replaces the real async method to pass in 0 as the initial state.
  • There’s no Dispose method.
  • There’s no disposing field.

It’s nice to see these gone, but it’s not terribly interesting. Now on to the bigger changes…

Large structural changes

There’s a set of related structural changes which don’t make sense individually. I’ll describe them first, then look at how it all hangs together, and my guess as to the reasoning behind.

The state machine is now a struct

The declaration of the nested type for the state machine is now something like this:

[CompilerGenerated]
private struct StateMachine : IStateMachine
{
    // Fields common to all async state machines
    // (with caveats)
    private int state;
    private object awaiter;
    public AsyncTaskMethodBuilder<int> builder;
    public Action moveNextDelegate;
    private object stack;

    // Hoisted local variables

    // Methods
    [DebuggerHidden]
    public void SetMoveNextDelegate(Action action) { … }
    public void MoveNext() { … }
}

The caveats around the common field are in terms of the return type of the async method (which determines the type of builder used) and whether or not there are any awaits (if there are no awaits, the stack and awaiter fields aren’t generated).

Note that throughout this blog post I’ve changed the names of fields and types – in reality they’re all "unspeakable" names including angle-brackets, just like all compiler-generated names.

There’s a new assembly-wide interface

As you can see from the code above, the state machine implements an interface (actually called <>t__IStateMachine). One of these is created in the global namespace in each assembly that contains at least one async method or anonymous function, and it looks like this:

[CompilerGenerated]
internal interface IStateMachine
{
    void SetMoveNextDelegate(Action action);
}

The implementation for this method is always the same, and it’s trivial:

[DebuggerHidden]
public void SetMoveNextDelegate(Action action)
{
    this.moveNextDelegate = action;
}

Simplified skeleton method

The method which starts the state machine, which I’ve been calling the "skeleton" method everywhere, is now a bit simpler than it was. Something like this:

public static Task<int> FooAsync()
{
    StateMachine machine = new StateMachine();
    machine.builder = AsyncVoidMethodBuilder.Create();
    machine.MoveNext();
    return machine.builder.Task;
}

In fact if you decompile the IL, you’ll see that it doesn’t explicitly initialize the variable to start with – it just declares it, sets the builder field and then calls MoveNext(). That’s not valid C# (as all the struct’s fields aren’t initialized), but it is valid IL. It’s equivalent to the code above though. Note how there’s nothing to set the continuation – where previously the moveNextDelegate field would be populated within the skeleton method.

Just-in-time delegate creation

Now that the skeleton method doesn’t create the delegate representing the continuation, it can be done when it’s first required – which is when we first encounter an await expression for an awaitable which hasn’t already completed. (If the awaitable has completed before we await it, the generated code skips the continuation and just uses the results immediately and synchronously).

The code for that delegate creation is slightly trickier than you might expect, however. It looks something like this:

Action action = this.moveNextDelegate;
if (action == null)
{
    Task<int> task = this.builder.Task;
    action = new Action(this.MoveNext);
    ((IStateMachine) action.Target).SetMoveNextDelegate(action);
}

There are two oddities here, one of which I mostly understand and one of which I don’t understand at all.

I really don’t understand the "task" variable here. Why do we need to exercise the AsyncTaskMethodBuilder.Task property? We don’t use the result anywhere… does forcing this flush some memory buffer? I have no clue on this one. (See the update at the bottom of the post…)

The part about setting the delegate via the interface makes more sense, but it’s subtle. You might expect code like this:

// Looks sensible, but is actually slightly broken
Action action = this.moveNextDelegate;
if (action == null)
{
    action = new Action(this.MoveNext);
    this.moveNextDelegate = action;
}

That would sort of work – but we’d end up needing to recreate the delegate each time we encountered an appropriate await expression. Although the above code saves the value to the field, it saves it within the current value of the state machine… after we’ve boxed that value as the target of the delegate. The value we want to mutate is the one within the box – which is precisely why there’s an interface, and why the code casts to it.

We can’t even just unbox and then set the field afterwards – at least in C# – because the unbox operation is always followed by a copy operation in normal C#. I believe it would be possible for the C# compiler to generate IL which unboxed action.Target without the copy, and then set the field in that. It’s not clear to me why the team went with the interface approach instead… I would expect that to be slower (as it requires dynamic dispatch) but I could easily be wrong. Of course, it would also make it impossible to decompile the IL to C#, which would make my talks harder, but don’t expect the C# team to bend the compiler implementation for my benefit ;)

(As an aside to all of this, I’ve gone back and forth on whether the "slightly broken" implementation would recreate the delegate on every appropriate await, or only two. I think it would end up being on every occurrence, as even though on the second occurrence we’d be operating within the context of the first boxed instance, the new delegate would have a reference to a new boxed copy each time. It does my head in a little bit, trying to think about this… more evidence that mutable structs are evil and hard to reason about. It’s not the wrong decision in this case, hidden far from the gaze of normal developers, but it’s a pain to reason about.)

Single awaiter variable

In the CTP, each await expression generated a separate field within the state machine, and that field was always of the exact awaiter type. In the VS11 Developer Preview, there’s always exactly one awaiter field (assuming there’s at least one await expression) and it’s always of type object. It’s used like this:

  // Single local variable used by both continuation and first-time paths
  TaskAwaiter<int> localAwaiter;

  …

  if (conditions-for-first-time-execution)
  {
      // Code before await

      localAwaiter = task.GetAwaiter();
      if (localAwaiter.IsCompleted)
      {
          goto Await1Completed;
      }
      this.state = 1;
      TaskAwaiter<int>[] awaiterArray = { localAwaiter };
      this.awaiter = awaiterArray;
      // Lazy delegate creation goes here
      awaiterArray[0].OnCompleted(action);
      return;
  }
  // Continuation would get into here
  localAwaiter = ((TaskAwaiter<int>[]) this.awaiter)[0];
  this.awaiter = null;
  this.state = 0;
Await1Completed:
  int result = localAwaiter.GetResult(); 
  localAwaiter = default(TaskAwaiter<int>);

I realize there’s a lot of code here, but it does make some sense:

  • The value of the awaiter field is always either null, or a reference to a single-element array of the awaiter type for one of the await expressions.
  • A single localAwaiter variable is shared between the two code paths, populated either from the awaitable (on the initial code path) or by copying the value from the array (in the second code path).
  • The field is always set to null and the local variable is set to its default value after use, presumably for the sake of garbage collection

It’s basically a nice way of using the fact that we’ll only ever need one awaiter at a time. It’s not clear to me why an array is used instead of either using a reference to the awaiter for class-based awaiters, or simply by boxing for struct-based awaiters. The latter would need the same "unbox without copy" approach discussed in the previous section – so if there’s some reason why that’s actually infeasible, it would explain the use of an array here. We can’t use the interface trick in this case, as the compiler isn’t in control of the awaiter type (so can’t make it implement an interface).

Expression stack preservation

This one is actually a fix to a bug in the async CTP, which I’ve written about before. We’re used to the stack containing our local variables (in the absence of iterator blocks, captured variables etc, and modulo the stack being an implementation detail) but it’s also used for intermediate results within a single statement. For example, consider this block of code:

int x = 10;
int y = 5;
int z = x + 50 * y;

That last line is effectively:

  • Load the value of x onto the stack
  • Load the value 50 onto the stack
  • Load the value of y onto the stack
  • Multiply the top two stack values (50 and y) leaving the result on the stack
  • Add the top two stack values (x and the previously-computed result) leaving the result on the stack
  • Store the top stack value into z

Now suppose we want to turn y into a Task<int>:

int x = 10;
Task<int> y = Task.FromResult(5);
int z = x + 50 * await y;

Our state machine needs to make sure that it will preserve the same behaviour as the synchronous version, so it needs the same sort of stack. In the new-style state machine, all of that stack is saved in the "stack" field. It’s only one field, but may need to represent multiple different types within the code at various different await expressions – in the code above, for example, it represents two int values. As far as I can discover, the C# compiler generates code which uses the actual type of the value it needs, if it only requires a single value. If it needs multiple values, it uses an appropriate Tuple type, nesting tuples if it goes beyond the number of type parameters supported by the Tuple<…> family of types. So in our case above, we end up with code a bit like this:

// Local variable used in both paths
Tuple<int, int> tuple;

// Code before the await
if (conditions-for-first-time-execution) 

    …
    tuple = new Tuple<int, int>(this.x, 50);
    this.stack = tuple;
    …
}

// Continuation would get into here
tuple = (Tuple<int, int>) this.stack;
// IL copies the values from the tuple onto the stack at this point
this.stack = null;

// Both the fast and slow code paths get here eventually
this.z = stack0 + stack1 * awaiter.GetResult()

I say it’s a bit like this, because it’s hard to represent the IL exactly in C# in this case. The tuple is only created if it’s needed, i.e. not in the already-completed fast path. In that case, the values are loaded onto the stack but not then put into the tuple – execution skips straight to the code which uses the values already on the stack.

When the awaitable isn’t complete immediately, then a Tuple<int, int> is created, stored in the "stack" field, and the continuation is handed to the awaiter. On continuation, the tuple is loaded back from the "stack" field (and cast accordingly), the values are loaded onto the stack – and then we’re back into the common code path of fetching the value and performing the add and multiply operations.

Conclusion

As far as I’m aware, those are the most noticeable changes in the generated code. There may well still be a load more changes in Task<T> and the TPL in general – I wouldn’t be at all surprised – but that’s harder to investigate.

I’m sure all of this has been done in the name of performance (and correctness, in the case of stack preservation). The state machine is now much smaller in terms of the number of fields it requires, and objects are created locally as far as possible (including the state machine itself only requiring heap allocation if there’s ever a "slow" awaitable). I suspect there’s still some room for optimization, however:

  • Both the awaiter and the delegate use careful boxing and either arrays or a mutating interface to allow the boxed value to be changed. I suspect that using unbox with the concrete type, but without copying the value, would be more efficient. I may attempt to work this theory up into a test at some point.
  • If there’s only one awaiter type (usually TaskAwaiter<T> for some T), that type could be used instead of object, potentially reducing heap optimization
  • I’ve no idea why the builder.Task property is explicitly fetched and then the results discarded
  • If there’s only one await expression, the "stack" field can be strongly typed, which would also avoid boxing if only a single value needs to be within that stack
  • The stack field could be removed entirely when it’s not needed for intermediate stack value preservation. (I believe that would be the case reasonably often.)

The use of mutable value types is really fascinating (for me, at least) – I’m sure most people on the C# team would still say they’re evil, but when they’re used in a carefully controlled environment where real developers don’t have to reason about their behaviour, they can be useful.

Next time, I’ll hopefully get back to the idea I promised to write up before, about ordering a collection of tasks in completion order… before they’ve completed. (Well, sort of.) Should be fun…

Update (January 16th 2012)

Stephen Toub got in touch with me after I posted the original version of this blog entry, to explain the use of the Task property. Apparently the idea is that at this point, we know we’re going to return out of the state machine, so the skeleton method is going to access the Task property anyway. However, as we haven’t scheduled the continuation yet we also know that nothing will be accessing the Task property on a different thread. If we access it now for the first time, we can lazily allocate the task in the same thread that created the AsyncMethodBuilder, with no risk of contention. If we can force there to be a task ready and waiting for whatever accesses it later, we don’t need any synchronization in that area.

So why might we want to allocate the task lazily in the first place? Well, don’t forget that we might never have to wait for an await (as it were). We might just have an async method which takes the fast path everywhere. If that’s the case, then for certain cases (e.g. a non-generic, successfully completed task, or a Task<bool> which again has completed successfully) we can reuse the same instance repeatedly. Apparently this laziness isn’t yet part of the VS11 Developer Preview, but the reason for the property access is in preparation for this.

Another case of micro-optimization – which is fair enough when it’s at a system level :)

Awaiting CodeMash 2012

Happy New Year, everyone!

I’m attempting to make 2012 a quiet year in terms of my speaking engagements – I’ve turned down a few kind offers already, and I expect to do so again during the year. I may well still give user group talks in evenings if I can do so without having to take holiday, but full conferences are likely to be out, especially international ones. This is partly so I can take more time off to support my wife, Holly, who has her own books to promote. This year will be particularly important for Holly as she’s one of the World Book Day 2012 authors – I’m tremendously proud of her, as you can no doubt imagine.

However, there’s one international conference I decided to submit proposals for: CodeMash. I’ve never been to this or any other US conference, but I’ve heard fabulous things about it. I’m particularly excited that I’ll be able to present alongside Bill Wagner, a fellow C# author (probably most famous for Effective C# which I’ve reviewed before now). Bill and I have never met, although we’ve participated jointly on a .NET Rocks show before now. I could barely hear Bill when we recorded that though, so it hardly counts :)

The conference schedule for CodeMash shows Bill and I each giving two talks: two individual ones on general C# (C# Stunt Coding by Bill, and C#’s Greatest Mistakes by me) and two sessions on the async support in C# 5… async "from the inside" and "from the outside". Although these have hitherto been shown as separate sessions, everyone involved thought it would make more sense to weave the two together… so this will be a double-length session. Bill will be presenting the "outside" view – how to use async, basically; I’ll be presenting the "inside" view – how it all hangs together behind the scenes.

With any luck, this will be much more helpful to the conference attendees, as they should be able to build up confidence in the solid foundations underpinning it all at the same time as seeing how fabulously useful it’ll be for developers. It also means that Bill and I can bounce ideas off each other spontaneously as we go – I intend to pay close attention and learn a thing or two myself!

It’s pretty much impossible to predict how it’ll all hang together, but I’m really excited about the whole shebang. I’ll be fascinated to see if and how US conferences differ from the various ones this side of the pond… but it does make the whole thing that bit more nerve-wracking. If you’re coming to CodeMash, please grab me and say hi – it never hurts to see a friendly face…

(Note: Bill has a similar blog post posted just before this one.)