Category Archives: C#

Eduasync part 18: Changes between the Async CTP and the Visual Studio 11 Preview

In preparation for CodeMash, I’ve been writing some more async code and decompiling it with Reflector. This time I’m using the Visual Studio 11 Developer Preview – the version which installs alongside Visual Studio 2010 under Windows 7. (Don’t ask me about any other features of Visual Studio 11 – I haven’t explored it thoroughly; I’ve really only used it for the C# 5 bits.)

There have been quite a few changes since the CTP – they’re not visible changes in terms of code that you’d normally write, but the state machine generated by the C# compiler is reasonably different. In this post I’ll describe the differences, as best I understand them. There are still a couple of things I don’t understand (which I’ll highlight within the post) but overall, I think I’ve got a pretty good handle on why the changes have been made.

I’m going to assume you already have a reasonable grasp of the basic idea of async and how it works – the way that the compiler generates a state machine to represent an async method or anonymous function, with originally-local variables being promoted to instance variables within the state machine, etc. If the last sentence was a complete mystery to you, see Eduasync part 7 for more information. I don’t expect you to remember the exact details of what was in the previous CTP though :)

Removal of iterator block leftovers

In the CTP, the code for async methods was based on the iterator block implementation. I suspect that’s still the case, but possibly sharing just a little less code. There used to be a few methods and fields which weren’t used in async methods, but now they’re gone:

  • There’s no now constructor, so no need for the "skeleton" method which replaces the real async method to pass in 0 as the initial state.
  • There’s no Dispose method.
  • There’s no disposing field.

It’s nice to see these gone, but it’s not terribly interesting. Now on to the bigger changes…

Large structural changes

There’s a set of related structural changes which don’t make sense individually. I’ll describe them first, then look at how it all hangs together, and my guess as to the reasoning behind.

The state machine is now a struct

The declaration of the nested type for the state machine is now something like this:

[CompilerGenerated]
private struct StateMachine : IStateMachine
{
    // Fields common to all async state machines
    // (with caveats)
    private int state;
    private object awaiter;
    public AsyncTaskMethodBuilder<int> builder;
    public Action moveNextDelegate;
    private object stack;

    // Hoisted local variables

    // Methods
    [DebuggerHidden]
    public void SetMoveNextDelegate(Action action) { … }
    public void MoveNext() { … }
}

The caveats around the common field are in terms of the return type of the async method (which determines the type of builder used) and whether or not there are any awaits (if there are no awaits, the stack and awaiter fields aren’t generated).

Note that throughout this blog post I’ve changed the names of fields and types – in reality they’re all "unspeakable" names including angle-brackets, just like all compiler-generated names.

There’s a new assembly-wide interface

As you can see from the code above, the state machine implements an interface (actually called <>t__IStateMachine). One of these is created in the global namespace in each assembly that contains at least one async method or anonymous function, and it looks like this:

[CompilerGenerated]
internal interface IStateMachine
{
    void SetMoveNextDelegate(Action action);
}

The implementation for this method is always the same, and it’s trivial:

[DebuggerHidden]
public void SetMoveNextDelegate(Action action)
{
    this.moveNextDelegate = action;
}

Simplified skeleton method

The method which starts the state machine, which I’ve been calling the "skeleton" method everywhere, is now a bit simpler than it was. Something like this:

public static Task<int> FooAsync()
{
    StateMachine machine = new StateMachine();
    machine.builder = AsyncVoidMethodBuilder.Create();
    machine.MoveNext();
    return machine.builder.Task;
}

In fact if you decompile the IL, you’ll see that it doesn’t explicitly initialize the variable to start with – it just declares it, sets the builder field and then calls MoveNext(). That’s not valid C# (as all the struct’s fields aren’t initialized), but it is valid IL. It’s equivalent to the code above though. Note how there’s nothing to set the continuation – where previously the moveNextDelegate field would be populated within the skeleton method.

Just-in-time delegate creation

Now that the skeleton method doesn’t create the delegate representing the continuation, it can be done when it’s first required – which is when we first encounter an await expression for an awaitable which hasn’t already completed. (If the awaitable has completed before we await it, the generated code skips the continuation and just uses the results immediately and synchronously).

The code for that delegate creation is slightly trickier than you might expect, however. It looks something like this:

Action action = this.moveNextDelegate;
if (action == null)
{
    Task<int> task = this.builder.Task;
    action = new Action(this.MoveNext);
    ((IStateMachine) action.Target).SetMoveNextDelegate(action);
}

There are two oddities here, one of which I mostly understand and one of which I don’t understand at all.

I really don’t understand the "task" variable here. Why do we need to exercise the AsyncTaskMethodBuilder.Task property? We don’t use the result anywhere… does forcing this flush some memory buffer? I have no clue on this one. (See the update at the bottom of the post…)

The part about setting the delegate via the interface makes more sense, but it’s subtle. You might expect code like this:

// Looks sensible, but is actually slightly broken
Action action = this.moveNextDelegate;
if (action == null)
{
    action = new Action(this.MoveNext);
    this.moveNextDelegate = action;
}

That would sort of work – but we’d end up needing to recreate the delegate each time we encountered an appropriate await expression. Although the above code saves the value to the field, it saves it within the current value of the state machine… after we’ve boxed that value as the target of the delegate. The value we want to mutate is the one within the box – which is precisely why there’s an interface, and why the code casts to it.

We can’t even just unbox and then set the field afterwards – at least in C# – because the unbox operation is always followed by a copy operation in normal C#. I believe it would be possible for the C# compiler to generate IL which unboxed action.Target without the copy, and then set the field in that. It’s not clear to me why the team went with the interface approach instead… I would expect that to be slower (as it requires dynamic dispatch) but I could easily be wrong. Of course, it would also make it impossible to decompile the IL to C#, which would make my talks harder, but don’t expect the C# team to bend the compiler implementation for my benefit ;)

(As an aside to all of this, I’ve gone back and forth on whether the "slightly broken" implementation would recreate the delegate on every appropriate await, or only two. I think it would end up being on every occurrence, as even though on the second occurrence we’d be operating within the context of the first boxed instance, the new delegate would have a reference to a new boxed copy each time. It does my head in a little bit, trying to think about this… more evidence that mutable structs are evil and hard to reason about. It’s not the wrong decision in this case, hidden far from the gaze of normal developers, but it’s a pain to reason about.)

Single awaiter variable

In the CTP, each await expression generated a separate field within the state machine, and that field was always of the exact awaiter type. In the VS11 Developer Preview, there’s always exactly one awaiter field (assuming there’s at least one await expression) and it’s always of type object. It’s used like this:

  // Single local variable used by both continuation and first-time paths
  TaskAwaiter<int> localAwaiter;

  …

  if (conditions-for-first-time-execution)
  {
      // Code before await

      localAwaiter = task.GetAwaiter();
      if (localAwaiter.IsCompleted)
      {
          goto Await1Completed;
      }
      this.state = 1;
      TaskAwaiter<int>[] awaiterArray = { localAwaiter };
      this.awaiter = awaiterArray;
      // Lazy delegate creation goes here
      awaiterArray[0].OnCompleted(action);
      return;
  }
  // Continuation would get into here
  localAwaiter = ((TaskAwaiter<int>[]) this.awaiter)[0];
  this.awaiter = null;
  this.state = 0;
Await1Completed:
  int result = localAwaiter.GetResult(); 
  localAwaiter = default(TaskAwaiter<int>);

I realize there’s a lot of code here, but it does make some sense:

  • The value of the awaiter field is always either null, or a reference to a single-element array of the awaiter type for one of the await expressions.
  • A single localAwaiter variable is shared between the two code paths, populated either from the awaitable (on the initial code path) or by copying the value from the array (in the second code path).
  • The field is always set to null and the local variable is set to its default value after use, presumably for the sake of garbage collection

It’s basically a nice way of using the fact that we’ll only ever need one awaiter at a time. It’s not clear to me why an array is used instead of either using a reference to the awaiter for class-based awaiters, or simply by boxing for struct-based awaiters. The latter would need the same "unbox without copy" approach discussed in the previous section – so if there’s some reason why that’s actually infeasible, it would explain the use of an array here. We can’t use the interface trick in this case, as the compiler isn’t in control of the awaiter type (so can’t make it implement an interface).

Expression stack preservation

This one is actually a fix to a bug in the async CTP, which I’ve written about before. We’re used to the stack containing our local variables (in the absence of iterator blocks, captured variables etc, and modulo the stack being an implementation detail) but it’s also used for intermediate results within a single statement. For example, consider this block of code:

int x = 10;
int y = 5;
int z = x + 50 * y;

That last line is effectively:

  • Load the value of x onto the stack
  • Load the value 50 onto the stack
  • Load the value of y onto the stack
  • Multiply the top two stack values (50 and y) leaving the result on the stack
  • Add the top two stack values (x and the previously-computed result) leaving the result on the stack
  • Store the top stack value into z

Now suppose we want to turn y into a Task<int>:

int x = 10;
Task<int> y = Task.FromResult(5);
int z = x + 50 * await y;

Our state machine needs to make sure that it will preserve the same behaviour as the synchronous version, so it needs the same sort of stack. In the new-style state machine, all of that stack is saved in the "stack" field. It’s only one field, but may need to represent multiple different types within the code at various different await expressions – in the code above, for example, it represents two int values. As far as I can discover, the C# compiler generates code which uses the actual type of the value it needs, if it only requires a single value. If it needs multiple values, it uses an appropriate Tuple type, nesting tuples if it goes beyond the number of type parameters supported by the Tuple<…> family of types. So in our case above, we end up with code a bit like this:

// Local variable used in both paths
Tuple<int, int> tuple;

// Code before the await
if (conditions-for-first-time-execution) 

    …
    tuple = new Tuple<int, int>(this.x, 50);
    this.stack = tuple;
    …
}

// Continuation would get into here
tuple = (Tuple<int, int>) this.stack;
// IL copies the values from the tuple onto the stack at this point
this.stack = null;

// Both the fast and slow code paths get here eventually
this.z = stack0 + stack1 * awaiter.GetResult()

I say it’s a bit like this, because it’s hard to represent the IL exactly in C# in this case. The tuple is only created if it’s needed, i.e. not in the already-completed fast path. In that case, the values are loaded onto the stack but not then put into the tuple – execution skips straight to the code which uses the values already on the stack.

When the awaitable isn’t complete immediately, then a Tuple<int, int> is created, stored in the "stack" field, and the continuation is handed to the awaiter. On continuation, the tuple is loaded back from the "stack" field (and cast accordingly), the values are loaded onto the stack – and then we’re back into the common code path of fetching the value and performing the add and multiply operations.

Conclusion

As far as I’m aware, those are the most noticeable changes in the generated code. There may well still be a load more changes in Task<T> and the TPL in general – I wouldn’t be at all surprised – but that’s harder to investigate.

I’m sure all of this has been done in the name of performance (and correctness, in the case of stack preservation). The state machine is now much smaller in terms of the number of fields it requires, and objects are created locally as far as possible (including the state machine itself only requiring heap allocation if there’s ever a "slow" awaitable). I suspect there’s still some room for optimization, however:

  • Both the awaiter and the delegate use careful boxing and either arrays or a mutating interface to allow the boxed value to be changed. I suspect that using unbox with the concrete type, but without copying the value, would be more efficient. I may attempt to work this theory up into a test at some point.
  • If there’s only one awaiter type (usually TaskAwaiter<T> for some T), that type could be used instead of object, potentially reducing heap optimization
  • I’ve no idea why the builder.Task property is explicitly fetched and then the results discarded
  • If there’s only one await expression, the "stack" field can be strongly typed, which would also avoid boxing if only a single value needs to be within that stack
  • The stack field could be removed entirely when it’s not needed for intermediate stack value preservation. (I believe that would be the case reasonably often.)

The use of mutable value types is really fascinating (for me, at least) – I’m sure most people on the C# team would still say they’re evil, but when they’re used in a carefully controlled environment where real developers don’t have to reason about their behaviour, they can be useful.

Next time, I’ll hopefully get back to the idea I promised to write up before, about ordering a collection of tasks in completion order… before they’ve completed. (Well, sort of.) Should be fun…

Update (January 16th 2012)

Stephen Toub got in touch with me after I posted the original version of this blog entry, to explain the use of the Task property. Apparently the idea is that at this point, we know we’re going to return out of the state machine, so the skeleton method is going to access the Task property anyway. However, as we haven’t scheduled the continuation yet we also know that nothing will be accessing the Task property on a different thread. If we access it now for the first time, we can lazily allocate the task in the same thread that created the AsyncMethodBuilder, with no risk of contention. If we can force there to be a task ready and waiting for whatever accesses it later, we don’t need any synchronization in that area.

So why might we want to allocate the task lazily in the first place? Well, don’t forget that we might never have to wait for an await (as it were). We might just have an async method which takes the fast path everywhere. If that’s the case, then for certain cases (e.g. a non-generic, successfully completed task, or a Task<bool> which again has completed successfully) we can reuse the same instance repeatedly. Apparently this laziness isn’t yet part of the VS11 Developer Preview, but the reason for the property access is in preparation for this.

Another case of micro-optimization – which is fair enough when it’s at a system level :)

Awaiting CodeMash 2012

Happy New Year, everyone!

I’m attempting to make 2012 a quiet year in terms of my speaking engagements – I’ve turned down a few kind offers already, and I expect to do so again during the year. I may well still give user group talks in evenings if I can do so without having to take holiday, but full conferences are likely to be out, especially international ones. This is partly so I can take more time off to support my wife, Holly, who has her own books to promote. This year will be particularly important for Holly as she’s one of the World Book Day 2012 authors – I’m tremendously proud of her, as you can no doubt imagine.

However, there’s one international conference I decided to submit proposals for: CodeMash. I’ve never been to this or any other US conference, but I’ve heard fabulous things about it. I’m particularly excited that I’ll be able to present alongside Bill Wagner, a fellow C# author (probably most famous for Effective C# which I’ve reviewed before now). Bill and I have never met, although we’ve participated jointly on a .NET Rocks show before now. I could barely hear Bill when we recorded that though, so it hardly counts :)

The conference schedule for CodeMash shows Bill and I each giving two talks: two individual ones on general C# (C# Stunt Coding by Bill, and C#’s Greatest Mistakes by me) and two sessions on the async support in C# 5… async "from the inside" and "from the outside". Although these have hitherto been shown as separate sessions, everyone involved thought it would make more sense to weave the two together… so this will be a double-length session. Bill will be presenting the "outside" view – how to use async, basically; I’ll be presenting the "inside" view – how it all hangs together behind the scenes.

With any luck, this will be much more helpful to the conference attendees, as they should be able to build up confidence in the solid foundations underpinning it all at the same time as seeing how fabulously useful it’ll be for developers. It also means that Bill and I can bounce ideas off each other spontaneously as we go – I intend to pay close attention and learn a thing or two myself!

It’s pretty much impossible to predict how it’ll all hang together, but I’m really excited about the whole shebang. I’ll be fascinated to see if and how US conferences differ from the various ones this side of the pond… but it does make the whole thing that bit more nerve-wracking. If you’re coming to CodeMash, please grab me and say hi – it never hurts to see a friendly face…

(Note: Bill has a similar blog post posted just before this one.)

Book Review: Fluent C# (Rebecca Riordan, Sams)

(As usual, I will be sending the publisher a copy of this review to give them and the author a chance to reply to it before I publish it to the blog. Other than including their comments and correcting any factual mistakes they may point out, I don’t intend to change the review itself.)

Resources:

Introduction and disclaimers

In late October, Sams (the publisher) approached me to ask if I’d be interested in reviewing their newest introductory book on C#. Despite my burgeoning review stack, I said I was interested – I’m always on the lookout for good books to recommend. So, the first disclaimer is that this was a review copy – I didn’t have to pay for it. I don’t believe that has biased this review though.

Second disclaimer: obviously as C# in Depth is also "a book about C#" you might be wondering whether the two books are competitors. I don’t believe this is the case: Fluent C# explicitly talks about its target audience, which is primarily complete newcomers to programming. C# in Depth pretty much requires you to know at least C# 1, or perhaps be very comfortable with a similar language such as Java. I find it hard to imagine someone for whom both books would be suitable.

Obviously that puts me firmly out of the target audience. As I’ve written before, if you think the two most important questions to answer in a technical book review are "Is it accurate?" and "How good is at teaching its topic?" then any one person will find it hard to answer both questions. Although I’m far from an expert in some of the areas of the book – notably WPF – I’m sure I don’t have the same approach as a true newcomer. In particular, I find myself asking the questions I’d need the answers to in order to develop software professionally: how do I test it? How does the deployment model work? How does the data flow? These aren’t the same concerns as someone who is coming to programming for the first time. This review should be read with that context in mind: that my approach to the subject matter won’t be the same as a regular reader’s.

Physical format and style

Fluent C# is very reminiscent of Head-First C# in its approach, even down to the introductory "why this book is great at teaching you" blurb. It’s all very informal, with lots of pictures, diagrams and reader exercises. It’s a chunky book, at nearly 900 pages including the index – which I’d expect to be pretty daunting to a newcomer. However, that isn’t the main impression you come away with. Instead…

It’s brown. Everywhere. The diagrams, the text, the pictures – they’re all printed in brown, on off-white paper.

Combined with using multiple fonts including cursive ones, this makes for a pointlessly irritating reading experience right from the outset, however good or bad the actual content is. Now it’s possible that this is actually deliberate: I was speaking to someone recently who mentioned some research that shows if you use a hard-to-read font in presentations, people tend to end up reading it several times, so you end up with better memories of the content than if it had been "clean". I don’t know if that’s what Sams intended with this book, but I frequently found myself longing for simple black ink on clean white paper.

Leaving that to one side, I’m not sure I’ll ever really be a fan of the general tone of books like this, but I can certainly see that it’s popular and therefore presumably helpful to many people. It’s not clear to me whether it’s possible to create a book which retains the valuable elements of this style while casting off the aspects which rub me up the wrong way. It’s something about the enforced jollity which just doesn’t quite sit right, but it wouldn’t surprise me if that were more a peculiarity of my personality than anything about the book. Again, I’ve tried to set this to one side when reviewing the book, but it may come through nonetheless.

Structure

The book is broken up into the following sections, with several chapters per section:

  • Getting started (122 pages – finding your way around Visual Studio, debugging, deployment)
  • The Language (100 pages – introduction to C#)
  • The .NET Framework Library (162 pages – text, date/time APIs, collections – and actually more about C# as a language)
  • Best practice (116 pages – inheritance, some principles, design patterns)
  • WPF (341 pages)

I’ve included the page count for each section to show just how much is devoted to WPF. The book goes into much more detail about WPF than it does about the C# language itself (for example, drop shadow effects are included, but the "using" statement and nullable value types aren’t). If you want to write any kind of application other than a WPF one, a large part of the book won’t be useful to you. That’s not to say it’s useless per se – and in fact from my point of view, the WPF section was the most useful. The section on brushes is probably the best written in the whole book, for example. At time it feels to me like the author really wanted to write a book about WPF, but was asked to make it one about C# instead. That may well not be the case at all – it was just an impression.

Even though the best practice section talks briefly about MVC, MVP and MVVM, it doesn’t really go into enough detail to build anything like a real application – and in fact there’s no coverage of persistence of any form. No files, no XML, no database – nothing below the presentation layer, really. As such, although the book claims it’s enough to get you started with application development, it actually only provides a veneer. Even though I didn’t like the first edition of Head-First C# back in 2008, it did at least take the reader end-to-end – the exercises led to complete applications. The best practice section isn’t entirely about architecture and design patterns, however – it’s at this point that inheritance is properly introduced. While I wouldn’t personally count that as a "best practice" as such, it does at least come at the start of the section, before the genuine patterns/architecture areas which would have been harder to understand without that background.

One aspect which concerned me was the emphasis on the debugger and interactive diagnostics. The author states that developers should expect to spend a large part of their time in the debugger, and she says how she prefers using MessageBox.Show for diagnostics over Console.WriteLine information appearing in the output window. While I’m all for something more sophisticated than Console.WriteLine, there are solutions which are a lot less invasive than popping up a dialog, and which can be left in the code (possibly under an execution-time configuration) to allow diagnostics to be produced for real systems.

The "testing and deployment" chapter says nothing about automated tests – it’s as if the author believes that "testing" only involves "running the app in the debugger and seeing if it breaks". I hope that’s not actually the case, and I can understand why newcomers ought to at least know about the debugger – but I’d have welcomed at least a few pages introducing unit testing as a way of recording and checking expectations of how your code behaves. My own preference is to spend as little time in the debugger as possible; I know that’s not always practical, particularly for UI work, but I think it’s a reasonable aim.

Accuracy

Anyone following me on Twitter or Google+ knows where I’m going with this section. After reading through the book, pen in hand (as I always do, even for the books I like), I decided that it was more important to get out some form of errata quickly than this review. As such, I started a Google document which is publicly available to read and add comments to. The result is over 60 pages of notes and errata, and that’s excluding the introduction and table of contents. To be fair to the book, some of those notes are matters of disagreement which are more personal opinion than incontrovertible fact – but there are plenty of simple matters of inaccuracy. Some of the worst are:

  • Claims that String is a value type. (It’s a reference type.)
  • Inconsistency between whether arrays are value types or reference types – but consistently claiming that arrays are immutable, with the exception of the size which can be changed (slowly) using Array.Resize. (Array types are always reference types, and they’re always mutable except the size, which is fixed after creation. Array.Resize creates a new array, it doesn’t change the size of the existing one.)
  • Incorrect syntax for chaining from one constructor to another.
  • The claim that all reference types are mutable. (Some aren’t, and indeed I often aim for immutability. The canonical example of an immutable reference type is String.)

There are plenty more – including huge number of samples which simply won’t compile. Whole double page spreads where every class declaration is missing the "class" keyword. Pieces of code using VB syntax… the list goes on. (The VB syntax errors are probably explained by the author’s other book published at the same time: "Fluent Visual Basic". I suspect there was a certain amount of copy/paste, and the editing process didn’t catch all the changes which were needed to reflect the differences between the languages.)

Beyond the factually incorrect statements, there’s the matter of terminology. Now I’m well aware that I care more about terminology than more people – but there’s simply no reason to start making up terminology or misusing the perfectly good terminology from the specification. The book has a whole section on "commands" in C#, including things like for statements, switch statements, try/catch/finally statements. Additionally, it mislabels class and namespace declarations as "statements", and even mislabels using directives as statements – although it later goes back on the latter point. The word "object" is used at various times to mean any of variable, type, class and object, with no sense of consistency that I could fathom. For example, at one point it’s used in two different senses within the same sentence: "As we’ll see, you can define several different kinds of objects (called TYPES) in C#, but the one you’ll probably work with most often is the OBJECT."

Both accuracy and staying consistent with accepted terminology (primarily the specification) are particularly important for newcomers. If there’s a typo in a relatively advanced book – or in one which is about a particular technology (e.g. MVC) rather than an introductory text on a language, the reader is fairly likely to be able to guess what should really be there based on their existing experience. If a beginner comes across the same problem, they’re likely to assume it’s their fault that the code won’t compile. Likewise if they learn the wrong terminology to start with, they’ll be severely hampered in communicating effectively with other developers – as well as when reading other books.

I don’t want to make it sound like I expect perfection in a book – just yesterday someone mailed me a correction to C# in Depth, and I’d be foolish to try to hold other authors to standards I couldn’t meet myself. Nor am I suggesting it’s easy to be both accessible and accurate – so often an author may have an accurate picture of a complex topic, but have to simplify it in their writing, particularly for an introductory book like Fluent C#. But there are limits – and in my view this book goes well past the level of error that I’m willing to put up with.

Conclusion

I really don’t like ranting. I don’t like sounding mean – and I wanted to like this book. While I like C# 4.0 in a Nutshell and Essential C# 4.0, I’m still looking for a book which I can recommend to readers who want a more "lively" kind of book. Unfortunately I really can’t recommend Fluent C# to anyone – it is simply too inaccurate, and I believe it will cause confusion and instil bad habits in its readers.

So, what next? I’m hoping that the publisher and author will take my errata on board for the next printing, and revise it thoroughly. At that point I still don’t think I’d actually like the book due to its structure and WPF focus (and the colour scheme, which I don’t expect to change), but it would at least be more a matter of taste then.

I have some reason to be hopeful – because my review of Head-First C# was somewhat like this one, and one of the authors of that book (Andrew Stellman) was incredibly good about the whole thing, and as a result the second edition of Head-First C# is a much better book than the first edition. Again, it’s not quite my preferred style, but for readers who like that sort of thing, it’s a much better option than Fluent C# at the moment, and one I’m happy to recommend (with the express caveat of getting the second edition).

At the same time, reading Fluent C# (and particularly thinking about its debugger-first approach) has set me something of a challenge. You see, I’ve mostly avoided writing for new programmers so far – but I feel it’s really important to get folks off on the right foot, and I’d like to have a stab at it. In particular, I would like to see if it’s possible to write an introductory text which teaches C# using unit tests wherever possible… but without being dry. Can we have a "fun" but accurate book, which tries to teach C# from scratch without giving the impression that user interfaces are the be-all and end-all of programming? Can I write in a way which is more personal but doesn’t feel artificial? I can’t see myself starting such a project any time in the next year, but maybe some time in 2013… Watch this space. In the meantime, I’ll keep an eye out for any more introductory books which might be more promising than Fluent C#.

Eduasync part 17: unit testing

In the last post I showed a method to implement "majority voting" for tasks, allowing a result to become available as soon as possible. At the end, I mentioned that I was reasonably confident that it worked because of the unit tests… but I didn’t show the tests themselves. I felt they deserved their own post, as there’s a bigger point here: it’s possible to unit test async code. At least sometimes.

Testing code involving asynchrony is generally a pain. Introducing the exact order of events that you want is awkward, as is managing the threading within tests. With a few benefits with async methods:

  • We know that the async method itself will only execute in a single thread at a time
  • We can control the thread in which the async method will execute, if it doesn’t configure its awaits explicitly
  • Assuming the async method returns Task or Task<T>, we can check whether or not it’s finished
  • Between Task<T> and TaskCompletionSource<T>, we have a way of injecting tasks that we understand

Now in our sample method we have the benefit of passing in the tasks that will be awaited – but assuming you’re using some reasonably testable API to fetch any awaitables within your async method, you should be okay. (Admittedly in the current .NET framework that excludes rather a lot of classes… but the synchronous versions of those calls are also generally hard to test too.)

The plan

For our majority tests, we want to be able to see what happens in various scenarios, with tasks completing at different times and in different ways. Looking at the test cases I’ve implemented I have the following tests:

  • NullSequenceOfTasks
  • EmptySequenceOfTasks
  • NullReferencesWithinSequence
  • SimpleSuccess
  • InputOrderIsIrrelevant
  • MajorityWithSomeDisagreement
  • MajorityWithFailureTask
  • EarlyFailure
  • NoMajority

I’m not going to claim this is a comprehensive set of possible tests – it’s a proof of concept more than anything else. Let’s take one test as an example: MajorityWithFailureTask. The aim of this is to pass three tasks (of type Task<string>) into the method. One will give a result of "x", the second will fail with an exception, and the third will also give a result of "x". The events will occur in that order, and only when all three results are in should the returned task complete, at which point it will also have a success result of "x".

So, the tricky bit (compared with normal testing) is introducing the timing. We want to make it appear as if tasks are completing in a particular order, at predetermined times, so we can check the state of the result between events.

Introducing the TimeMachine class

Okay, so it’s a silly name. But the basic idea is to have something to control the logical flow of time through our test. We’re going to ask the TimeMachine to provide us with tasks which will act in a particular way at a given time, and then when we’ve started our async method we can then ask it to move time forward, letting the tasks complete as they go. It’s probably best to look at the code for MajorityWithFailureTask first, and then see what the implementation of TimeMachine looks like. Here’s the test:

[Test]
public void MajorityWithFailureTask()
{
    var timeMachine = new TimeMachine();
    // Second task gives a different result
    var task1 = timeMachine.AddSuccessTask(1, "x");
    var task2 = timeMachine.AddFaultingTask<string>(2, new Exception("Bang!"));
    var task3 = timeMachine.AddSuccessTask(3, "x");

    var resultTask = MoreTaskEx.WhenMajority(task1, task2, task3);
    Assert.IsFalse(resultTask.IsCompleted);

    // Only one result so far – no consensus
    timeMachine.AdvanceTo(1);
    Assert.IsFalse(resultTask.IsCompleted);

    // Second result is a failure
    timeMachine.AdvanceTo(2);
    Assert.IsFalse(resultTask.IsCompleted);

    // Third result gives majority verdict
    timeMachine.AdvanceTo(3);
    Assert.AreEqual(TaskStatus.RanToCompletion, resultTask.Status);
    Assert.AreEqual("x", resultTask.Result);
}

As you can see, there are two types of method:

  • AddSuccessTask / AddFaultingTask / AddCancelTask (not used here) – these all take the time at which they’re going to complete as their first parameter, and the method name describes the state they’ll reach on completion. The methods return the task created by the time machine, ready to pass into the production code we’re testing.
  • AdvanceTo / AdvanceBy (not used here) – make the time machine "advance time", completing pre-programmed tasks as it goes. When those tasks complete, any continuations attached to them also execute, which is how the whole thing hangs together.

Now forcing tasks to complete is actually pretty simple, if you build them out of TaskCompletionSource<T> to start with. So all we need to do is keep our tasks in "time" order (which I achieve with SortedList), and then when we’re asked to advance time we move through the list and take the appropriate action for all the tasks which weren’t completed before, but are now. I represent the "appropriate action" as a simple Action, which is built with a lambda expression from each of the Add methods. It’s really simple:

public class TimeMachine
{
    private int currentTime = 0;
    private readonly SortedList<int, Action> actions = new SortedList<int, Action>();

    public int CurrentTime { get { return currentTime; } }

    public void AdvanceBy(int time)
    {
        AdvanceTo(currentTime + time);
    }

    public void AdvanceTo(int time)
    {
        // Okay, not terribly efficient, but it’s simple.
        foreach (var entry in actions)
        {
            if (entry.Key > currentTime && entry.Key <= time)
            {
                entry.Value();
            }
        }
        currentTime = time;
    }

    public Task<T> AddSuccessTask<T>(int time, T result)
    {
        TaskCompletionSource<T> tcs = new TaskCompletionSource<T>();
        actions[time] = () => tcs.SetResult(result);
        return tcs.Task;
    }

    public Task<T> AddCancelTask<T>(int time)
    {
        TaskCompletionSource<T> tcs = new TaskCompletionSource<T>();
        actions[time] = () => tcs.SetCanceled();
        return tcs.Task;
    }

    public Task<T> AddFaultingTask<T>(int time, Exception e)
    {
        TaskCompletionSource<T> tcs = new TaskCompletionSource<T>();
        actions[time] = () => tcs.SetException(e);
        return tcs.Task;
    }
}

Okay, that’s a fair amount of code for a blog posts (and yes, it could do with some doc comments etc!) but considering that it makes life testable, it’s pretty simple.

So, is that it?

It works on my machine… with my test runner… in simple cases…

When I first ran the tests using TimeMachine, they worked almost immediately. This didn’t surprise me nearly as much as it should have done. You see, when the tests execute, they use async/await in the normal way – which means the continuations are scheduled on "the current task scheduler". I have no idea what the current task scheduler is in unit tests. Or rather, it feels like something which is implementation specific. It could easily have worked when running the tests from ReSharper, but not from NCrunch, or not from the command line NUnit test runner.

As it happens, I believe all of these run tests on thread pool threads with no task scheduler allocated, which means that the continuation is attached to the task to complete "in-line" – so when the TimeMachine sets the result on a TaskCompletionSource, the continuations execute before that call returns. That means everything happens on one thread, with no ambiguity or flakiness – yay!

However, there are two problems:

  • The words "I believe" aren’t exactly confidence-inspiring when it comes to testing that your software works correctly.
  • Our majority voting code only ever sees one completed task at a time – we’re not testing the situation where several tasks complete so quickly together that the continuation doesn’t get chance to run before they’ve all finished.

Both of these are solvable with a custom TaskScheduler or SynchronizationContext. Without diving into the docs, I’m not sure yet which I’ll need, but the aim will be:

  • Make TimeMachine implement IDisposable
  • In the constructor, set the current SynchronizationContext (or TaskScheduler) to a custom one having remembered what the previous one was
  • On disposal, reset the context
  • Make the custom scheduler keep a queue of jobs, such that when we’re asked to advance to time T, we complete all the appropriate tasks but don’t execute any continuations, then we execute all the pending continuations.

I don’t yet know how hard it will be, but hopefully the Parallel Extensions Samples will help me.

Conclusion

I’m not going to claim this is "the" way of unit testing asynchronous methods. It’s clearly a proof-of-concept implementation of what can only be called a "test framework" in the loosest possible sense. However, I hope it gives an example of a path we might take. I’m looking forward to seeing what others come up with, along with rather more polished implementations.

Next time, I’m going to shamelessly steal an idea that a reader mailed me (with permission, of course). It’s insanely cool, simple and yet slightly brain-bending, and I suspect will handy in many situations. Love it.

Eduasync part 16: Example of composition: majority voting

Note: For the rest of this series, I’ll be veering away from the original purpose of the project (investigating what the compiler is up to) in favour of discussing the feature itself. As such, I’ve added a requirement for AsyncCtpLib.dll – but due to potential distribution restrictions, I’ve felt it safest not to include that in the source repository. If you’re running this code yourself, you’ll need to copy the DLL from your installation location into the Eduasynclib directory before it will build – or change each reference to it.

One of the things I love about async is the compositional aspect. This is partly due to the way that the Task Parallel Library encourages composition to start with, but async/await makes it even easier by building the tasks for you. In the next few posts I’ll talk about a few examples of interesting building blocks. I wouldn’t be surprised to see an open source library with a proper implementation of some of these ideas (Eduasync is not designed for production usage) whether from Microsoft or a third party.

In project 26 of Eduasync, I’ve implemented "majority voting" via composition. The basic idea is simple, and the motivation should be reasonably obvious in this day and age of redundant services. You have (say) five different tasks which are meant to be computing the same thing. As soon as you have a single answer which the majority of the tasks agree on, the code which needs the result can continue. If the tasks disagree, or fail (or a combination leading to no single successful majority result), the overall result is failure too.

My personal experience with services requiring a majority of operations to return is with Megastore, a storage system we use at Google. I’m not going to pretend to understand half of the details of how Megastore works, and I’m certainly not about to reveal any confidential information about its internals or indeed how we use it, but basically when discussing it with colleagues at around the time that async was announced, I contemplated what a handy feature async would be when implementing a Megastore client. It could also be used in systems where each calculation is performed in triplicate to guard against rogue errors – although I suspect the chances of those systems being implemented in C# are pretty small.

It’s worth mentioning that the implementation here wouldn’t be appropriate for something like a stock price service, where the result can change rapidly and you may be happy to tolerate a small discrepancy, within some bounds.

The API

Here’s the signatures of the methods we’ll implement:

public static Task<T> WhenMajority<T>(params Task<T>[] tasks)

public static Task<T> WhenMajority<T>(IEnumerable<Task<T>> tasks)

Obviously the first just delegates to the second, but it’s helpful to have both forms, so that we can pass in a few tasks in an ad hoc manner with the first overload, or a LINQ-generated sequence of tasks with the second.

The name is a little odd – it’s meant to match WhenAll and WhenAny, but I’m sure there are better options. I’m not terribly interested in that at the moment.

It’s easy to use within an async method:

Task<int> firstTask = firstServer.ComputeSomethingAsync(input);
Task<int> secondTask = selectServer.ComputeSomethingAsync(input);
Task<int> thirdTask = thirdServer.ComputeSomethingAsync(input);

int result = await MoreTaskEx.WhenMajority(firstTask, secondTask, thirdTask);

Or using the LINQ-oriented overload:

var tasks = servers.Select(server => server.ComputeSomethingAsync(input));
int result = await MoreTaskEx.WhenMajority(tasks);

Of course we could add an extension method (dropping the When prefix as it doesn’t make as much sense there, IMO):

int result = await servers.Select(server => server.ComputeSomethingAsync(input))
                          .MajorityAsync();

The fact that we’ve stayed within the Task<T> model is what makes it all work so smoothly. We couldn’t easily express the same API for other awaitable types in general although we could do it for any other specific awaitable type of course. It’s possible that it would work using dynamic, but I’d rather avoid that :) Let’s implement it now.

Implementation

There are two parts to the implementation, in the same way that we implemented LINQ operators in Edulinq – and for the same reason. We want to go bang immediately if there are any clear input violations – such as the sequence of tasks being null or empty. This is in line with the Task-based Asynchronous Pattern white paper:

An asynchronous method should only directly raise an exception to be thrown out of the MethodNameAsync call in response to a usage error*. For all other errors, exceptions occurring during the execution of an asynchronous method should be assigned to the returned Task.

Now it occurs to me that we don’t really need to do this in two separate methods (one for precondition checking, one for real work). We could create an async lambda expression of type Func<Task<T>>, and make the method just return the result of invoking it – but I don’t think that would be great in terms of readability.

So, the first part of the implementation performing validation is really simple:

public static Task<T> WhenMajority<T>(params Task<T>[] tasks)
{
    return WhenMajority((IEnumerable<Task<T>>) tasks);
}

public static Task<T> WhenMajority<T>(IEnumerable<Task<T>> tasks)
{
    if (tasks == null)
    {
        throw new ArgumentNullException("tasks");
    }
    List<Task<T>> taskList = new List<Task<T>>(tasks);
    if (taskList.Count == 0)
    {
        throw new ArgumentException("Empty sequence of tasks");
    }
    foreach (var task in taskList)
    {
        if (task == null)
        {
            throw new ArgumentException("Null task in sequence");
        }
    }
    return WhenMajorityImpl(taskList);
}

The interesting part is obviously in WhenMajorityImpl. It’s mildly interesting to note that I create a copy of the sequence passed in to start with – I know I’ll need it in a fairly concrete form, so it’s appropriate to remove any laziness at this point.

So, here’s WhenMajorityImpl, which I’ll then explain:

private static async Task<T> WhenMajorityImpl<T>(List<Task<T>> tasks)
{
    // Need a real majority – so for 4 or 5 tasks, must have 3 equal results.
    int majority = (tasks.Count / 2) + 1;
    int failures = 0;
    int bestCount = 0;
            
    Dictionary<T, int> results = new Dictionary<T, int>();
    List<Exception> exceptions = new List<Exception>();
    while (true)
    {
        await TaskEx.WhenAny(tasks);
        var newTasks = new List<Task<T>>();
        foreach (var task in tasks)
        {
            switch (task.Status)
            {
                case TaskStatus.Canceled:
                    failures++;
                    break;
                case TaskStatus.Faulted:
                    failures++;
                    exceptions.Add(task.Exception.Flatten());
                    break;
                case TaskStatus.RanToCompletion:
                    int count;
                    // Doesn’t matter whether it was there before or not – we want 0 if not anyway
                    results.TryGetValue(task.Result, out count);
                    count++;
                    if (count > bestCount)
                    {
                        bestCount = count;
                        if (count >= majority)
                        {
                            return task.Result;
                        }
                    }
                    results[task.Result] = count;
                    break;
                default:
                    // Keep going next time. may not be appropriate for Created
                    newTasks.Add(task);
                    break;
            }
        }
        // The new list of tasks to wait for
        tasks = newTasks;

        // If we can’t possibly work, bail out.
        if (tasks.Count + bestCount < majority)
        {
            throw new AggregateException("No majority result possible", exceptions);
        }
    }
}

I should warn you that this isn’t a particularly efficient implementation – it was just one I wrote until it worked. The basic steps are:

  • Work out how many results make a majority, so we know when to stop
  • Keep track of how many "votes" our most commonly-returned result has, along with the counts of all the votes
  • Repeatedly:
    • Wait (asynchronously) for at least of the remaining tasks to finish (many may finish "at the same time")
    • Start a new list of "tasks we’re going to wait for next time"
    • Process each task in the current list, taking an action on each state:
      • If it’s been cancelled, we’ll treat that as a failure (we could potentially treat "the majority have been cancelled" as a cancellation, but for the moment a failure is good enough)
      • If it’s faulted, we’ll add the exception to the list of exceptions, so that if the overall result ends up as failure, we can throw an AggregateException with all of the individual exceptions
      • If it’s finished successfully, we’ll check the result:
        • Add 1 to the count for that result (the dictionary will use the default comparer for the result type, which we assume is good enough)
        • If this is greater than the previous "winner" (which could be for the same result), check for it being actually an overall majority, and return if so.
      • If it’s still running (or starting), add it to the new task list
    • Check whether enough tasks have failed – or given different results – so ensure that a majority is now impossible. If so, throw an AggregateException to say so. This may have some exceptions, but it may not (if there are three tasks which gave different results, none of them actually failed)

Each iteration of the "repeatedly" will have a smaller list to check than before, so we’ll definitely terminate at some point.

I mentioned that it’s inefficient. In particular, we’re ignoring the fact that WhenAny returns a Task<Task<T>>, so awaiting that will actually tell us a task which has finished. We don’t need to loop over the whole collection at that point – we could just remove that single task from the collection. We could do that efficiently if we kept a Dictionary<Task<T>, LinkedListNode<Task<T>> and a LinkedList<Task<T>> – we’d just look up the task which had completed in the dictionary, remove its node from the list, and remove the entry from the dictionary. We wouldn’t need to create a new collection each time, or iterate through all of the old one. However, that’s a job for another day… as is allowing a cancellation token to be passed in, and a custom equality comparer.

Conclusion

So we can make this implementation smarter and more flexible, certainly – but it’s not insanely tricky to write. I’m reasonably confident that it works, too – as I have unit tests for it. They’ll come in the next part. The important point  from this post is that by sticking within the Task<T> world, we can reasonably easily create building blocks to allow for composition of asynchronous operations. While it would be nice to have someone more competent than myself write a bullet-proof, efficient implementation of this operation, I wouldn’t feel too unhappy using a homegrown one in production. The same could not have been said pre-async/await. I just wouldn’t have had a chance of getting it right.

Next up – the unit tests for this code, in which I introduce the TimeMachine class.

Upcoming speaking engagements

It’s just occurred to me that I’ve forgotten to mention a few of the things I’ll be up to in the near-ish future. (I’ve talked about next week’s Progressive .NET session before.) This is just a quick rundown – follow the links for more blurb and details.

.NET Developer Network – Bristol, September 21st (evening)

I’ll be talking about async in Bristol – possibly at a high level, possibly in detail, depending on the audience experience. This is my first time talking with this particular user group, although I’m sure there’ll be some familiar faces. Come along if you’re in the area.

Øredev 2011 – Malmö, November 9th

It’s a whistle-stop trip to Sweden as I’m running out of vacation days; I’m flying out on the Tuesday evening and back on the Wednesday evening, but while I’m there I’ll give two talks:

  • Async 101 (yes, more async; I wonder at what point I’ll have given as many talks about it as Mads)
  • Effective technical communication (not a particularly technical talk, but definitely specific to technical communication)

Last year I had an absolute blast – looking forward to this year, even though I won’t have as much time for socializing.

Stack Overflow Dev Days 2011 – London, November 14th – cancelled!

Update: Dev Days has been cancelled. I’m still hoping to do something around this topic, and there may be small-scale meet-ups in London anyway.

Two years ago I talked about how humanity had let the world of software engineering down. This was one of the best talks I’ve ever given, and introduced the world to Tony the Pony. Unfortunately that puts the bar relatively high for this year’s talk – at least, high by my own pretty low standards.

In a somewhat odd topic for a Christian and a happy employee of a company with a code of conduct which starts "Don’t be evil," this year’s talk is entitled "Thinking in evil." As regular readers are no doubt aware, I love torturing the C# language and forcing the compiler to work with code which would make any right-thinking software engineer cringe. I was particularly gratified recently when Eric Lippert commented on one of my Stack Overflow answers that this was "the best abuse of C# I’ve seen in a while." I’m looking forward to talking about why I think it’s genuinely a good idea to think about nasty code like this – not to use it, but to get to know your language of choice more intimately. Like last time, I have little idea of exactly what this talk will be like, but I’m really looking forward to it.

Optimization and generics, part 2: lambda expressions and reference types

As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn’t care. Relatively few people write production code which is worth micro-optimizing. Please don’t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes.

It took me a surprisingly long time to find the problem described in the previous blog post, and almost no time at all to fix it. I understood why it was happening. This next problem took a while to identify at all, but even when I’d found a workaround I had no idea why it worked. Furthermore, I couldn’t reproduce it in a test case… because I was looking for the wrong set of triggers. I’ve now found at least some of the problem though.

This time the situation in Noda Time is harder to describe, although it concerns the same area of code. In various places I need to create new delegates containing parsing steps and add them to the list of steps required for a full parse. I can always use lambda expressions, but in many cases I’ve got the same logic repeatedly… so I decided to pull it out into a method. Bang – suddenly the code runs far slower. (In reality, I’d performed this refactoring first, and "unrefactored" it to speed things up.)

I think the problem comes down to method group conversions with generic methods and a type argument which is a reference type. The CLR isn’t very good at them, and the C# compiler uses them more than it needs to.

Show me the benchmark!

The complete benchmark code is available of course, but fundamentally I’m doing the same thing in each test case: creating a delegate of type Action which does nothing, and then checking that the delegate reference is non-null (just to avoid the JIT optimizing it away). In each case this is done in a generic method with a single type parameter. I call each method in two ways: once with int as the type argument, and once with string as the type argument. Here are the different cases involved:

  • Use a lambda expression: Action foo = () => {};
  • Fake what I expected the compiler to do: keep a separate generic cache class with a static variable for the delegate; populate the cache once if necessary, and thereafter use the cache field
  • Fake what the compiler is actually doing with the lambda expression: write a separate generic method and perform a method group conversion to it
  • Do what the compiler could do: write a separate non-generic method and perform a method group conversion to it
  • Use a method group conversion to a static (non-generic) method on a generic type
  • Use a method group conversion to an instance (non-generic) method on a generic type, via a generic cache class with a single field in referring to an instance of the generic class

(Yes, the last one is a bit convoluted – but the line in the method itself is simple: Action foo = ClassHolder<T>.SampleInstance.NoOpInstance;

Remember, we’re doing each of these in a generic method, and calling that generic method using a type argument of either int or string. (I’ve run a few tests, and the exact type isn’t important – all that matters is that int is a value type, and string is a reference type.)

Importantly, we’re not capturing any variables, and the type parameter is not involved in either the delegate type or any part of the implementation body.

Benchmark results

Again, times are in milliseconds – but this time I didn’t want to run it for 100 million iterations, as the "slow" versions would have taken far too long. I’ve run this on the x64 JIT as well and seen the same effect, but I haven’t included the figures here.

Times in milliseconds for 10 million iterations

Test TestCase<int> TestCase<string>
Lambda expression 180 29684
Generic cache class 90 288
Generic method group conversion 184 30017
Non-generic method group conversion 178 189
Static method on generic type 180 29276
Instance method on generic type 202 299

Yes, it’s about 150 times slower to create a delegate from a generic method with a reference type as the type argument than with a value type… and yet this is the first I’ve heard of this. (I wouldn’t be surprised if there were a post from the CLR team about it somewhere, but I don’t think it’s common knowledge by any means.)

Conclusion

One of the tricky things is that it’s hard to know exactly what the C# compiler is going to do with any given lambda expression. In fact, the method which was causing me grief earlier on isn’t generic, but it’s in a generic type and captures some variables which use the type parameters – so perhaps that’s causing a generic method group conversion somewhere along the way.

Noda Time is a relatively extreme case, but if you’re using delegates in any performance-critical spots, you should really be aware of this issue. I’m going to ping Microsoft (first informally, and then via a Connect report if that would be deemed useful) to see if there’s an awareness of this internally as potential "gotcha", and whether there’s anything that can be done. Normal trade-offs of work required vs benefit apply, of course. It’s possible that this really is an edge case… but with lambdas flying everywhere these days, I’m not sure that it is.

Maybe tomorrow I’ll actually be able to finish getting Noda Time moved onto the new system… all of this performance work has been a fun if surprising distraction from the main job of shipping working code…

Optimization and generics, part 1: the new() constraint (updated: now with CLR v2 results)

As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn’t care. Relatively few people write production code which is worth micro-optimizing. Please don’t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes.

I’ve been doing quite a bit of work on Noda Time recently – and have started getting my head round all the work that James Keesey has put into the parsing/formatting. I’ve been reworking it so that we can do everything without throwing any exceptions, and also to work on the idea of parsing a pattern once and building a sequence of actions for both formatting and parsing from the action. To format or parse a value, we then just need to apply the actions in turn. Simples.

Given that this is all in the name of performance (and I consider Noda Time to be worth optimizing pretty hard) I was pretty cross when I ran a complete revamp through the little benchmarking tool we use, and found that my rework had made everything much slower. Even parsing a value after parsing the pattern was slower than parsing both the value and the pattern together. Something was clearly very wrong.

In fact, it turns out that at least two things were very wrong. The first (the subject of this post) was easy to fix and actually made the code a little more flexible. The second (the subject of the next post, which may be tomorrow) is going to be harder to work around.

The new() constraint

In my SteppedPattern type, I have a generic type parameter – TBucket. It’s already constrained in terms of another type parameter, but that’s irrelevant as far as I’m aware. (After today though, I’m taking very little for granted…) The important thing is that before I try to parse a value, I want to create a new bucket. The idea is that bits of information end up in the bucket as they’re being parsed, and at the very end we put everything together. So each parse operation requires a new bucket. How can we create one in a nice generic way?

Well, we can just call its public parameterless constructor. I don’t mind the types involved having such a constructor, so all we need to do is add the new() constraint, and then we can call new TBucket():

// Somewhat simplified…
internal sealed class SteppedPattern<TBucket> : IParsePattern<TBucket>
    where TBucket : new()
{
    public ParseResult Parse(string value)
    {
        TBucket bucket = new TBucket();

        // Rest of parsing goes here
    }
}

Great! Nice and simple. Unfortunately, it turned out that that one line of code was taking 75% of the time to parse a value. Just creating an empty bucket – pretty much the simplest bit of parsing. I was amazed when I discovered that.

Fixing it with a provider

The fix is reasonably easy. We just need to tell the type how to create an instance, and we can do that with a delegate:

// Somewhat simplified…
internal sealed class SteppedPattern<TBucket> : IParsePattern<TBucket>
{
    private readonly Func bucketProvider;

    internal SteppedPattern(Func bucketProvider)
    {
        this.bucketProvider = bucketProvider;
    }

    public ParseResult Parse(string value)
    {
        TBucket bucket = bucketProvider();

        // Rest of parsing goes here
    }
}

Now I can just call new SteppedPattern(() => new OffsetBucket()) or whatever. This also means I can keep the constructor internal, not that I care much. I could even reuse old parse buckets if that wouldn’t be a semantic problem – in other cases it could be useful. Hooray for lambda expressions – until we get to the next post, anyway.

Show me the figures!

You don’t want to have to run Noda Time’s benchmarks to see the results for yourself, so I wrote a small benchmark to time just the construction of a generic type. As a measure of how insignificant this would be for most apps, these figures are in milliseconds, performing 100 million iterations of the action in question. Unless you’re going to do this in performance-critical code, you just shouldn’t care.

Anyway, the benchmark has four custom types: two classes, and two structs – a small and a large version of each. The small version has a single int field; the large version has eight long fields. For each type, I benchmarked both approaches to initialization.

The results on two machines (32-bit and 64-bit) are below, for both the v2 CLR and v4. The 64-bit machine is much faster in general – you should only compare results within one machine, as it were.)

CLR v4: 32-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 689 1225
Large struct 11188 7273
Small class 16307 1690
Large class 17471 3017

CLR v4: 64-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 473 868
Large struct 2670 2396
Small class 8366 1189
Large class 8805 1529

CLR v2: 32-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 703 1246
Large struct 11411 7392
Small class 143967 1791
Large class 143107 2581

CLR v2: 64-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 510 686
Large struct 2334 1731
Small class 81801 1539
Large class 83293 1896

(An earlier version of this post had a mistake – my original tests used structs for everything, despite the names.)

Others have reported slightly different results, including the new() constraint being better for both large and small structs.

Just in case you hadn’t spotted them, look at the results for classes. Those are the real results – it took over 2 minutes to run the test using the new() constraint on my 32-bit laptop, compared with under two seconds for the provider. Yikes. This was actually the situation I was in for Noda Time, which is built on .NET 2.0 – it’s not surprising that so much of my benchmark’s time was spent constructing classes, given results like this.

Of course you can download the benchmark program for yourself and see how it performs on your machine. It’s a pretty cheap-and-cheerful benchmark, but when the differences are this big, minor sources of inaccuracy don’t bother me too much. The simplest way to run under CLR v2 is to compile with the .NET 3.5 C# compiler to start with.

What’s going on under the hood?

As far as I’m aware, there’s no IL to give support for the new() constraint, in terms of using the parameterless constructor. (The constraint itself can be expressed in IL though.) Instead, when we write new T() in C#, the compiler emits a call to Activator.CreateInstance. Apparently, that’s slower than calling a delegate – presumably due to trying to find an accessible constructor with reflection, and invoking it. I suspect it could be optimized relatively easily – e.g. by caching the results per type it’s called with, in terms of delegates. I’m slightly surprised this hasn’t (apparently) been optimized, given how easy it is to cache values by generic type. No doubt there’s a good reason lurking there somewhere, even if it’s only the memory taken up by the cache.

Either way, it’s easy to work around in general.

Conclusion

I wouldn’t have found this gotcha if I didn’t have before and after tests (or in this case, side-by-side tests of the old way and the new way of parsing). The real lesson of this post shouldn’t be about the new() constraint – it should be how important it is to test performance (assuming you care), and how easy it is to assume certain operations are cheap.

Next post: something much weirder.

Speaking engagement: Progressive .NET, London, September 7th

Just a quick note to mention an event I’ll be speaking at in September. SkillsMatter will be hosting Progressive .NET, a 3-day event set of tutorials on September 5th-7th in London. I’ll be speaking about C# 5’s async feature on the last day (9.30am-1pm) but there’s a host of other speakers too. Should be good. For my own part, with four hours or so to cover async, I should be able to cover both the high level stuff and the implementation details, with plenty of time for the inevitable questions.

This one isn’t free though, I’m afraid – it’s normally £425. Hardly pocket money, but pretty good value for three full days of deep-dive sessions. However, there are two bits of good news:

  • Readers of this blog can get £50 off using the promo code "PROGNET50" at the checkout.
  • I have two free tickets to give away.

In an effort to make the ticket give-away fair, I’m thinking of a 32-bit number – mail me (skeet@pobox.com) an Int32, and the two readers with the closest value will get the tickets. Please include "Progressive .NET" in the subject line of the mail so I can filter them easily :)

Anyway, hope to see you there – please grab me to say hi.

Update (August 4th): and the winners are…

Congratulations to The Configurator and Haris Hasan who submitted the closest numbers to the one I was thinking of: -890978631.

In fact, The Configurator guessed the exact value – which is the result of calling "Progressive .NET".GetHashCode() on my 32-bit laptop running .NET 4. (I can’t remember which versions have different hash algorithms, but as it’s pretty arbitrary, it seemed good enough…) I’m impressed!

I’ll be emailing SkillsMatter to let them know about the winners – and thanks to everyone else who mailed me a guess. Hope I’ll see some of you there anyway!

Eduasync part 14: Data passing in coroutines

(This post covers project 19 in the source code.)

Last time we looked at independent coroutines running in a round-robin fashion. This time we’ll keep the round-robin scheduling, but add in the idea of passing data from one coroutine to another. Each coroutine will act on data of the same type, which is necessary for the scheme to work when one coroutine could "drop out" of the chain by returning.

Designing the data flow

It took me a while to get to the stage where I was happy with the design of how data flowed around these coroutines. I knew I wanted a coordinator as before, and that it should have a Yield method taking the value to pass to the next coroutine and returning an awaitable which would provide the next value when it completed. The tricky part was working out what to do at the start of each method and the end. If the method just took a Coordinator parameter, we wouldn’t have anything to do with the value yielded by the first coroutine, because the second coroutine wouldn’t be ready to accept it yet. Likewise when a coroutine completed, we wouldn’t have another value to pass to the next coroutine.

Writing these dilemmas out in this post, the solution seems blindingly obvious of course: each coroutine should accept a data value on entry, and return one at the end. At any point where we transfer control, we provide a value and have a value which is required by something. The final twist is to make the coordinator’s Start method take an initial value and return the value returned by the last coroutine to complete.

So, that’s the theory… let’s look at the implementation.

Initialization

I’ve changed the coordinator to take all the coroutines as a constructor parameter (of the somewhat fearsome declaration "params Func<Coordinator<T>, T, Task<T>>[] coroutines") which means we don’t need to implement IEnumerable pointlessly any more.

This leads to a code skeleton of this form:

private static void Main(string[] args)
{
    var coordinator = new Coordinator<string>(FirstCoroutine,
                                              SecondCoroutine,
                                              ThirdCoroutine);
    string finalResult = coordinator.Start("m1");
    Console.WriteLine("Final result: {0}", finalResult);
}

private static async Task<string> FirstCoroutine(
    Coordinator<string> coordinator,
    string initialValue)
{
    …
}

// Same signature for SecondCoroutine and ThirdCoroutine

Last time we simply had a Queue<Action> internally in the coordinator as the actions to invoke. You might be expecting a Queue<Func<T, T>> this time – after all, we’re passing in data and returning data at each point. However, the mechanism for that data transfer is "out of band" so to speak. The only time we really "return" an item is when we reach the end of a coroutine. Usually we’ll be providing data to the next step using a method. Likewise the only time a coroutine is given data directly is in the first call – after that, it will have to fetch the value by calling GetResult() on the awaiter which it uses to yield control.

All of this is leading to a requirement for our constructor to convert each coroutine delegate into a simple Action. The trick is working out how to deal with the data flow. I’m going to include SupplyValue() and ConsumeValue() methods within the coordinator for the awaiter to use, so it’s just a case of calling those appropriately from our action. In particular:

  • When the action is called, it should consume the current value.
  • It should then call the coroutine passing in the coordinator ("this") and the initial value.
  • When the task returned by the coroutine has completed, the result of that task should be used to supply a new value.

The only tricky part here is the last bullet – and it’s not that hard really, so long as we remember that we’re absolutely not trying to start any new threads. We just want to hook onto the end of the task, getting a chance to supply the value before the next coroutine tries to pick it up. We can do that using Task.ContinueWith, but passing in TaskContinuationOptions.ExecuteSynchronously so that we use the same thread that the task completes on to execute the continuation.

At this point we can implement the initialization part of the coordinator, assuming the presence of SupplyValue() and ConsumeValue():

public sealed class Coordinator<T>
{
    private readonly Queue<Action> actions;
    private readonly Awaitable awaitable;

    public Coordinator(params Func<Coordinator<T>, T, Task<T>>[] coroutines)
    {
        // We can’t refer to "this" in the variable initializer. We can use
        // the same awaitable for all yield calls.
        this.awaitable = new Awaitable(this);
        actions = new Queue<Action>(coroutines.Select(ConvertCoroutine));
    }

    // Converts a coroutine into an action which consumes the current value,
    // calls the coroutine, and attaches a continuation to it so that the return
    // value is used as the new value.
    private Action ConvertCoroutine(Func<Coordinator<T>, T, Task<T>> coroutine)
    {
        return () =>
        {
            Task<T> task = coroutine(this, ConsumeValue());
            task.ContinueWith(ignored => SupplyValue(task.Result),
                TaskContinuationOptions.ExecuteSynchronously);
        };
    }
}

I’ve broken ConvertCoroutine into a separate method so that we can use it as the projection for the Select call within the constructor. I did initially have it within a lambda expression within the constructor, but it was utterly hideous in terms of readabililty.

One suggestion I’ve received is that I could declare a new delegate type instead of using Func<Coordinator<T>, T, Task<T>> to represent a coroutine. This could either be a non-generic delegate nested in the generic coordinator class, or a generic stand-alone delegate:

public delegate T Coroutine<T>(Coordinator<T> coordinator, T initialValue);

// Or nested…
public sealed class Coordinator<T>
{
    public delegate T Coroutine(Coordinator<T> coordinator, T initialValue);
}

Both of these would work perfectly well. I haven’t made the change at the moment, but it’s certainly worth considering. The debate about whether to use custom delegate types or Func/Action is one for another blog post, I think :)

The one bit of the initialization I haven’t explained yet is the "awaitable" field and the Awaitable type. They’re to do with yielding – so let’s look at them now.

Yielding and transferring data

Next we need to work out how we’re going to transfer data and control between the coroutines. As I’ve mentioned, we’re going to use a method within the coordinator, called from the coroutines, to accomplish this. The coroutines have this sort of code:

private static async Task<string> FirstCoroutine(
    Coordinator<string> coordinator,
    string initialValue)
{
    Console.WriteLine("Starting FirstCoroutine with initial value {0}",
                      initialValue);            
    …
    string received = await coordinator.Yield("x1");
    Console.WriteLine("Returned to FirstCoroutine with value {0}", received);
    …
    return "x3";
}

The method name "Yield" here is a double-edged sword. The word has two meanings – yielding a value to be used elsewhere, and yielding control until we’re called back. Normally it’s not ideal to use a name that can mean subtly different things – but in this case we actually want both of these meanings.

So, what does Yield need to do? Well, the flow control should look something like this:

  • Coroutine calls Yield()
  • Yield() calls SupplyValue() internally to remember the new value to be consumed by the next coroutine
  • Yield() returns an awaitable to the coroutine
  • Due to the await expression, the coroutine calls GetAwaiter() on the awaitable to get an awaiter
  • The coroutine checks IsCompleted on the awaiter, which must return false (to prompt the remaining behaviour)
  • The coroutine calls OnCompleted() passing in the continuation for the rest of the method
  • The coroutine returns to its caller
  • The coordinator proceeds with the next coroutine
  • When we eventually get back to this coroutine, it will call GetResult() to get the "current value" to assign to the "received" variable.

Now you’ll see that Yield() needs to return some kind of awaitable type – in other words, one with a GetAwaiter() method. Previously we put this directly on the Coordinator type, and we could have done that here – but I don’t really want anyone to just "await coordinator" accidentally. You should really need to call Yield() in order to get an awaitable. So we have an Awaitable type, nested in Coordinator.

We then need to decide what the awaiter type is – the result of calling GetAwaiter() on the awaitable. This time I decided to use the Coordinator itself. That means people could accidentally call IsCompleted, OnCompleted() or GetResult(), but I figured that wasn’t too bad. If we were to go to the extreme, we’d create another type just for the Awaiter as well. It would need to have a reference to the coordinator of course, in order to actually do its job. As it is, we can make the Awaitable just return the Coordinator that created it. (Awaitable is nested within Coordinator<T>, which is how it can refer to T without being generic itself.)

public sealed class Awaitable
{
    private readonly Coordinator<T> coordinator;

    internal Awaitable(Coordinator<T> coordinator)
    {
        this.coordinator = coordinator;
    }

    public Coordinator<T> GetAwaiter()
    {
        return coordinator;
    }
}

The only state here is the coordinator, which is why we create an instance of Awaitable on the construction of the Coordinator, and keep it around.

Now Yield() is really simple:

public Awaitable Yield(T value)
{
    SupplyValue(value);
    return awaitable;
}

So to recap, we now just need the awaiter members, SupplyValue() and ConsumeValue(). Let’s look at the awaiter members (in Coordinator) to start with. We already know that IsCompleted will just return false. OnCompleted() just needs to stash the continuation in the queue, and GetResult() just needs to consume the "current" value and return it:

public bool IsCompleted { get { return false; } }

public void OnCompleted(Action continuation)
{
    actions.Enqueue(continuation);
}

public T GetResult()
{
    return ConsumeValue();
}

Simple, huh? Finally, consuming and supplying values:

private T currentValue;
private bool valuePresent;

private void SupplyValue(T value)
{
    if (valuePresent)
    {
        throw new InvalidOperationException
            ("Attempt to supply value when one is already present");
    }
    currentValue = value;
    valuePresent = true;
}

private T ConsumeValue()
{
    if (!valuePresent)
    {
        throw new InvalidOperationException
            ("Attempt to consume value when it isn’t present");
    }
    T oldValue = currentValue;
    valuePresent = false;
    currentValue = default(T);
    return oldValue;
}

These are relatively long methods (compared with the other ones I’ve shown) but pretty simple. Hopefully they don’t need explanation :)

The results

Now that everything’s in place, we can run it. I haven’t posted the full code of the coroutines, but you can see it on Google Code. Hopefully the results speak for themselves though – you can see the relevant values passing from one coroutine to another (and in and out of the Start method).

Starting FirstCoroutine with initial value m1
Yielding ‘x1’ from FirstCoroutine…
    Starting SecondCoroutine with initial value x1
    Starting SecondCoroutine
    Yielding ‘y1’ from SecondCoroutine…
        Starting ThirdCoroutine with initial value y1
        Yielding ‘z1’ from ThirdCoroutine…
Returned to FirstCoroutine with value z1
Yielding ‘x2’ from FirstCoroutine…
    Returned to SecondCoroutine with value x2
    Yielding ‘y2’ from SecondCoroutine…
        Returned to ThirdCoroutine with value y2
        Finished ThirdCoroutine…
Returned to FirstCoroutine with value z2
Finished FirstCoroutine
    Returned to SecondCoroutine with value x3
    Yielding ‘y3’ from SecondCoroutine…
    Returned to SecondCoroutine with value y3
    Finished SecondCoroutine
Final result: y4

Conclusion

I’m not going to claim this is the world’s most useful coroutine model – or indeed useful at all. As ever, I’m more interested in thinking about how data and control flow can be modelled than actual usefulness.

In this case, it was the realization that everything should accept and return a value of the same type which really made it all work. After that, the actual code is pretty straightforward. (At least, I think it is – please let me know if any bits are confusing, and I’ll try to elaborate on them.)

Next time we’ll look at something more like a pipeline model – something remarkably reminiscent of LINQ, but without taking up as much stack space (and with vastly worse readability, of course). Unfortunately the current code reaches the limits of my ability to understand why it works, which means it far exceeds my ability to explain why it works. Hopefully I can simplify it a bit over the next few days.