Category Archives: C#

How can I enumerate thee? Let me count the ways…

This weekend, I was writing some demo code for the async chapter of C# in Depth – the idea was to decompile a simple asynchronous method and see what happened. I received quite a surprise during this, in a way which had nothing to do with asynchrony.

Given that at execution time, text refers to an instance of System.String, and assuming nothing in the body of the loop captures the ch variable, how would you expect the following loop to be compiled?

foreach (char ch in text)
{
    // Body here
}

Before today, I could think of four answers depending on the compile-time type of text, assuming it compiled at all. One of those answers is if text is declared to be dynamic, which I’m not going to go into here. Let’s stick with static typing.

If text is declared as IEnumerable

In this case, the compiler can only use the non-generic IEnumerator interface, and I’d expect the code to be roughly equivalent to this:

IEnumerator iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = (char) iterator.Current;
        // Body here
    }
}
finally
{
    IDisposable disposable = iterator as IDisposable;
    if (disposable != null)
    { 
        disposable.Dispose();
    }
}

Note how the disposal of the iterator has to be conditional, as IEnumerator doesn’t extend IDisposable.

If text is declared as IEnumerable<char>

Here, we don’t need any execution time casting, and the disposal can be unconditional:

IEnumerator<char> iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = iterator.Current;
        // Body here
    }
}
finally
{
    iterator.Dispose();
}

If text is declared as string

Now things get interesting. System.String implements IEnumerable<char> using explicit interface implementation, and exposes a separate public GetEnumerator() method which is declared to return a CharEnumerator.

Usually when I find a type doing this sort of thing, it’s for the sake of efficiency, to reduce heap allocations. For example, List<T>.GetEnumerator returns a List<T>.Enumerator which is struct with the appropriate iteration members. This means if you use foreach over an expression of type List<T>, the iterator can stay on the stack in most cases, saving object allocation and garbage collection.

In this case, however, I suspect CharEnumerator was introduced (way back in .NET 1.0) to avoid having to box each character in the string. This was one reason for foreach handling to be based on types obeying the enumerable pattern, as well as there being support through the normal interfaces. It strikes me that it could still have been a structure in the same way as for List<T>, but maybe that wasn’t considered as an option.

Anyway, it means that I would have expected the code to be compiled like this, even back to C# 1:

CharEnumerator iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = iterator.Current;
        // Body here
    }
}
finally
{
    iterator.Dispose();
}

What really happens when text is declared as string

(This is the bit that surprised me.)

So far, I’ve been assuming that the C# compiler doesn’t have any special knowledge about strings, when it comes to iteration. I knew it did for arrays, but that’s all. The actual result – under the C# 5 compiler, at least – is to use the Length property and the indexer directly:

int index = 0;
while (index < text.Length)
{
    char ch = text[index];
    index++;
    // Body here
}

There’s no heap allocation, and no Dispose call. If the variable in question can change its value within the loop (e.g. if it’s a field, or a captured variable, or there’s an assignment to it within the body) then a copy is made of the variable value (just a reference, of course) first, so that all member access is performed on the same object.

Conclusion

So, there we go. There’s nothing particularly mind-blowing here – certainly nothing to affect your programming style, unless you were deliberately avoiding using foreach on strings "because it’s slow." It’s still a good lesson in not assuming you know what the compiler is going to do though… so long as the results are as expected, I’m very happy for them to put extra smarts in, even if it does mean having to change my C# in Depth sample code a bit…

The future of “C# in Depth”

I’m getting fairly frequent questions – mostly on Twitter – about whether there’s going to be a third edition of C# in Depth. I figure it’s worth answering it once in some detail rather than repeatedly in 140 characters ;)

I’m currently writing a couple of new chapters covering the new features in C# 5 – primarily async, of course. The current "plan" is that these will be added to the existing 2nd edition to create a 3rd edition. There will be minimal changes to the existing text of the 2nd edition – basically going over the errata and editing a few places which ought to mention C# 5 early. (In particular the changes to how foreach loop variables are captured.)

So there will definitely be new chapters. I’m hoping there’ll be a full new print (and ebook of course) edition, but no contracts have been signed yet. I’m hoping that the new chapters will be provided free electronically to anyone who’s already got the ebook of the 2nd edition – but we’ll see. Oh, and I don’t have any timelines at the moment. Work is more demanding than it was when I was writing the first and second editions, but obviously I’ll try to get the job done at a reasonable pace. (Writing about async in a way which is both accessible and accurate is really tricky, by the way.)

Of course when I’ve finished those, I’ve got two other C# books I want to be writing… when I’m not working on Noda Time, Tekpub screencasts etc…

Update

I had a question on Twitter around the "two other C# books". I don’t want to go into too many details – partly because they’re very likely to change – but my intention is to write "C# from Scratch" and "C# in Style". The first would be for complete beginners; the second wouldn’t go into "how things work" so much as "how to use the language most effectively." (Yes, competition for Effective C#.) One possibility is that both would be donationware, at least in ebook form, ideally with community involvement in terms of public comments.

I’m hoping that both will use the same codebase as an extended example, where "From Scratch" would explain what the code does, and "In Style" would explain why I chose that approach. Oh, and "From Scratch" would use unit testing as a teaching tool wherever possible, attempting to convey the idea that it’s something every self-respecting dev does :)

The perils of conditional mutability

This morning I was wrestling with trying to make some Noda Time unit tests faster. For some reason, the continuous integration host we’re using is really slow at loading resources under .NET 4. The unit tests which run in 10 seconds on my home laptop take over three hours on the continuous integration system. Taking stack traces at regular intervals showed the problem was with the NodaFormatInfo constructor, which reads some resources.

I may look into streamlining the resource access later, but before we get to that point, I wanted to try to reduce the number of times we call that constructor in the first place. NodaFormatInfo is meant to be cached, so I wouldn’t have expected thousands of instances to be created – but it’s only cached when the System.Globalization.CultureInfo it’s based on is read-only. This is where the problems start…

CultureInfo is conditionally mutable (not an official term, just one I’ve coined for the purposes of this post). You can ask whether or not it’s read-only with the IsReadOnly property, and obviously if it’s read-only you can’t change it. Additionally, CultureInfo is composed of other conditionally mutable objects – DateTimeFormatInfo, NumberFormatInfo etc. There’s a static ReadOnly method on CultureInfo to create a read-only wrapper around a mutable CultureInfo. It’s not clearly documented whether that’s expected to take a deep copy (so that callers can really rely on it not changing) or whether it’s expected to reflect any further changes made to the culture info it’s based on. To go in the other direction, you can call Clone on a CultureInfo to create a mutable copy of any existing culture.

Further complications are introduced by the properties on the composite objects – we have properties such as DateTimeFormatInfo.MonthNames which returns a string array. Remember, arrays are always mutable. So it’s really important to know whether the array reference returned from the property refers to a copy of the underlying data, or whether it refers to the array that’s used internally by the type. Obviously for read-only DateTimeFormatInfo objects, I’d expect a copy to be returned – but for a mutable DateTimeFormatInfo, it would potentially make sense to return the underlying array reference. Unfortunately, the documentation doesn’t make this clear – but in practice, it always returns a copy. If you need to change the month names, you need to clone the array, mutate the clone, and then set the MonthNames property.

All of this makes CultureInfo hard to work with. The caching decision earlier on only really works if a "read-only" culture genuinely won’t change behind the scenes. The type system gives you no help to catch subtle bugs at compile-time. Making any of this robust but efficient (in terms of taking minimal copies) is tricky to say the least.

Not only does it make it hard to work with from a client’s point of view, but apparently it’s hard to implement correctly too…

First bug: Mono’s invariant culture isn’t terribly invariant…

(Broken in 2.10.8; apparently fixed later.)

I discovered this while getting Noda Time’s unit tests to pass on Mono. Unfortunately there are some I’ve had to effectively disable at the moment, due to deficiencies in Mono (some of which are being fixed, of course).

Here’s a program which builds a clone of the invariant culture, changes the clone’s genitive month names, and then prints out the first non-genitive name from the plain invariant culture:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {        
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        // Note: not even deliberately changing MonthNames for this culture!
        clone.DateTimeFormat.MonthGenitiveNames[0] = "Changed";
        
        // Prints Changed
        Console.WriteLine(CultureInfo.InvariantCulture.DateTimeFormat.MonthNames[0]);
    }
}

I believe this bug is really due to the lack of support for genitive month names in Mono at the moment – the MonthGenitiveNames property always just returns a reference to the month names for the invariant culture – without taking a copy first. (It always returns the invariant culture’s month names, even if you’re using a different culture entirely.) The code above shows an "innocent" attempt to change a mutable clone – but in reality we could have used any culture (including an immutable one) to make the change.

Note that in the .NET implementation, the change would only have been to a copy of the underlying data, so even the clone wouldn’t have reflected change anywhere.

Second bug: ReadOnly losing changes

The second bug is the one I found this morning. It looks like it’s fixed in .NET 4, but it’s present in .NET 3.5, which is where it bit me this morning. When you try to make a read-only wrapper around a mutated culture, some of the properties are preserved… but some aren’t:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        clone.DateTimeFormat.AMDesignator = "ChangedAm";

        // The array is recreated on each call to MonthNames, so changing the
        // value within the array itself doesn’t work :(
        string[] months = (string[]) clone.DateTimeFormat.MonthNames;
        months[0] = "ChangedMonth";
        clone.DateTimeFormat.MonthNames = months;
        
        CultureInfo readOnlyCopy = CultureInfo.ReadOnly(clone);
        Console.WriteLine(clone.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(clone.DateTimeFormat.MonthNames[0]); // ChangedMonth
                
        Console.WriteLine(readOnlyCopy.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(readOnlyCopy.DateTimeFormat.MonthNames[0]); // January (!)
    }
}

I don’t know what’s going on here. In the end I just left the test code using the mutable clone, having added a comment explaining why it wasn’t created a read-only wrapper.

I’ve experimented with a few different options here, including setting the MonthNames property on the clone after creating the wrapper. No joy – I simply can’t make the new month names stick in the read-only copy. <sigh>

Conclusion

I’ve been frustrated by the approach we’ve taken to cultures in Noda Time for a while. I haven’t worked out exactly what we should do about it yet, so it’s likely to stay somewhat annoying for v1, but I may well revisit it significantly for v2. Unfortunately, there’s nothing I can do about CultureInfo itself.

What I would have preferred in all of this is the builder pattern: make CultureInfo, DateTimeFormatInfo etc all immutable, but give each of them mutable builder types, with the ability to create a mutable builder based on an existing immutable instance, and obviously to create a new immutable instance from a builder. That would make all kinds of things simpler – including caching.

For the moment though, I hope we can all learn lessons from this – or have old lessons reinforced, at least:

  • Making a single type behave in different ways based on different "modes" makes it hard to use correctly. (Yes, this is the same first conclusion as with DateTime in the previous post. Funny, that.)
  • Immutability has to be deep to be meaningful: it’s not much use having a supposedly read-only object which composes a StringBuilder…
  • Arrays should be considered somewhat harmful. If you’re going to return an array from a method, make sure you document whether this is a copy of the underlying data, or a "live" reference. (The latter should be very rare, particularly for a public API.) The exception here is if you return an empty array – that’s effectively immutable, so you can always return it with no problems.
  • The builder pattern rocks – use it!

In my next post I’ll try to get back to the TimeZoneInfo oddities I mentioned last time.

More fun with DateTime

(Note that this is deliberately not posted in the Noda Time blog. I reckon it’s of wider interest from a design perspective, and I won’t be posting any of the equivalent Noda Time code. I’ll just say now that we don’t have this sort of craziness in Noda Time, and leave it at that…)

A few weeks ago, I was answering a Stack Overflow question when I noticed an operation around dates and times which should have been losing information apparently not doing so. I investigated further, and discovered some "interesting" aspects of both DateTime and TimeZoneInfo. In an effort to keep this post down to a readable length (at least for most readers; certain WebDriver developers who shall remain nameless have probably given up by now already) I’ll save the TimeZoneInfo bits for another post.

Background: daylight saving transitions and ambiguous times

There’s one piece of inherent date/time complexity you’ll need to understand for this post to make sense: sometimes, a local date/time occurs twice. For the purposes of this post, I’m going to assume you’re in the UK time zone. On October 28th 2012, at 2am local time (1am UTC), UK clocks will go back to 1am local time. So 1:20am local time occurs twice – once at 12:20am UTC (in daylight saving time, BST), and once at 1:20am UTC (in standard time, GMT).

If you want to run any of the code in this post and you’re not in the UK, please adjust the dates and times used to a similar ambiguity for when your clocks go back. If you happen to be in a time zone which doesn’t observe daylight savings, I’m afraid you’ll have to adjust your system time zone in order to see the effect for yourself.

DateTime.Kind and conversions

As you may already know, as of .NET 2.0, DateTime has a Kind property, of type DateTimeKind – an enum with the following values:

  • Local: The DateTime is considered to be in the system time zone. Not an arbitrary "local time in some time zone", but in the specific current system time zone.
  • Utc: The DateTime is considered to be in UTC (corollary: it always unambiguously represents an instant in time)
  • Unspecified: This means different things in different contexts, but it’s a sort of "don’t know" kind; this is closer to "local time in some time zone" which is represented as LocalDateTime in Noda Time.

DateTime provides three methods to convert between the kinds:

  • ToUniversalTime: if the original kind is Local or Unspecified, convert it from local time to universal time in the system time zone. If the original kind is Utc, this is a no-op.
  • ToLocalTime: if the original kind is Utc or Unspecified, convert it from UTC to local time. If the original kind is Local, this is a no-op.
  • SpecifyKind: keep the existing date/time, but just change the kind. (So 7am stays as 7am, but it changes the meaning of that 7am effectively.)

(Prior to .NET 2.0, ToUniversalTime and ToLocalTime were already present, but always assumed the original value needed conversion – so if you called x.ToLocalTime().ToLocalTime().ToLocalTime() the result would probably end up with the appropriate offset from UTC being applied three times!)

Of course, none of these methods change the existing value – DateTime is immutable, and a value type – instead, they return a new value.

DateTime’s Deep Dark Secret

(The code in this section is presented in several chunks, but it forms a single complete piece of code – later chunks refer to variables in earlier chunks. Put it all together in a Main method to run it.)

Armed with the information in the previous sections, we should be able to make DateTime lose data. If we start with 12:20am UTC and 1:20am UTC on October 28th as DateTimes with a kind of Utc, when we convert them to local time (on a system in the UK time zone) we should get 1:20am in both cases due to the daylight saving transition. Indeed, that works:

// Start with different UTC values around a DST transition
var original1 = new DateTime(2012, 10, 28, 0, 20, 0, DateTimeKind.Utc);
var original2 = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Utc);

// Convert to local time
var local1 = original1.ToLocalTime();
var local2 = original2.ToLocalTime();

// Result is the same for both values. Information loss?
var expected = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Local);
Console.WriteLine(local1 == expected); // True
Console.WriteLine(local2 == expected); // True
Console.WriteLine(local1 == local2);   // True

If we’ve started with two different values, applied the same operation to both, and ended up with equal values, then we must have lost information, right? That doesn’t mean that operation is "bad" any more than "dividing by 2" is bad. You ought to be aware of that information loss, that’s all.

So, we ought to be able to demonstrate that information loss further by converting back from local time to universal time. Here we have the opposite problem: from our local time of 1:20am, we have two valid universal times we could convert to – either 12:20am UTC or 1:20am UTC. Both answers would be correct – they are universal times at which the local time would be 1:20am. So which one will get picked? Well… here’s the surprising bit:

// Convert back to UTC
var roundTrip1 = local1.ToUniversalTime(); 
var roundTrip2 = local2.ToUniversalTime();

// Values round-trip correctly! Information has been recovered…
Console.WriteLine(roundTrip1 == original1);  // True
Console.WriteLine(roundTrip2 == original2);  // True
Console.WriteLine(roundTrip1 == roundTrip2); // False

Somehow, each of the local values knows which universal value it came from. The The information has been recovered, so the reverse conversion round-trips each value back to its original one. How is that possible?

It turns out that DateTime actually has four potential kinds: Local, Utc, Unspecified, and "local but treat it as the earlier option when resolving ambiguity". A DateTime is really just a 64-bit number of ticks, but because the range of DateTime is only January 1st 0001 to December 31st 9999. That range can be represented in 62 bits, leaving 2 bits "spare" to represent the kind. 2 bits gives 4 possible values… the three documented ones and the shadowy extra one.

Through experimentation, I’ve discovered that the kind is preserved if you perform arithmetic on the value, too… so if you go to another "fall back" DST transition such as October 30th 2011, the ambiguity resolution works the same way as before:

var local3 = local1.AddYears(-1).AddDays(2); 
var local4 = local2.AddYears(-1).AddDays(2);        
Console.WriteLine(local3.ToUniversalTime().Hour); // 0
Console.WriteLine(local4.ToUniversalTime().Hour); // 1

If you use DateTime.SpecifyKind with DateTimeKind.Local, however, it goes back to the "normal" kind, even though it looks like it should be a no-op:

// Should be a no-op?
var local5 = DateTime.SpecifyKind(local1, local1.Kind); 
Console.WriteLine(local5.ToUniversalTime().Hour); // 1

Is this correct behaviour? Or should it be a no-op, just like calling ToLocalTime on a "local" DateTime is? (Yes, I’ve checked – that doesn’t lose the information.) It’s hard to say, really, as this whole business appears to be undocumented… at least, I haven’t seen anything in MSDN about it. (Please add a link in the comments if you find something. The behaviour actually goes against what’s documented, as far as I can tell.)

I haven’t looked into whether various forms of serialization preserve values like this faithfully, by the way – but you’d have to work hard to reproduce it in non-framework code. You can’t explicitly construct a DateTime with the "extra" kind; the only ways I know of to create such a value are via a conversion to local time or through arithmetic on a value which already has the kind. (Admittedly if you’re serializing a DateTime with a Kind of Local, you’re already on potentially shaky ground, given that you could be deserializing it on a machine with a different system time zone.)

Unkind comparisons

I’ve misled you a little, I have to admit. In the code above, when I compared the "expected" value with the results of the first conversions, I deliberately specified DateTimeKind.Local in the constructor call. After all, that’s the kind we do expect. Well, yes – but I then printed the result of comparing this value with local1 and local2… and those comparisons would have been the same regardless of the kind I’d specified in the constructor.

All comparisons between DateTimes ignore the Kind property. It’s not just restricted to equality. So for example, consider this comparison:

// In June: Local time is UTC+1, so 8am UTC is 9am local
var dt1 = new DateTime(2012, 6, 1, 8, 0, 0, DateTimeKind.Utc); 
var dt2 = new DateTime(2012, 6, 1, 8, 30, 0, DateTimeKind.Local); 
Console.WriteLine(dt1 < dt2); // True

When viewed in terms of "what instants in time do these both represent?" the answer here is wrong – when you convert both values into the same time zone (in either direction), dt1 occurs after dt2. But a simple look at the properties tells a different story. In practice, I suspect that most comparisons between DateTime values of different kinds involve code which is at best sloppy and is quite possibly broken in a meaningful way.

Of course, if you bring Kind=Unspecified into the picture, it becomes impossible to compare meaningfully in a kind-sensitive way. Is 12am UTC before or after 1am Unspecified? It depends what time zone you later use.

To be clear, it is a hard-to-resolve issue, and one that we don’t do terribly well at in Noda Time at the moment for ZonedDateTime. (And even with just LocalDateTime you’ve got issues between calendars.) This is a situation where providing separate Comparer<T> implementations works nicely – so you can explicitly say what kind of comparison you want.

Conclusions

There’s more fun to be had with a similar situation when we look at TimeZoneInfo, but for now, a few lessons:

  • Giving a type different "modes" which make it mean fairly significantly different things is likely to cause headaches
  • Keeping one of those modes secret (and preventing users from even constructing a value in that mode directly) leads to even more fun and games
  • If two instances of your type are considered "equal" but behave differently, you should at least consider whether there’s something smelly going on
  • There’s always more fun to be had with DateTime…

Type initializer circular dependencies

To some readers, the title of this post may induce nightmarish recollections of late-night debugging sessions. To others it may be simply the epitome of jargon. Just to break the jargon down a bit:

  • Type initializer: the code executed to initialize the static variables of a class, and the static constructor
  • Circular dependency: two bits of code which depend on each other – in this case, two classes whose type initializers each require that the other class is initialized

A quick example of the kind of problem I’m talking about would be helpful here. What would you expect this code to print?

using System;

class Test
{    
    static void Main()
    {
        Console.WriteLine(First.Beta);
    }
}

class First
{
    public static readonly int Alpha = 5;
    public static readonly int Beta = Second.Gamma;
}

class Second
{
    public static readonly int Gamma = First.Alpha;
}

Of course, without even glancing at the specification, any expectations are pretty irrelevant. Here’s what the spec (section 10.5.5.1 of the C# 4 version):

The static field variable initializers of a class correspond to a sequence of assignments that are executed in the textual order in which they appear in the class declaration. If a static constructor (§10.12) exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class.

In addition to the language specification, the CLI specification gives more details about type initialization in the face of circular dependencies and multiple threads. I won’t post the details here, but the gist of it is:

  • Type initialization acts like a lock, to prevent more than one thread from initializing a type
  • If the CLI notices that type A needs to be initialized in order to make progress, but it’s already in the process of initializing type A in the same thread, it continues as if the type were already initialized.

So here’s what you might expect to happen:

  1. Initialize Test: no further action required
  2. Start running Main
  3. Start initializing First (as we need First.Beta)
  4. Set First.Alpha to 5
  5. Start initializing Second (as we need Second.Gamma)
  6. Set Second.Gamma to First.Alpha (5)
  7. End initializing Second
  8. Set First.Beta to Second.Gamma (5)
  9. End initializing First
  10. Print 5

Here’s what actually happens – on my box, running .NET 4.5 beta. (I know that type initialization changed for .NET 4, for example. I don’t know of any changes for .NET 4.5, but I’m not going to claim it’s impossible.)

  1. Initialize Test: no further action required
  2. Start running Main
  3. Start initializing First (as we need First.Beta)
  4. Start initializing Second (we will need Second.Gamma)
  5. Set Second.Gamma to First.Alpha (0)
  6. End initializing Second
  7. Set First.Alpha to 5
  8. Set First.Beta to Second.Gamma (0)
  9. End initializing First
  10. Print 0

Step 5 is the interesting one here. We know that we need First to be initialized, in order to get First.Alpha, but this thread is already initializing First (we started in step 3) so we just access First.Alpha and hope that it’s got the right value. As it happens, that variable initializer hasn’t been executed yet. Oops.

(One subtlety here is that I could have declared all these variables as constants instead using "const" which would have avoided all these problems.)

Back in the real world…

Hopefully that example makes it clear why circular dependencies in type initializers are nasty. They’re hard to spot, hard to debug, and hard to test. Pretty much your classic Heisenbug, really. It’s important to note that if the program above had happened to initialize Second first (to access a different variable, for example) we could have ended up with a different result. In particular, it’s easy to get into a situation where running all your unit tests can cause a failure – but if you run just the failing test, it passes.

One way of avoiding all of this is never to use any type initializers for anything, of course. In many cases that’s exactly the right solution – but often there are natural uses, particularly for well-known values such as Encoding.Utf8, TimeZoneInfo.Utc and the like. Note that in both of those cases they are static properties, but I would expect them to be backed by static fields. I’m somewhat ambivalent between using public static readonly fields and public static get-only properties – but as we’ll see later, there’s a definite advantage to using properties.

Noda Time has quite a few values like this – partly because so many of its types are immutable. It makes sense to create a single UTC time zone, a single ISO calendar system, a single "pattern" (text formatter/parser) for each of a variety of common cases. In addition to the publicly visible values, there are various static variables used internally, mostly for caching purposes. All of this definitely adds complexity – and makes it harder to test – but the performance benefits can be significant.

Unfortunately, a lot of these values end up with fairly natural circular dependencies – as I discovered just recently, where adding a new static field caused all kinds of breakage. I was able to fix the immediate cause, but it left me concerned about the integrity of the code. I’d fixed the one failure I knew about – but what about any others?

Testing type initialization

One of the biggest issues with type initialization is the order-sensitivity – combined with the way that once a type has been initialized once, that’s it for that AppDomain. As I showed earlier, it’s possible that initializing types in one particular order causes a problem, but a different order won’t.

I’ve decided that for Noda Time at least, I want to be reasonably sure that type initialization circularity isn’t going to bite me. So I want to validate that no type initializers form cycles, whatever order the types are initialized in. Logically if we can detect a cycle starting with one type, we ought to be able to detect it starting with any of the other types in that cycle – but I’m sufficiently concerned about weird corner cases that I’d rather just take a brute force approach.

So, as a rough plan:

  • Start with an empty set of dependencies
  • For each type in the target assembly:
    • Create a new AppDomain
    • Load the target assembly into it
    • Force the type to be initialized
    • Take a stack trace at the start of each type initializer and record any dependencies
  • Look for cycles in the complete set of dependencies

Note that we’ll never spot a cycle within any single AppDomain, due to the way that type initialization works. We have to put together the results for multiple initialization sequences to try to find a cycle.

A description of the code would probably be harder to follow than the code itself, but the code is relatively long – I’ve included it at the end of this post to avoid intefering with the narrative flow. For more up-to-date versions in the future, look at the Noda Time repository.

This isn’t a terribly nice solution, for various reasons:

  • Creating a new AppDomain and loading assemblies into it from a unit test runner isn’t as simple as it might be. My code doesn’t currently work with NCrunch; I don’t know how it finds its assemblies yet. When I’ve fixed that, I’m sure other test runners would still be broken. Likewise I’ve had to explicitly filter types to get TeamCity (the continuous integration system Noda Time uses) to work properly. Currently, you’d need to edit the test code to change the filters. (It’s possible that there are better ways of filtering, of course.)
  • It relies on each type within the production code which has an "interesting" type initializer to have a line like this:
    private static readonly int TypeInitializationChecking = NodaTime.Utility.TypeInitializationChecker.RecordInitializationStart();
  • Not only does the previous line need to be added to the production code – it clearly gets executed each time, and takes up heap space per type. It’s only 4 bytes for each type involved, and it does no real work when we’re not testing, but it’s a nuisance anyway. I could use preprocessor directives to remove the code from non-debug or non-test-targeted builds, but that would look even messier.
  • It only picks up cycles which occur when running the version of .NET the tests happen to execute on. Given that there are ordering changes for different versions, I wouldn’t like to claim this is 100% bullet-proof. Likewise if there are only cycles when you’re running in some specific cultures (or other environmental features), it won’t necessarily pick those up.
  • I’ve deliberately not tried to make the testing code thread-safe. That’s fine in Noda Time – I don’t have any asynchronous operations or new threads in Noda Time at all – but other code may need to make this more robust.

So with all these caveats, is it still worth it? Absolutely: it’s already found bugs which have now been fixed.

In fact, the test didn’t get as far as reporting cycles to start with – it turned out that if you initialized one particular type first, the type initializer would fail with a NullReferenceException. Ouch! Once I’d fixed that, there were still quite a few problems to fix. Somewhat coincidentally, fixing them improved the design too – although the user-visible API didn’t change at all.

Fixing type initializer cycles

In the past, I’ve occasionally "fixed" type initialization ordering problems by simply moving fields around. The cycles still existed, but I figured out how to make them harmless. I can say now that this approach does not scale, and is more effort than it’s worth. The code ends up being brittle, hard to think about, and once you’ve got more than a couple of types involved it’s really error-prone, at least for my brain. It’s much better to break the cycle completely. To this end, I’ve ended up using a fairly simple technique to defer initialization of static variables. It’s a poor-man’s Lazy<T>, to some extent – but I’d rather not have to write Lazy<T> myself, and we’re currently targeting .NET 3.5…

Basically, instead of exposing a public static readonly field which creates the cycle, you expose a public static readonly property – which returns an internal static readonly field in a nested, private static class. We still get the nice thread-safe once-only initialization of a type initializer, but the nested type won’t be initialized until it needs to be. (In theory it could be initialized earlier, but a static constructor would ensure it isn’t.) So long as nothing within the rest of the type initializer for the containing class uses that property, we avoid the cycle.

So instead of this:

// Requires Bar to be initialized – if Bar also requires Foo to be
// initialized, we have a problem…
public static readonly Foo SimpleFoo = new Foo(Bar.Zero);

We might have:

public static readonly Foo SimpleFoo { get { return Constants.SimpleFoo; } }

private static class Constants
{
    private static readonly int TypeInitializationChecking = NodaTime.Utility.TypeInitializationChecker.RecordInitializationStart(); 

    // This requires both Foo and Bar to be initialized, but that’s okay
    // so long as neither of them require Foo.Constants to be initialized.
    // (The unit test would spot that.)
    internal static readonly Foo SimpleFoo = new Foo(Bar.Zero);
}

I’m currently undecided about whether to include static constructors in these classes to ensure lazy initialization. If the type initializer for Foo triggered the initializer of Foo.Constants, we’d be back to square one… but adding static constructors into each of these nested classes sounds like a bit of a pain. The nested classes should call into the type initialization checking as well, to validate they don’t cause any problems themselves.

Conclusion

I have to say, part of me really doesn’t like either the testing code or the workaround. Both smack of being clever, which is never a good thing. It’s definitely worth considering whether you could actually just get rid of the type initializer (or part of it) entirely, avoiding maintaining so much static state. It would be nice to be able to detect these type initializer cycles without running anything, simply using static analysis – I’m going to see whether NDepend could do that when I get a chance. The workaround doesn’t feel as neat as Lazy<T>, which is really what’s called for here – but I don’t trust myself to implement it correctly and efficiently myself.

So while both are somewhat hacky, they’re better than the alternative: buggy code. That’s what I’m ashamed to say I had in Noda Time, and I don’t think I’d ever have spotted all the cycles by inspection. It’s worth a try on your own code – see whether you’ve got problems lurking…

 

 

Appendix: Testing code

As promised earlier, here’s the code for the production and test classes.

TypeInitializationChecker

This is in NodaTime.dll itself.

internal sealed class TypeInitializationChecker : MarshalByRefObject
{
    private static List<Dependency> dependencies = null;

    private static readonly MethodInfo EntryMethod = typeof(TypeInitializationChecker).GetMethod("FindDependencies");

    internal static int RecordInitializationStart()
    {
        if (dependencies == null)
        {
            return 0;
        }
        Type previousType = null;
        foreach (var frame in new StackTrace().GetFrames())
        {
            var method = frame.GetMethod();
            if (method == EntryMethod)
            {
                break;
            }
            var declaringType = method.DeclaringType;
            if (method == declaringType.TypeInitializer)
            {
                if (previousType != null)
                {
                    dependencies.Add(new Dependency(declaringType, previousType));
                }
                previousType = declaringType;
            }
        }
        return 0;
    }

    /// <summary>
    /// Invoked from the unit tests, this finds the dependency chain for a single type
    /// by invoking its type initializer.
    /// </summary>
    public Dependency[] FindDependencies(string name)
    {
        dependencies = new List<Dependency>();
        Type type = typeof(TypeInitializationChecker).Assembly.GetType(name, true);
        RuntimeHelpers.RunClassConstructor(type.TypeHandle);
        return dependencies.ToArray();
    }

    /// <summary>
    /// A simple from/to tuple, which can be marshaled across AppDomains.
    /// </summary>
    internal sealed class Dependency : MarshalByRefObject
    {
        public string From { get; private set; }
        public string To { get; private set; }
        internal Dependency(Type from, Type to)
        {
            From = from.FullName;
            To = to.FullName;
        }
    }
}

TypeInitializationTest

This is within NodaTime.Test:

[TestFixture]
public class TypeInitializationTest
{
    [Test]
    public void BuildInitializerLoops()
    {
        Assembly assembly = typeof(TypeInitializationChecker).Assembly;
        var dependencies = new List<TypeInitializationChecker.Dependency>();
        // Test each type in a new AppDomain – we want to see what happens where each type is initialized first.
        // Note: Namespace prefix check is present to get this to survive in test runners which
        // inject extra types. (Seen with JetBrains.Profiler.Core.Instrumentation.DataOnStack.)
        foreach (var type in assembly.GetTypes().Where(t => t.FullName.StartsWith("NodaTime")))
        {
            // Note: this won’t be enough to load the assembly in all test runners. In particular, it fails in
            // NCrunch at the moment.
            AppDomainSetup setup = new AppDomainSetup { ApplicationBase = AppDomain.CurrentDomain.BaseDirectory };
            AppDomain domain = AppDomain.CreateDomain("InitializationTest" + type.Name, AppDomain.CurrentDomain.Evidence, setup);
            var helper = (TypeInitializationChecker)domain.CreateInstanceAndUnwrap(assembly.FullName,
                typeof(TypeInitializationChecker).FullName);
            dependencies.AddRange(helper.FindDependencies(type.FullName));
        }
        var lookup = dependencies.ToLookup(d => d.From, d => d.To);
        // This is less efficient than it might be, but I’m aiming for simplicity: starting at each type
        // which has a dependency, can we make a cycle?
        // See Tarjan’s Algorithm in Wikipedia for ways this could be made more efficient.
        // http://en.wikipedia.org/wiki/Tarjan’s_strongly_connected_components_algorithm
        foreach (var group in lookup)
        {
            Stack<string> path = new Stack<string>();
            CheckForCycles(group.Key, path, lookup);
        }
    }

    private static void CheckForCycles(string next, Stack<string> path, ILookup<string, string> dependencyLookup)
    {
        if (path.Contains(next))
        {
            Assert.Fail("Type initializer cycle: {0}-{1}", string.Join("-", path.Reverse().ToArray()), next);
        }
        path.Push(next);
        foreach (var candidate in dependencyLookup[next].Distinct())
        {
            CheckForCycles(candidate, path, dependencyLookup);
        }
        path.Pop();
    }
}

Eduasync 20: Changes between the VS11 Preview and the Visual Studio 11 Beta

A while I ago I blogged about what had changed under the hood of async between the CTP and the VS11 Preview. Well, now that the VS11 Beta is out, it’s time to do it all again…

Note that the code in this post is in the Eduasync codebase, under a different solution (Eduasync VS11.sln). Many of the old existing projects won’t compile with VS11 beta, but I’d rather leave them as they are for posterity, showing the evolution of the feature.

Stephen Toub has an excellent blog post covering some of this, so while I’ll mention things he’s covered, I won’t go into much detail about them. Let’s start off there though…

(EDIT: Stephen has also mailed me with some corrections, which I’ve edited in – mostly without indication, as the post has been up for less than seven hours, and it’ll make for a better reading experience.)

Awaiter pattern changes

The awaiter pattern is now not just a pattern. The IsCompleted property and GetResult method are still "loose" but OnCompleted is now part of an interface: INotifyCompletion. Awaiters have to implement INotifyCompleted, but may also implement ICriticalNotifyCompletion and its UnsafeOnCompleted method.

The OnCompleted method is just as it was before, and needs to flow the execuction context; the UnsafeOnCompleted method is simpler, as it doesn’t need to flow the execution context. All of this only matters if you’re implementing your own awaiters, of course. (More details in Stephen’s blog post. I’ve found this area somewhat confusing, so please do read his post carefully!)

Skeleton method changes

Just as I have previously, I’m using the (entirely unofficial) term "skeleton method" to mean the very short method created by the compiler with the same signature as an async method: this is the entry point to the async method, effectively, which creates and starts the state machine containing all the real logic.

There are two changes I’ve noticed in the skeleton method. Firstly, for some reason the state machine state numbers have changed. Whereas previously a state number of 0 meant "initial or running", positive values meant "between calls, or navigating back to the await expression" and -1 meant "finished", now -1 means "initial or running", non-negative means "between calls, or navigating back to await expression" and -2 means "finished". It’s not clear why this change has been made, given that it requires an extra assignment at the start of every skeleton method (to set the state to -1).

More importantly, the skeleton method no longer calls MoveNext directly on the state machine that it’s built. Instead, it calls Start<TStateMachine> on the AsyncTaskMethodBuilder<T> (or whichever method builder it’s using). It passes the state machine by reference (presumably for efficiency), and TStateMachine is constrained to implement the now-public-and-in-mscorlib IAsyncStateMachine interface. I’ll come back to the relationship between the state machine and the builder later on.

Task caching

(Code is in project 30: TaskCaching)

It’s possible for an async method to complete entirely synchronously. In this situation, the result is known before the method returns, and the task returned by the method is already in the RanToCompletionState. If two tasks have already run to completion with the same value, they can be (apparently) regarded as equivalent… so the beta now caches a task in this situation, for some types and values. (Apparently the preview cached too, but I hadn’t noticed, and the beta caches more.) According to my experiments and some comments:

  • For int, tasks with values -1 to 8 inclusive are cached
  • For bool, both values (true and false)
  • For char, byte, sbyte, short, ushort, uint, long, ulong, IntPtr and UIntPtr tasks with value 0 (or ”) are cached
  • For reference types, null is cached
  • For other types, no tasks are cached

EDIT: After they’ve completed, tasks are normally immutable except for disposal – the cached tasks are tweaked slightly to make disposal a no-op.

State machine interface changes

In the VS11 preview release, each state machine implemented an interface, but that interface was internal to the generated assembly, and contained a single method (SetMoveNextDelegate). It’s now a public interface with two methods:

Personally I’m not keen on the naming of "MoveNext" – I can’t help but feel that if we didn’t have the "naming baggage" of IEnumerator and the fact that at least early on, the code generator was very similar to that used for iterator blocks, we’d have something different. (It is moving to the next state of the state machine, but it still doesn’t quite feel right.) I’d favour something like "ContinueExecution". However, it doesn’t matter – it obviously does what you’d expect, and you’re not going to be calling this yourself.

SetStateMachine is a stranger beast. The documentation states:

Configures the state machine with a heap-allocated replica.

… which says almost nothing, really. The implementation is always simple, just like SetMoveNextDelegate was, although this time it delegates to the builder for the real work (a common theme, as we’ll see):

void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
{
    this.<>t__builder.SetStateMachine(param0);
}

Now AsyncTaskMethodBuilder.SetStateMachine is also documented pretty sparsely:

Associates the builder with the specified state machine.

Again, no real help. However, we’ll see that it’s the builder which is responsible for calling OnContinue now, and as it can call MoveNext on an IStateMachine, it makes sense to tell it which state machine it’s associated with… but can’t it do that directly?

Well, not quite. The problem (as I understand it) is around when boxing occurs. We initially create the state machine on the stack, and it contains the builder. (Both are structs.) That’s fine until we need a continuation, but then we’ve got to be able to get back to the current state later, after the current stack frame has been blown away. So we need to box the state machine. That will create a copy of the current builder (within the box). We need the builder within the boxed state machine to contain a reference to the same box. So the order has to be:

  • Box the state machine
  • Tell the state machine about the boxed reference
  • The state machine tells its nested builder about the boxed reference

Back when the state machine was in charge of the boxing, this went via the delegate: the act of creating the box was implicit when creating the delegate, and then casting the delegate target to the interface type allowed a reference to the newly-created delegate to be set within the copy. This is similar, but using the builder instead. It’s hard to follow, but of course it’s not going to matter.

State machine field changes

There are various kinds of fields in the state machine:

  • Those corresponding with local variables and parameters in the async method
  • The state
  • The field(s) associated with awaiters
  • (In the preview/beta) The field associated with the "current execution stack" at the point of an await expression
  • (In the CTP) An unused "disposing" field

Of these, I believe only the awaiters have actually changed, but before we talk about that, let’s revisit local variables.

Local variable hoisting

I’ve just noticed that the local variables are only hoisted to fields when its scope contains an await expressions, but in that case all local variables of that scope are hoisted, whether or not they’re used "across" awaits. It would be possible to hoist only those which need to be maintained between executions, but then you wouldn’t be able to see the others when debugging, which would be somewhat confusing. Likewise local variables of the same type which are never propagated across the same await could be aliased. For example, consider this async method:

static async Task<int> M(Random rng)
{
    int x = rng.Next(1000);
    int y = x + rng.Next(1000);
        
    await Task.Yield();
        
    int z = y + rng.Next(1000);
        
    await Task.Yield();
    return z;
}

If the compiler could be confident you didn’t need to debug through this code, it could make do with one field of type "Random" and one field of type "int" – x can be a completely local variable in MoveNext (it’s not used between two awaits) and y and z can be aliased (we never need the value of y after we’ve first written to z).

Local variable aliasing probably isn’t particularly useful for "normal" methods as the JIT may be able to do it (so long as you don’t have a debugger attached, potentially) but in this case we expect the state machine to be boxed at some point, so potentially does make a difference (while the stack is typically reasonably small, you could have a lot of outstanding async methods in some scenarios). Maybe in a future release, the C# compiler could have an aggressive optimization mode around this, to be turned on explicitly. (I don’t think it should be  a particularly high priority, mind you.)

Awaiter fields

(Code is in project 31, AwaiterFields.)

Awaiter fields have changed a bit over the course of async’s history.

In the CTPs (all of them, I believe) each await expression had its own awaiter field in the state machine, with a type corresponding to the declared awaiter type from the awaitable. (Remember that the awaitable is the thing you await, such as a task, and the awaiter is what you get back from calling GetAwaiter on the awaitable).

In the VS11 Preview, there was always a single awaiter field of type object. From what I saw, it was usually populated with a single-element array containing an awaiter. For value type awaiters (i.e. where the awaiter is a struct) this is somewhat similar to boxing, but maintaining strong typing, so calls to IsCompleted etc can still be made. It’s possible that reference type awaiters were stored without the array creation, as it would serve no purpose. (I don’t have any machines with just the preview installed to verify this.)

In the Beta, we have a mixture. If there are any reference type awaiters, they all end up being stored in a single field of type object, which is then cast back to the actual type when required. (Don’t forget that only one awaiter can be "active" at a time, which makes this possible.) This includes awaiters of an interface type – it’s only the compile-time type declared as the return type of the GetAwaiter method of the awaitable which is important.

If any of the awaiter types are value types, each of these types gets its own field. So there might be a TaskAwaiter<int> field and a TaskAwaiter<string> field, for example. However, there can still be "sharing" going on: if there are multiple await expressions all of the same value type awaiter, they will all share a single field. (This all feels a little like the JITting of generics, but it’s somewhat coincidental.)

MoveNext method changes

(Code is in project 32, BetaStateMachine)

As I’ve mentioned earlier, the builder is now responsible for a lot more of the work than it was in earlier versions. The majority of the code remains the same as far as I can tell, in terms of handling branching, evaluating expressions with multiple await expressions and so on.  The code in the source repository shows what the complete state machine looks like, but for the sake of clarity, I’ll just focus on a single snippet. If we have an await expression like this:

await x;

then the state machine code in the VS11 Preview would look something like this:

localTaskAwaiter = x.GetAwaiter();
if (localTaskAwaiter.IsCompleted)
{
    goto AwaitCompleted;
}
this.state = 1;
TaskAwaiter[] awaiterArray = { localTaskAwaiter };
this.awaiter = awaiterArray;
Action continuation = this.MoveNextDelegate;
if (continuation == null)
{
    Task<int> task = this.builder.Task;
    continuation = MoveNext;
    ((IStateMachine) continuation.Target).SetMoveNextDelegate(continuation);
}
awaiterArray[0].OnCompleted(continuation);

return;

(That’s just setting up the await, of course – there’s then the bit where the result is fetched, but that’s less interesting. There’s also the matter of doFinallyBodies.)

In the VS11 Beta, it’s something this instead (for an awaiter type "AwaiterType" which implements ICriticalNotifyCompletion, in a state machine of type ThisStateMachine).

localAwaiter = x.GetAwaiter();
if (!localAwaiter.IsCompleted)
{
    state = 0;
    awaiterField = localAwaiter;
    builder.AwaitUnsafeOnCompleted<AwaiterType, ThisStateMachine>(ref localAwaiter, ref this);
    doFinallyBodies = false;
    return;
}

If the awaiter type only implements INotifyCompletion, it calls AwaitOnCompleted instead. Note how the calls are generic (but both type variables are constrained to implement appropriate interfaces) which avoids boxing.

The call to the builder will call back to the state machine’s SetStateMachine method if this is the first awaiter that hasn’t already completed within this execution of the async method. So that handles the section which checked for the continuation being null in the first block of code. Most of the rest of the change is explained by the difference in awaiter types, and obviously AwaitOnCompleted/AwaitUnsafeOnCompleted also ends up calling into OnCompleted on the awaiter itself.

Mutable value type awaiters

(Code is in project 32, MutableAwaiters)

One subtle difference which really shouldn’t hurt people but is fun to explore is what happens if you have an awaiter which is a mutable value type. Due to the way awaiters were carefully handled pre-beta, mutations which were conducted as part of OnCompleted would still be visible in GetResult. That’s not the case in the beta (as Stephen mentions in his blog post). Mind you, it doesn’t mean that all mutations will be ignored… just ones in OnCompleted. A mutation from IsCompleted is still visible, as shown here:

public struct MutableAwaiter : INotifyCompletion
{
     private string message;

     public MutableAwaiter(string message)
     {
         this.message = message;
     }

     public bool IsCompleted
     {
         get
         { 
             message = "Set in IsCompleted";
             return false;
         }
     }

     public void OnCompleted(Action action)
     {
         message = "Set in OnCompleted";
         // Ick! Completes inline. Never mind, it’s only a demo…
         action();
     }

     public string GetResult()
     {
         return message;
     }
}

What would you expect to be returned from this awaiter? You can verify that all three members are called… but "Set in IsCompleted" is returned. That’s because IsCompleted is called before the awaiter value is copied into a field within the state machine. Even though the state machine passes the awaiter by reference, it’s passing the local variable, which is of course a separate variable from the field.

I’m absolutely not suggesting that you should rely on any of this behaviour. If you really need to be able to mutate your awaiter, make it a reference type.

Conclusion

The main changes in the Beta are around the interactions between the AsyncTaskMethodBuilder (et al) and the state machine, including the new interfaces for awaiters. There’s been quite a bit of optimization, although I still see room for a bit more:

  • When there’s only a single kind of reference type awaiter, the field for storing it could be of that type rather than of type object, removing the need for an execution-time cast
  • The "stack" variable could be removed in some cases, and made into a specific type in many others
  • With appropriate optimization flags, local variables which aren’t used await expressions could stay local to the state machine instead of being hoisted, and hoisted variables could be aliased in some cases.

One thing which concerns me slightly is how the C# language specification is going to change – the addition of the new interfaces is definitely going to mean more complexity from this previously "tidy" feature. I’m sure it’s worth it for the sake of efficiency and the like, but part of me sighs at every added tweak.

So, is this now close to the finished version of async? Only time will tell. I haven’t checked whether dynamic awaitables have finally been introduced… if they have, I’ll put that in the next post.

Subtleties in API design – member placement

Noda Time is nearing v1.0, which means I’m spending more time writing documentation than code. It also means reviewing the APIs we’ve got with a critical eye – whether that’s removing extraneous members, adding useful ones, or moving things around. (In particular, writing documentation often suggests where a change would make calling code read more naturally.)

This post is about one particular section of the API, and the choices available. Although I do go into some detail around the specific calls involved, that’s just for context… the underlying choices are ones which could be faced when designing any API. I’ve rarely spent as much time thinking about API decisions as I have with Noda Time, so hopefully this will prove interesting to you even if you really don’t care about Noda Time itself as a project.

Introduction: time zones, local date/times and zoned date/times – oh my!

(Okay, so that’s not quite as snappy as the Judy Garland version, but hey…)

The area of API we’re going to focus on is time zones, and converting between "local" date/time values and "zoned" ones. The three types involved are:

  • LocalDateTime: a "local" date and time, with no specific time zone. So, something like "7:30 in the evening on February 27th 2012". This means different instants in time in different time zones, of course: if you’re arranging a meeting, it’s good enough when the attendees are in the same time zone, but not good enough if you’re meeting with someone on the other side of the world. (A LocalDateTime also has an associated calendar system, but I’m not going to talk about that much in this post.)
  • DateTimeZone: a time zone. At its core, this maps any "instant" in time to an offset – the difference between UTC and local time in that time zone. The offset changes over time, typically (but not always) due to daylight saving changes.
  • ZonedDateTime: a date and time in a particular time zone, with an offset from UTC to avoid ambiguity in some cases (and for efficiency). This identifies a specific instant in time (simply by subtracting the offset from the local date/time). Conceptually this is equivalent to just maintaining the "instant" value, the time zone, and the calendar system – but it’s generally cleaner to think of it as a "local" value with an offset from UTC.

If those brief descriptions don’t make sense for you at the moment (this sort of thing is quite hard to describe concisely and precisely) you may want to see whether the Noda Time user guide "concepts" page helps.

The API task: mapping from { LocalDateTime, DateTimeZone } to ZonedDateTime

It’s easy to get from a ZonedDateTime to a LocalDateTime – you can just use the LocalDateTime property. The difficult bit is the other way round. We obviously want to be able to create a ZonedDateTime from the combination of a LocalDateTime and a DateTimeZone, but the question is where to put this functionality. Three options suggest themselves:

  • A static method (or constructor) in ZonedDateTime which takes both the time zone and the local date/time as arguments
  • An instance method on LocalDateTime which just takes the time zone as an argument
  • An instance method on DateTimeZone which just takes the local date/time as an argument

It gets more complicated though – we’re not really talking about one operation here, but potentially several. Although the mapping from instant to offset is unambiguous in DateTimeZone, the mapping from LocalDateTime to offset is not straightforward. There can be 0, 1 or 2 possible results. For example, in the America/Los_Angeles time zone the clocks go forward from 2am to 3am on Sunday March 11th 2012, and go back from 2am to 1am on Sunday 4th November 2012. That means:

  • The mapping from local date/time to offset at 7.30pm on February 27th 2012 is unambiguous: it’s definitely -8 hours (L.A. is 8 hours behind UTC).
  • The mapping at 2.30am on March 11th 2012 is impossible: at 2am the clocks were put forward to 3am, so 2.30am simply never occurs.
  • The mapping at 2.30am on November 4th 2012 is ambiguous: it happens once before the clocks go back at 3am, and once afterwards. The offset is either -7 or -8 hours, depending on which occurrence you mean.

When mapping a local time to "global" time, this is something you should really think about. Most APIs obscure the issue, but one of the purposes of Noda Time is to force developers to think about issues which they should be aware of. This one is particularly insidious in that it’s the kind of problem which is much more likely to arise when you’re asleep than during daylight hours – so it’s unlikely to be found during manual testing. (Ditto the day of week – most time zones have daylight saving transitions early on a Sunday morning.)

So, Noda Time exposes four ways of mapping a LocalDateTime and DateTimeZone to a ZonedDateTime:

  • Exact: if there’s a single mapping, return it. Otherwise, throw an exception.
  • Earlier: if there’s a single mapping, return it. If there’s more than one, return the earlier one. If the time is skipped, throw an exception.
  • Later: if there’s a single mapping, return it. If there’s more than one, return the later one. If the time is skipped, throw an exception.
  • All information: find out all the information relevant to mapping the given local value – how many matches there are, what they would be, what the time zone information is for each mapping, etc. The caller can then do what they want.

Options available

The question is how we expose these operations. Let’s look at some options, then discuss the pros and cons.

Option 1: methods on LocalDateTime

A lot of Noda Time is designed to be "fluent" so it makes a certain amount of sense to be able to take a LocalDateTime, perform some arithmetic on it, then convert it to a ZonedDateTime, then (say) format it. So we could have something like:

  • var zoned = local.InZone(zone); // implicitly exact
  • var zoned = local.InZoneOrEarlier(zone);
  • var zoned = local.InZoneOrLater(zone);
  • var mapping = local.MapToZone(zone);

Option 2: methods on DateTimeZone

All the calculations involved are really about the time zone – the local date/time value is just a simple value as far as most of this is concerned. So we can put the methods on DateTimeZone instead:

  • var zoned = zone.AtExactly(local);
  • var zoned = zone.AtEarlier(local);
  • var zoned = zone.AtLater(local);
  • var mapping = zone.MapLocalDateTime(local);

Option 3: methods (or constructors) on ZonedDateTime

Maybe we consider the two inputs to be fairly equal, but the result type is more important:

  • var zoned = ZonedDateTime.FromLocal(zone, local);
  • var zoned = ZonedDateTime.FromLocalOrEarlier(zone, local);
  • var zoned = ZonedDateTime.FromLocalOrLater(zone, local);
  • var mapping = ZoneLocalMapping.FromLocal(local)

(I’m not terribly happy about the names here; there could be better ones of course.)

Variation a: any of the above options, but with an enum for ambiguity resolution

We don’t really need four methods on any of these APIs; the first three only differ by how they handle ambiguity (the situation where a particular local date/time occurs twice). We could use an enum to represent that choice instead:

  • var zoned = local.InZone(zone, ZoneAmbiguityResolver.Error);
  • var zoned = local.InZone(zone, ZoneAmbiguityResolver.Earlier);
  • var zoned = local.InZone(zone, ZoneAmbiguityResolver.Later);

(Or a "smart enum" with behaviour, if we wanted. A normal class type with methods etc, but only a restricted set of valid values.)

Variation b: always go via the mapping result

Given that we already have the idea of getting the full mapping results, we can reduce the API to just one method to return the mapping information, and then put extra methods on that:

  • var zoned = local.MapInZone(zone).SingleMatch;
  • var zoned = local.MapInZone(zone).SingleOrEarlier;
  • var zoned = local.MapInZone(zone).SingleOrLater;

(Again, the names aren’t fixed in stone, and the second part could be methods instead of properties if we wanted.)

Variation c: return a sequence of results

If we return a sequence with 0, 1 or 2 ZonedDateTime values, the user can just use LINQ to get the one they want. Again, this can apply wherever we decide to put the method:

  • var zoned = zone.At(local).Single();
  • var zoned = zone.At(local).First();
  • var zoned = zone.At(local).Last();

So, it looks like we effectively have two mostly-orthogonal decisions here:

  • Where to "start" the conversion – the target type for the method call
  • How to represent the multiple options

We’ll consider them separately.

Regarding the "source" type

To start with, I’ll reveal my bias: the existing implementation is option 2 (four methods on DateTimeZone). This was after a small amount of debate on the Noda Time mailing list, and this was the most relevant bit of the discussion:

Me (before going with the current implementation):

It feels a little odd to me to use the zone as the principal class here – just in terms of usability. It makes total sense in terms of the logic, but I tend to think of having a LocalDateTime first, and then converting that to use a particular zone – it’s not an operation which feels like it acts on the zone itself.

David N:

I actually feel the opposite: asking a DateTimeZone how a particular LocalDateTime would be represented in that zone feels natural, while asking the LocalDateTime how it would be represented in a zone feels odd. The zone is a first-class entity, with identity and behavior; the LocalDateTime is just a set of values. Why would the LocalDateTime be expected to know how it is represented in a particular zone?

Even though I replied to that in a "maybe" kind of tone, the argument basically convinced me. The trouble is, a colleague was then surprised when he read the documentation around calendar arithmetic and conversions. Surprising users is pretty much a cardinal sin when it comes to API design – and although in this case it was the relatively harmless sort of surprise ("I can’t find the member I want in A; oh, it turns out it’s in B") rather than a behavioural surprise ("I thought it would do X, but instead it does Y") it’s still bad news. I should reveal my colleague’s bias too – he has experience of Joda Time, where the relevant call is LocalDateTime.toDateTime(DateTimeZone). (There are calls in DateTimeZone itself, but they’re lower-level.)

We’ve discussed this a fair amount (leading to this blog post) and so far we’ve concluded that it really depends on how you think about time zones. As a Noda Time user, would you consider them to be rich objects with complex behaviour, or would you think of them as mere "tokens" to be passed around and used without much direct interaction? The two ways of viewing the type aren’t necessarily in conflict – I’ve deliberately designed CalendarSystem to hide its (fairly ugly) innards. There are a few public instance members, but most are internal. But what about time zones?

There’s an argument to be made for educating Noda Time users to think about time zones as more complex beasts than just tokens, and I’m happy to do that in other areas (such as choosing which type to use in the first place) but here it feels like it’s one step too far. On the other hand, I don’t want to stifle users who are thinking of DateTimeZone in that way. In the mailing list thread, David also expressed a dislike for the approach of including functionality in multiple places – and to a certain extent I agree (one of the things I dislike about its API is that it lets you do just about anything with anything)… but in this case it feels like it’s justified.

Regardless of how you’re thinking about DateTimeZone, it’s more likely that you’re going to want to use a LocalDateTime value which is the result of some other expression, and then apply some "constant" zone to it, then potentially keep going. If you think about a LINQ-style pipeline of operations, the part that varies in the conversion is much more likely to be the LocalDateTime than the time zone. As such, a method on LocalDateTime allows for a more fluent calling style:

var zoned = parseResult.Value
                       .PlusMonths(1)
                       .InZone(LondonTimeZone);

versus:

var zoned = LondonTimeZone.AtExactly(parseResult.Value.PlusMonths(1));

Or to keep the code order the same as the execution order:

var local = parseResult.Value.PlusMonths(1);
var zoned = LondonTimeZone.AtExactly(local);

Obviously the effects become more noticeable the more operations you perform. Personally I’m happy with either the first or third approach – although it’s worth being aware that either of the first two have the advantage of being one expression, and therefore easy to use when assigning a static readonly field or something similar.

I’m reasonably happy with having one method on each type, or possibly two (MapLocalDateTime and At*) on DateTimeZone and one (just InZone) on LocalDateTime. I really don’t like the idea of having four methods on DateTimeZone and three methods on LocalDateTime. So, let’s consider the different variations which cut down the number of methods required.

 

Expressing "exactly," "earlier," and "later" in one method

This is essentially a discussion of the "variations" above. Just to recap, the possibilities we’ve come up with are:

  • Add another parameter to the method to indicate how to handle ambiguities (or impossibilities) – just return a ZonedDateTime
  • Return a value of a different type (e.g. ZoneLocalMapping) which can be used to get at all the information you could want
  • Return a sequence of possible ZonedDateTime values, expecting the caller to probably use LINQ’s First/Last/Single/FirstOrDefault etc to get at the value they want

The last of these is the only one which gives an easy way of handling the extreme corner case of a local time occurring more than twice – for example, a time zone which goes back one hour at 2am (to 1am) and then goes back another two hours at 3am. I think it’s reasonable to dismiss this corner case; however mad time zones can be, I haven’t seen anything quite that crazy yet.

At the time of the original discussion, Noda Time was targeting .NET 2.0, which was one reason for not going with the final option here – we couldn’t guarantee that LINQ would be available. Now, Noda Time is targeting .NET 3.5 in order to use TimeZoneInfo, but it still doesn’t feel like an ideal fit:

  • Returning a sequence doesn’t give information about (say) the two zone intervals "surrounding" a gap
  • A sequence may be surprising to users who expect just a single value
  • The exceptions thrown by First, Single etc when their expectations aren’t met are very domain-neutral; we can give more information
  • FirstOrDefault will return the default value for ZonedDateTime in the case of ambiguity. That would be unfortunate, as ZonedDateTime is a value type, and its default value is actually somewhat unhelpful. (It has a null calendar system, effectively. There’s not a lot we can do about this, but that’s a post for another day.) We could make it a sequence of Nullable<ZonedDateTime> and guarantee that any values in it are actually non-null, but that’s really straining things.

Putting these together, there are enough negative points to this idea that I’m happy to rule it out. But what about the first two?

The first has the advantage that the caller only needs to make a single method call, effectively passing in a "magic token" (the ambiguity resolver) which they don’t really need to understand. On the other hand, if they want more information, they’ll have to call a different method – and I’m not really sure we want to encourage too much of this "magic token" behaviour.

The second has three disadvantages, all fairly slight:

  • The user may initially expect the result of a method mapping a LocalDateTime to a ZonedDateTime to be a ZonedDateTime… what’s this extra intermediate result doing? This is "only" a matter of user education, and it’s pretty discoverable. It’s an extra concept the user needs to understand, but it’s a good concept to understand.
  • Calling two methods or a method and a property (e.g. zone.MapLocalDateTime(localDateTime).Earlier) may end up being slightly more long-winded than a single method call. I can’t get excited about this.
  • We have to allocate an extra object for the mapping, even when we know it’s unique. Usually, this object will become eligible for garbage collection immediately. We could make it a struct, but I don’t think it’s a natural value type – I’d rather trust that allocating objects in gen0 is pretty cheap.

With the second method, we can replace all the existing methods in DateTimeZone with a single one (or rather, just remove the AtXyz methods, leaving MapLocalDateTime). We can then create pleasantly-named methods on ZoneLocalMapping (which isn’t quite right for this purpose at the moment).

Conclusion

This has been an interesting thought experiment for me, and it’s suggested some changes I will be applying before v1. We’ll see how they pan out. If you want to follow them, look for relevant source code changes.

The important points I’ve been thinking about are:

  • What would a new user expect to be available? If they haven’t read any documentation, what are they likely to try?
  • What should the user know about? Where there are important decisions to make, how can we provide guidance?
  • What would an experienced user (who is already thinking about the Noda Time concepts in the way that we want to encourage) expect to be available?
  • Where does the balance lie between providing a "too crowded" API (with lots of different ways of achieving the same thing) and a "sparse" API (where there’s always one way of achieving a goal, but it may not be the most convenient one for your situation)
  • How does our choice fit in with other technologies? For example, the final "variation" seems like it plays nicely with LINQ at first – but a few subtleties make it a worse fit than it might seem.
  • How does this affect performance? (Performance is important in Noda Time – but there would have to be a significant performance problem for me to deviate from an elegant solution.)

So, any other thoughts? Did we miss some options? What other factors should we have taken into consideration?

Currying vs partial function application

This is a slightly odd post, and before you read it you should probably put yourself into one of three buckets:

  • Someone who doesn’t care too much about functional programming, and finds higher order functions tricky: feel free to skip this post entirely.
  • Someone who knows all about functional programming, and already knows the difference between currying and partial function application: please read this post carefully and post comments about any inaccuracies you find. (Yes, the CAPTCHA is broken on Chrome; sorry.)
  • Someone who doesn’t know much about functional programming yet, but is interested to learn more: please take this post with a pinch of salt, and read the comments carefully. Read other articles by more experienced developers for more information.

Basically, I’ve been aware for a while that some people use the terms currying and partial function application somewhat interchangably, when they shouldn’t. It’s one of those topics (like monads) which I feel I understand to some extent, and I’ve decided that the best way of making sure I understand it is to try to write about it. If it helps the topic become more accessible to other developers, so much the better.

This post contains no Haskell

Almost every explanation I’ve ever seen of either topic has given examples in a "proper" functional language, typically Haskell. I have absolutely nothing against Haskell, but I typically find it easier to understand examples in a programming language I understand. I also find it much easier to write examples in a program language I understand, so all the examples in this post are going to be in C#. In fact, it’s all available in a single file – that includes all of the examples, admittedly with a few variables renamed. Just compile and run.

C# isn’t really a functional language – I know just about enough to understand that delegates aren’t really a proper substitute for first class functions. However, they’re good enough to demonstrate the principles involved.

While it’s possible to demonstrate currying and partial function application using a function (method) taking a very small number of parameters, I’ve chosen to use 3 for clarity. Although my methods to perform the currying and partial function application will be generic (so all the types of parameters and return value are arbitrary) I’m using a simple function for demonstration purposes:

static string SampleFunction(int a, int b, int c) 

    return string.Format("a={0}; b={1}; c={2}", a, b, c); 
}

So far, so simple. There’s nothing tricky about that method, so don’t look for anything surprising.

What’s it all about?

Both currying and partial function application are about converting one sort of function to another. We’ll use delegates as an approximation to functions, so if we want to treat the method SampleFunction as a value, we can write:

Func<int, int, int, string> function = SampleFunction;

This single line is useful for two reasons:

  • Assigning the value to a variable hammers home the point that it really is a value. A delegate instance is an object much like any other, and the value of the function variable is a reference just like any other.
  • Method group conversions (using just the name of the method as a way of creating a delegate) doesn’t work terribly nicely with type inference when calling a generic method.

We can already call the function using three arguments with no further work:

string result = function(1, 2, 3);

Or equivalently:

string result = function.Invoke(1, 2, 3);

(The C# compiler just converts the shorthand of the first form to the second. The IL emitted will be the same.)

That’s fine if we’ve got all the arguments available at the same time, but what if we haven’t? To give a concrete (if somewhat contrived) example, suppose we have a logging function with three parameters (source, severity, message) and within a single class (which I’ll call BusinessLogic for the moment) we always want to use the same value for the "source" parameter, and we’d like to be able to log easily everywhere in the class specifying just the severity and message. We have a few options:

  • Create an adapter class which takes the log function (or more generally a logging object) and the "source" value in its constructor, stashes both in instance variables, and exposes a method with two parameters. That method just delegates to the stashed logger, using the source it’s remembered to supply the first argument to the three-parameter method. In BusinessLogic we create an instance of the adapter class, and stash a reference in an instance variable – then just call the two-parameter method everywhere we need to. This is probably overkill if we only need the adapter from BusinessLogic, but it’s reusable… so long as we’re trying to adapt the same logging function.
  • Store the original logger in our BusinessLogic class, but create a helper method with two parameters. This can hard-code the value used for the "source" parameter in one place (the helper method). If we need to do this in several places, it gets annoying.
  • Use a more general functional programming approach – probably partial function application in this case.

I’m deliberately ignoring the discrepancy between holding a reference to "the logger" and holding a reference to "the logging function". Obviously there’s a significant difference if we need to use more than one function from the logging class, but for the purposes of thinking about currying and partial function application, we’ll just think of "a logger" as "a function taking three parameters" (like our sample function).

Now that I’ve given the slightly-real-world concrete example for a bit of motivation, I’m going to ignore it for the rest of the post, and just talk about our sample function. I don’t want to write a whole BusinessLogic class which pretends to do something useful; I’m sure you can perform the appropriate mental conversion from "the sample function" to "something I might actually want to do".

Partial Function Application

The purpose of partial function application is to take a function with N parameters and a value for one of those parameters, and return a function with N-1 parameters, such that calling the result will assemble all the required values appropriately (the 1 argument given to the partial application operation itself, and the N-1 arguments given to the returned function). So in code form, these two calls should be equivalent for our 3-parameter method:

// Normal call
string result1 = function(1, 2, 3);

// Call via partial application
Func<int, int, string> partialFunction = ApplyPartial(function, 1); 
string result2 = partialFunction(2, 3);

In this case I’ve implemented partial application with a single parameter, and chosen the first one – you could write an ApplyPartial method which took more arguments to apply, or which used them somewhere else in the final function execution. I believe that picking off parameters one at a time, from the start, is the most conventional approach.

Thanks to anonymous functions (a lambda expression in this case, but an anonymous method wouldn’t be much more verbose), the implementation of ApplyPartial is simple:

static Func<T2, T3, TResult> ApplyPartial<T1, T2, T3, TResult>
    (Func<T1, T2, T3, TResult> function, T1 arg1) 

    return (b, c) => function(arg1, b, c); 
}

The generics make the method look more complicated than it really is. Note that the lack of higher order types in C# means that you’d need a method like this for every delegate you wanted to use – if you wanted a version for a function which started with four parameters, you’d need an ApplyPartial<T1, T2, T3, T4, TResult> method etc. You’d probably also want a parallel set of methods for the Action delegate family.

The final thing to note is that assuming we had all of these methods, we could perform partial function application again – even potentially down to a parameterless function if we wanted, like this:

Func<int, int, string> partial1 = ApplyPartial(function, 1); 
Func<int, string> partial2 = ApplyPartial(partial1, 2); 
Func<string> partial3 = ApplyPartial(partial2, 3); 
string result = partial3();

Again, only the final line would actually invoke the original function.

Okay, so that’s partial function application. That’s relatively straightforward. Currying is slightly harder to get your head round, in my view.

Currying

Whereas partial function application converts a function with N parameters into a function with N-1 parameters by applying one argument, currying effectively decomposes the function into functions taking a single parameter. We don’t pass any arguments into the Curry method itself:

  • Curry(f) returns a function f1 such that…
  • f1(a) returns a function f2 such that…
  • f2(b) returns a function f3 such that…
  • f3(c) invokes f(a, b, c)

(Again, note that this is specific to our three-parameter function – but hopefully it’s obvious how it would extend to other signatures.)

To give our "equivalence" example again, we can write:

// Normal call
string result1 = function(1, 2, 3);

// Call via currying
Func<int, Func<int, Func<int, string>>> f1 = Curry(function); 
Func<int, Func<int, string>> f2 = f1(1); 
Func<int, string> f3 = f2(2); 
string result2 = f3(3);

// Or to do make all the calls together…
var curried = Curry(function); 
string result3 = curried(1)(2)(3);

The difference between the latter examples shows one reason why functional languages often have good type inference and compact representations of function types: that declaration of f1 is pretty fearsome.

Now that we know what the Curry method is meant to do, it’s actually surprisingly simple to implement. Indeed, all we need to do is translate the bullet points above into lambda expressions. It’s a thing of beauty:

static Func<T1, Func<T2, Func<T3, TResult>>> Curry<T1, T2, T3, TResult> 
    (Func<T1, T2, T3, TResult> function) 

    return a => b => c => function(a, b, c); 
}

If you want to add some brackets to make it clearer for you, feel free – personally I think it just adds clutter. Either way, we get what we want. (It’s worth thinking about how annoying it would be to write that without lambda expressions or anonymous methods. Not hard, just annoying.)

So that’s currying. I think. Maybe.

Conclusion

I can’t say I’ve ever knowingly used currying, whereas I suspect some bits of Noda Time‘s text parsing effectively use partial functional application. (If anyone really wants me to dig in and check, I’ll do so.)

I really hope I’ve got the difference between them right here – it feels right, in that the two are clearly related, but also quite distinct. Now that we’ve reached the end, think about how that difference manifests itself when there are only two parameters, and hopefully you’ll see why I chose to use three :)

My gut feeling is that currying is a more useful concept in an academic context, whereas partial functional application is more useful in practice. However, that’s the gut feeling of someone who hasn’t really used a functional language in anger. If I ever really get round to using F#, maybe I’ll do a follow-up post. For now, I’m hoping that my trusty smart readers can provide useful thoughts for others.

Eduasync part 19: ordering by completion, ahead of time…

Today’s post involves the MagicOrdering project in source control (project 28).

When I wrote part 16 of Eduasync, showing composition in the form of majority voting, one reader mailed me a really interesting suggestion. We don’t really need to wait for any of the tasks to complete on each iteration of the loop – we only need to wait for the next task to complete. Now that sounds impossible – sure, it’s great if we know the completion order of the tasks, but half the point of asynchrony is that many things can be happening at once, and we don’t know when they’ll complete. However, it’s not as silly as it sounds.

If you give me a collection of tasks, I’ll give you back another collection of tasks which will return the same results – but I’ll order them so that the first returned task will have the same result as whichever of your original tasks completes first, and the second returned task will have the same result as whichever of your original tasks completes second, and so on. They won’t be the same tasks as you gave me, reordered – but they’ll be tasks with the same results. I’ll propagate cancellation, exceptions and so on.

It still sounds impossible… until you realize that I don’t have to associate one of my returned tasks with one of your original tasks until it has completed. Before anything has completed, all the tasks look the same. The trick is that as soon as I see one of your tasks complete, I can fetch the result and propagate it to the first of the tasks I’ve returned to you, using TaskCompletionSource<T>. When the second of your tasks completes, I propagate the result to the second of the returned tasks, etc. This is all quite easy using Task<T>.ContinueWith – barring a few caveats I’ll mention later on.

Once we’ve built a method to do this, we can then really easily build a method which is the async equivalent of Parallel.ForEach (and indeed you could write multiple methods for the various overloads). This will execute a specific action on each task in turn, as it completes… it’s like repeatedly calling Task.WhenAny, but we only actually need to wait for one task at a time, because we know that the first task in our "completion ordered" collection will be the first one to complete (duh).

Show me the code!

Enough description – let’s look at how we’ll demonstrate both methods, and then how we implement them.

private static async Task PrintDelayedRandomTasksAsync()
{
    Random rng = new Random();
    var values = Enumerable.Range(0, 10).Select(_ => rng.Next(3000)).ToList();
    Console.WriteLine("Initial order: {0}", string.Join(" ", values));

    var tasks = values.Select(DelayAsync);

    var ordered = OrderByCompletion(tasks);

    Console.WriteLine("In order of completion:");
    await ForEach(ordered, Console.WriteLine);
}

/// <summary>
/// Returns a task which delays (asynchronously) by the given number of milliseconds,
/// then return that same number back.
/// </summary>
private static async Task<int> DelayAsync(int delayMillis)
{
    await TaskEx.Delay(delayMillis);
    return delayMillis;
}

The idea is that we’re going to create 10 tasks which each just wait for some random period of time, and return the same time period back. We’ll create them in any old order – but obviously they should complete in (at least roughly) the same order as the returned numbers.

Once we’ve created the collection of tasks, we’ll call OrderByCompletion to create a second collection of tasks, returning the same results but this time in completion order – so ordered.ElementAt(0) will be the first task to complete, for example.

Finally, we call ForEach and pass in the ordered task collection, along with Console.WriteLine as the action to take with each value. We await the resulting Task to mimic blocking until the foreach loop has finished. Note that we could make this a non-async method and just return the task returned by ForEach, given that that’s our only await expression and it’s right at the end of the method. This would be marginally faster, too – there’s no need to build an extra state machine. See Stephen Toub’s article about async performance for more information.

ForEach

I’d like to get ForEach out of the way first, as it’s so simple: it’s literally just iterating over the tasks, awaiting them and propagating the result to the action. We get the "return a task which will wait until we’ve finished" for free by virtue of making it an async method.

/// <summary>
/// Executes the given action on each of the tasks in turn, in the order of
/// the sequence. The action is passed the result of each task.
/// </summary>
private static async Task ForEach<T>(IEnumerable<Task<T>> tasks, Action<T> action)
{
    foreach (var task in tasks)
    {
        T value = await task;
        action(value);
    }
}

Simple, right? Let’s get onto the meat…

OrderByCompletion

This is the tricky bit, and I’ve actually split it into two methods to make it slightly easier to comprehend. The PropagateResult method feels like it could be useful in other composition methods, too.

The basic plan is:

  • Copy the input tasks to a list: we need to work out how many there are and iterate over them, so let’s make sure we only iterate once
  • Create a collection of TaskCompletionSource<T> references, one for each input task. Note that we’re not associating any particular input task with any particular completion source – we just need the same number of them
  • Declare an integer to keep track of "the next available completion source"
  • Attach a continuation to each input task which will be increment the counter we’ve just declared, and propagate the just-completed task’s status
  • Return a view onto the collection of TaskCompletionSource<T> values, projecting each one to its Task property

Once you’re happy with the idea, the implementation isn’t too surprising (although it is quite long):

/// <summary>
/// Returns a sequence of tasks which will be observed to complete with the same set
/// of results as the given input tasks, but in the order in which the original tasks complete.
/// </summary>
private static IEnumerable<Task<T>> OrderByCompletion<T>(IEnumerable<Task<T>> inputTasks)
{
    // Copy the input so we know it’ll be stable, and we don’t evaluate it twice
    var inputTaskList = inputTasks.ToList();

    // Could use Enumerable.Range here, if we wanted…
    var completionSourceList = new List<TaskCompletionSource<T>>(inputTaskList.Count);
    for (int i = 0; i < inputTaskList.Count; i++)
    {
        completionSourceList.Add(new TaskCompletionSource<T>());
    }

    // At any one time, this is "the index of the box we’ve just filled".
    // It would be nice to make it nextIndex and start with 0, but Interlocked.Increment
    // returns the incremented value…
    int prevIndex = -1;

    // We don’t have to create this outside the loop, but it makes it clearer
    // that the continuation is the same for all tasks.
    Action<Task<T>> continuation = completedTask =>
    {
        int index = Interlocked.Increment(ref prevIndex);
        var source = completionSourceList[index];
        PropagateResult(completedTask, source);
    };

    foreach (var inputTask in inputTaskList)
    {
        // TODO: Work out whether TaskScheduler.Default is really the right one to use.
        inputTask.ContinueWith(continuation,
                               CancellationToken.None,
                               TaskContinuationOptions.ExecuteSynchronously,
                               TaskScheduler.Default);
    }

    return completionSourceList.Select(source => source.Task);
}

/// <summary>
/// Propagates the status of the given task (which must be completed) to a task completion source
/// (which should not be).
/// </summary>
private static void PropagateResult<T>(Task<T> completedTask,
    TaskCompletionSource<T> completionSource)
{
    switch (completedTask.Status)
    {
        case TaskStatus.Canceled:
            completionSource.TrySetCanceled();
            break;
        case TaskStatus.Faulted:
            completionSource.TrySetException(completedTask.Exception.InnerExceptions);
            break;
        case TaskStatus.RanToCompletion:
            completionSource.TrySetResult(completedTask.Result);
            break;
        default:
            // TODO: Work out whether this is really appropriate. Could set
            // an exception in the completion source, of course…
            throw new ArgumentException("Task was not completed");
    }
}

You’ll notice there are a couple of TODO comments there. The exception in PropagateResult really shouldn’t happen – the continuation shouldn’t be called when the task hasn’t completed. I still need to think carefully about how tasks should propagate exceptions though.

The arguments to ContinueWith are more tricky: working through my TimeMachine class and some unit tests with Bill Wagner last week showed just how little I know about how SynchronizationContext, the task awaiters, task schedulers, and TaskContinuationOptions.ExecuteSynchronously all interact. I would definitely need to look into that more deeply before TimeMachine was really ready for heavy use… which means you should probably be looking at the TPL in more depth too.

Conclusion

Sure enough, when you run the code, the results appear in order, as the tasks complete. Here’s one sample of the output:

Initial order: 335 468 1842 1991 2512 2603 270 2854 1972 1327
In order of completion:
270
335
468
1327
1842
1972
1991
2512
2603
2854

TODOs aside, the code in this post is remarkable (which I can say with modesty, as I’ve only refactored it from the code sent to me by another reader and Stephen Toub). It makes me smile every time I think about the seemingly-impossible job it accomplishes. I suspect this approach could be useful in any number of composition blocks – it’s definitely one to remember.

Eduasync part 18: Changes between the Async CTP and the Visual Studio 11 Preview

In preparation for CodeMash, I’ve been writing some more async code and decompiling it with Reflector. This time I’m using the Visual Studio 11 Developer Preview – the version which installs alongside Visual Studio 2010 under Windows 7. (Don’t ask me about any other features of Visual Studio 11 – I haven’t explored it thoroughly; I’ve really only used it for the C# 5 bits.)

There have been quite a few changes since the CTP – they’re not visible changes in terms of code that you’d normally write, but the state machine generated by the C# compiler is reasonably different. In this post I’ll describe the differences, as best I understand them. There are still a couple of things I don’t understand (which I’ll highlight within the post) but overall, I think I’ve got a pretty good handle on why the changes have been made.

I’m going to assume you already have a reasonable grasp of the basic idea of async and how it works – the way that the compiler generates a state machine to represent an async method or anonymous function, with originally-local variables being promoted to instance variables within the state machine, etc. If the last sentence was a complete mystery to you, see Eduasync part 7 for more information. I don’t expect you to remember the exact details of what was in the previous CTP though :)

Removal of iterator block leftovers

In the CTP, the code for async methods was based on the iterator block implementation. I suspect that’s still the case, but possibly sharing just a little less code. There used to be a few methods and fields which weren’t used in async methods, but now they’re gone:

  • There’s no now constructor, so no need for the "skeleton" method which replaces the real async method to pass in 0 as the initial state.
  • There’s no Dispose method.
  • There’s no disposing field.

It’s nice to see these gone, but it’s not terribly interesting. Now on to the bigger changes…

Large structural changes

There’s a set of related structural changes which don’t make sense individually. I’ll describe them first, then look at how it all hangs together, and my guess as to the reasoning behind.

The state machine is now a struct

The declaration of the nested type for the state machine is now something like this:

[CompilerGenerated]
private struct StateMachine : IStateMachine
{
    // Fields common to all async state machines
    // (with caveats)
    private int state;
    private object awaiter;
    public AsyncTaskMethodBuilder<int> builder;
    public Action moveNextDelegate;
    private object stack;

    // Hoisted local variables

    // Methods
    [DebuggerHidden]
    public void SetMoveNextDelegate(Action action) { … }
    public void MoveNext() { … }
}

The caveats around the common field are in terms of the return type of the async method (which determines the type of builder used) and whether or not there are any awaits (if there are no awaits, the stack and awaiter fields aren’t generated).

Note that throughout this blog post I’ve changed the names of fields and types – in reality they’re all "unspeakable" names including angle-brackets, just like all compiler-generated names.

There’s a new assembly-wide interface

As you can see from the code above, the state machine implements an interface (actually called <>t__IStateMachine). One of these is created in the global namespace in each assembly that contains at least one async method or anonymous function, and it looks like this:

[CompilerGenerated]
internal interface IStateMachine
{
    void SetMoveNextDelegate(Action action);
}

The implementation for this method is always the same, and it’s trivial:

[DebuggerHidden]
public void SetMoveNextDelegate(Action action)
{
    this.moveNextDelegate = action;
}

Simplified skeleton method

The method which starts the state machine, which I’ve been calling the "skeleton" method everywhere, is now a bit simpler than it was. Something like this:

public static Task<int> FooAsync()
{
    StateMachine machine = new StateMachine();
    machine.builder = AsyncVoidMethodBuilder.Create();
    machine.MoveNext();
    return machine.builder.Task;
}

In fact if you decompile the IL, you’ll see that it doesn’t explicitly initialize the variable to start with – it just declares it, sets the builder field and then calls MoveNext(). That’s not valid C# (as all the struct’s fields aren’t initialized), but it is valid IL. It’s equivalent to the code above though. Note how there’s nothing to set the continuation – where previously the moveNextDelegate field would be populated within the skeleton method.

Just-in-time delegate creation

Now that the skeleton method doesn’t create the delegate representing the continuation, it can be done when it’s first required – which is when we first encounter an await expression for an awaitable which hasn’t already completed. (If the awaitable has completed before we await it, the generated code skips the continuation and just uses the results immediately and synchronously).

The code for that delegate creation is slightly trickier than you might expect, however. It looks something like this:

Action action = this.moveNextDelegate;
if (action == null)
{
    Task<int> task = this.builder.Task;
    action = new Action(this.MoveNext);
    ((IStateMachine) action.Target).SetMoveNextDelegate(action);
}

There are two oddities here, one of which I mostly understand and one of which I don’t understand at all.

I really don’t understand the "task" variable here. Why do we need to exercise the AsyncTaskMethodBuilder.Task property? We don’t use the result anywhere… does forcing this flush some memory buffer? I have no clue on this one. (See the update at the bottom of the post…)

The part about setting the delegate via the interface makes more sense, but it’s subtle. You might expect code like this:

// Looks sensible, but is actually slightly broken
Action action = this.moveNextDelegate;
if (action == null)
{
    action = new Action(this.MoveNext);
    this.moveNextDelegate = action;
}

That would sort of work – but we’d end up needing to recreate the delegate each time we encountered an appropriate await expression. Although the above code saves the value to the field, it saves it within the current value of the state machine… after we’ve boxed that value as the target of the delegate. The value we want to mutate is the one within the box – which is precisely why there’s an interface, and why the code casts to it.

We can’t even just unbox and then set the field afterwards – at least in C# – because the unbox operation is always followed by a copy operation in normal C#. I believe it would be possible for the C# compiler to generate IL which unboxed action.Target without the copy, and then set the field in that. It’s not clear to me why the team went with the interface approach instead… I would expect that to be slower (as it requires dynamic dispatch) but I could easily be wrong. Of course, it would also make it impossible to decompile the IL to C#, which would make my talks harder, but don’t expect the C# team to bend the compiler implementation for my benefit ;)

(As an aside to all of this, I’ve gone back and forth on whether the "slightly broken" implementation would recreate the delegate on every appropriate await, or only two. I think it would end up being on every occurrence, as even though on the second occurrence we’d be operating within the context of the first boxed instance, the new delegate would have a reference to a new boxed copy each time. It does my head in a little bit, trying to think about this… more evidence that mutable structs are evil and hard to reason about. It’s not the wrong decision in this case, hidden far from the gaze of normal developers, but it’s a pain to reason about.)

Single awaiter variable

In the CTP, each await expression generated a separate field within the state machine, and that field was always of the exact awaiter type. In the VS11 Developer Preview, there’s always exactly one awaiter field (assuming there’s at least one await expression) and it’s always of type object. It’s used like this:

  // Single local variable used by both continuation and first-time paths
  TaskAwaiter<int> localAwaiter;

  …

  if (conditions-for-first-time-execution)
  {
      // Code before await

      localAwaiter = task.GetAwaiter();
      if (localAwaiter.IsCompleted)
      {
          goto Await1Completed;
      }
      this.state = 1;
      TaskAwaiter<int>[] awaiterArray = { localAwaiter };
      this.awaiter = awaiterArray;
      // Lazy delegate creation goes here
      awaiterArray[0].OnCompleted(action);
      return;
  }
  // Continuation would get into here
  localAwaiter = ((TaskAwaiter<int>[]) this.awaiter)[0];
  this.awaiter = null;
  this.state = 0;
Await1Completed:
  int result = localAwaiter.GetResult(); 
  localAwaiter = default(TaskAwaiter<int>);

I realize there’s a lot of code here, but it does make some sense:

  • The value of the awaiter field is always either null, or a reference to a single-element array of the awaiter type for one of the await expressions.
  • A single localAwaiter variable is shared between the two code paths, populated either from the awaitable (on the initial code path) or by copying the value from the array (in the second code path).
  • The field is always set to null and the local variable is set to its default value after use, presumably for the sake of garbage collection

It’s basically a nice way of using the fact that we’ll only ever need one awaiter at a time. It’s not clear to me why an array is used instead of either using a reference to the awaiter for class-based awaiters, or simply by boxing for struct-based awaiters. The latter would need the same "unbox without copy" approach discussed in the previous section – so if there’s some reason why that’s actually infeasible, it would explain the use of an array here. We can’t use the interface trick in this case, as the compiler isn’t in control of the awaiter type (so can’t make it implement an interface).

Expression stack preservation

This one is actually a fix to a bug in the async CTP, which I’ve written about before. We’re used to the stack containing our local variables (in the absence of iterator blocks, captured variables etc, and modulo the stack being an implementation detail) but it’s also used for intermediate results within a single statement. For example, consider this block of code:

int x = 10;
int y = 5;
int z = x + 50 * y;

That last line is effectively:

  • Load the value of x onto the stack
  • Load the value 50 onto the stack
  • Load the value of y onto the stack
  • Multiply the top two stack values (50 and y) leaving the result on the stack
  • Add the top two stack values (x and the previously-computed result) leaving the result on the stack
  • Store the top stack value into z

Now suppose we want to turn y into a Task<int>:

int x = 10;
Task<int> y = Task.FromResult(5);
int z = x + 50 * await y;

Our state machine needs to make sure that it will preserve the same behaviour as the synchronous version, so it needs the same sort of stack. In the new-style state machine, all of that stack is saved in the "stack" field. It’s only one field, but may need to represent multiple different types within the code at various different await expressions – in the code above, for example, it represents two int values. As far as I can discover, the C# compiler generates code which uses the actual type of the value it needs, if it only requires a single value. If it needs multiple values, it uses an appropriate Tuple type, nesting tuples if it goes beyond the number of type parameters supported by the Tuple<…> family of types. So in our case above, we end up with code a bit like this:

// Local variable used in both paths
Tuple<int, int> tuple;

// Code before the await
if (conditions-for-first-time-execution) 

    …
    tuple = new Tuple<int, int>(this.x, 50);
    this.stack = tuple;
    …
}

// Continuation would get into here
tuple = (Tuple<int, int>) this.stack;
// IL copies the values from the tuple onto the stack at this point
this.stack = null;

// Both the fast and slow code paths get here eventually
this.z = stack0 + stack1 * awaiter.GetResult()

I say it’s a bit like this, because it’s hard to represent the IL exactly in C# in this case. The tuple is only created if it’s needed, i.e. not in the already-completed fast path. In that case, the values are loaded onto the stack but not then put into the tuple – execution skips straight to the code which uses the values already on the stack.

When the awaitable isn’t complete immediately, then a Tuple<int, int> is created, stored in the "stack" field, and the continuation is handed to the awaiter. On continuation, the tuple is loaded back from the "stack" field (and cast accordingly), the values are loaded onto the stack – and then we’re back into the common code path of fetching the value and performing the add and multiply operations.

Conclusion

As far as I’m aware, those are the most noticeable changes in the generated code. There may well still be a load more changes in Task<T> and the TPL in general – I wouldn’t be at all surprised – but that’s harder to investigate.

I’m sure all of this has been done in the name of performance (and correctness, in the case of stack preservation). The state machine is now much smaller in terms of the number of fields it requires, and objects are created locally as far as possible (including the state machine itself only requiring heap allocation if there’s ever a "slow" awaitable). I suspect there’s still some room for optimization, however:

  • Both the awaiter and the delegate use careful boxing and either arrays or a mutating interface to allow the boxed value to be changed. I suspect that using unbox with the concrete type, but without copying the value, would be more efficient. I may attempt to work this theory up into a test at some point.
  • If there’s only one awaiter type (usually TaskAwaiter<T> for some T), that type could be used instead of object, potentially reducing heap optimization
  • I’ve no idea why the builder.Task property is explicitly fetched and then the results discarded
  • If there’s only one await expression, the "stack" field can be strongly typed, which would also avoid boxing if only a single value needs to be within that stack
  • The stack field could be removed entirely when it’s not needed for intermediate stack value preservation. (I believe that would be the case reasonably often.)

The use of mutable value types is really fascinating (for me, at least) – I’m sure most people on the C# team would still say they’re evil, but when they’re used in a carefully controlled environment where real developers don’t have to reason about their behaviour, they can be useful.

Next time, I’ll hopefully get back to the idea I promised to write up before, about ordering a collection of tasks in completion order… before they’ve completed. (Well, sort of.) Should be fun…

Update (January 16th 2012)

Stephen Toub got in touch with me after I posted the original version of this blog entry, to explain the use of the Task property. Apparently the idea is that at this point, we know we’re going to return out of the state machine, so the skeleton method is going to access the Task property anyway. However, as we haven’t scheduled the continuation yet we also know that nothing will be accessing the Task property on a different thread. If we access it now for the first time, we can lazily allocate the task in the same thread that created the AsyncMethodBuilder, with no risk of contention. If we can force there to be a task ready and waiting for whatever accesses it later, we don’t need any synchronization in that area.

So why might we want to allocate the task lazily in the first place? Well, don’t forget that we might never have to wait for an await (as it were). We might just have an async method which takes the fast path everywhere. If that’s the case, then for certain cases (e.g. a non-generic, successfully completed task, or a Task<bool> which again has completed successfully) we can reuse the same instance repeatedly. Apparently this laziness isn’t yet part of the VS11 Developer Preview, but the reason for the property access is in preparation for this.

Another case of micro-optimization – which is fair enough when it’s at a system level :)