How can I enumerate thee? Let me count the ways…

This weekend, I was writing some demo code for the async chapter of C# in Depth – the idea was to decompile a simple asynchronous method and see what the compiler had generated. In the process I got quite a surprise, and one which had nothing to do with asynchrony.

Given that at execution time, text refers to an instance of System.String, and assuming nothing in the body of the loop captures the ch variable, how would you expect the following loop to be compiled?

foreach (char ch in text)
{
    // Body here
}

Before today, I could think of four answers depending on the compile-time type of text, assuming it compiled at all. One of those answers applies when text is declared as dynamic, which I’m not going to go into here. Let’s stick with static typing.

If text is declared as IEnumerable

In this case, the compiler can only use the non-generic IEnumerator interface, and I’d expect the code to be roughly equivalent to this:

IEnumerator iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = (char) iterator.Current;
        // Body here
    }
}
finally
{
    IDisposable disposable = iterator as IDisposable;
    if (disposable != null)
    {
        disposable.Dispose();
    }
}

Note how the disposal of the iterator has to be conditional, as IEnumerator doesn’t extend IDisposable.
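
To see the conditional disposal in action, here's a minimal sketch. TrackingSequence and TrackingEnumerator are hypothetical types invented purely for this demonstration: the enumerator implements the non-generic IEnumerator plus IDisposable, and records whether Dispose was called.

```csharp
using System;
using System.Collections;

// Hypothetical enumerator: non-generic, but also disposable.
public sealed class TrackingEnumerator : IEnumerator, IDisposable
{
    private readonly string text;
    private int index = -1;

    public bool Disposed { get; private set; }

    public TrackingEnumerator(string text)
    {
        this.text = text;
    }

    // Returning object means each char is boxed here.
    public object Current { get { return text[index]; } }

    public bool MoveNext()
    {
        index++;
        return index < text.Length;
    }

    public void Reset()
    {
        index = -1;
    }

    public void Dispose()
    {
        Disposed = true;
    }
}

// Hypothetical sequence which remembers the last enumerator it handed out.
public sealed class TrackingSequence : IEnumerable
{
    private readonly string text;

    public TrackingEnumerator LastEnumerator { get; private set; }

    public TrackingSequence(string text)
    {
        this.text = text;
    }

    public IEnumerator GetEnumerator()
    {
        LastEnumerator = new TrackingEnumerator(text);
        return LastEnumerator;
    }
}

public static class DisposalDemo
{
    public static bool WasDisposed()
    {
        TrackingSequence sequence = new TrackingSequence("abc");
        // The compile-time type is IEnumerable, so the compiler can't know
        // statically whether the enumerator is disposable.
        IEnumerable text = sequence;
        int count = 0;
        foreach (char ch in text)
        {
            count++;
        }
        return sequence.LastEnumerator.Disposed;
    }
}
```

Calling DisposalDemo.WasDisposed() should return true: the execution-time check finds that the enumerator implements IDisposable, so Dispose is called when the loop finishes.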

If text is declared as IEnumerable<char>

Here, we don’t need any execution time casting, and the disposal can be unconditional:

IEnumerator<char> iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = iterator.Current;
        // Body here
    }
}
finally
{
    iterator.Dispose();
}
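
One place where that unconditional Dispose call is visible is with iterator blocks. The following is just a sketch (EarlyExitDemo and its members are names I've made up for the example): breaking out of the loop early still disposes the iterator, which in turn runs the finally block inside the iterator method.

```csharp
using System.Collections.Generic;

public static class EarlyExitDemo
{
    private static bool cleanedUp;

    // Hypothetical iterator method with some clean-up to perform.
    private static IEnumerable<char> Letters()
    {
        try
        {
            yield return 'a';
            yield return 'b';
        }
        finally
        {
            cleanedUp = true;
        }
    }

    public static bool CleanupRunsOnBreak()
    {
        cleanedUp = false;
        foreach (char ch in Letters())
        {
            // Leave after the first element; the foreach statement's
            // own finally block still disposes the iterator.
            break;
        }
        return cleanedUp;
    }
}
```

EarlyExitDemo.CleanupRunsOnBreak() should return true.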

If text is declared as string

Now things get interesting. System.String implements IEnumerable<char> using explicit interface implementation, and exposes a separate public GetEnumerator() method which is declared to return a CharEnumerator.
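
As a quick illustration of the two routes to an enumerator, here's a sketch (StringEnumeratorDemo is a hypothetical name). The first method uses the public GetEnumerator() method directly; the second goes through the interface, which is the only way to reach the explicit implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringEnumeratorDemo
{
    // Uses the public method, statically typed to return CharEnumerator.
    public static string ViaPublicMethod(string text)
    {
        StringBuilder result = new StringBuilder();
        CharEnumerator iterator = text.GetEnumerator();
        try
        {
            while (iterator.MoveNext())
            {
                result.Append(iterator.Current);
            }
        }
        finally
        {
            iterator.Dispose();
        }
        return result.ToString();
    }

    // Uses the explicit interface implementation, which is only
    // available via an expression of the interface type.
    public static string ViaInterface(string text)
    {
        StringBuilder result = new StringBuilder();
        IEnumerable<char> sequence = text;
        using (IEnumerator<char> iterator = sequence.GetEnumerator())
        {
            while (iterator.MoveNext())
            {
                result.Append(iterator.Current);
            }
        }
        return result.ToString();
    }
}
```

Both methods walk over the same characters, so each should return a string equal to the one passed in.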

Usually when I find a type doing this sort of thing, it’s for the sake of efficiency, to reduce heap allocations. For example, List<T>.GetEnumerator returns a List<T>.Enumerator, which is a struct with the appropriate iteration members. This means if you use foreach over an expression of type List<T>, the iterator can stay on the stack in most cases, saving object allocation and garbage collection.
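
If you want to check which enumerators are value types, reflection will tell you. This is only a small sketch, and EnumeratorKinds is a made-up name:

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorKinds
{
    // List<T>.Enumerator is declared as a struct.
    public static bool ListEnumeratorIsStruct()
    {
        return typeof(List<int>.Enumerator).IsValueType;
    }

    // CharEnumerator is declared as a sealed class.
    public static bool CharEnumeratorIsStruct()
    {
        return typeof(CharEnumerator).IsValueType;
    }
}
```

The first method should return true and the second false.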

In this case, however, I suspect CharEnumerator was introduced (way back in .NET 1.0) to avoid having to box each character in the string: the non-generic IEnumerator.Current property returns object, whereas CharEnumerator.Current returns char. Avoiding that boxing was one reason for foreach to work with any type obeying the enumerable pattern, as well as supporting the normal interfaces. It strikes me that CharEnumerator could still have been a structure in the same way as for List<T>, but maybe that wasn’t considered as an option.
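
To show what obeying the enumerable pattern means, here's a minimal sketch. Letters and LetterEnumerator are hypothetical types which implement no interfaces at all; foreach still works because the compiler finds a suitable GetEnumerator method, a MoveNext method and a Current property. I've made the enumerator a struct here, purely to show that it's possible.

```csharp
using System.Text;

// Hypothetical sequence type: no IEnumerable in sight.
public sealed class Letters
{
    private readonly string text;

    public Letters(string text)
    {
        this.text = text;
    }

    public LetterEnumerator GetEnumerator()
    {
        return new LetterEnumerator(text);
    }
}

// Hypothetical struct enumerator with a strongly typed Current property,
// so no character is ever boxed.
public struct LetterEnumerator
{
    private readonly string text;
    private int index;

    public LetterEnumerator(string text)
    {
        this.text = text;
        this.index = -1;
    }

    public char Current { get { return text[index]; } }

    public bool MoveNext()
    {
        index++;
        return index < text.Length;
    }
}

public static class PatternDemo
{
    public static string Copy(string text)
    {
        StringBuilder result = new StringBuilder();
        foreach (char ch in new Letters(text))
        {
            result.Append(ch);
        }
        return result.ToString();
    }
}
```

PatternDemo.Copy("abc") should return "abc", with the compiler binding directly to the members of LetterEnumerator.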

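Anyway, it means that I would have expected the code to be compiled like this, even back to C# 1 (although without the Dispose call there, as I believe CharEnumerator only started implementing IDisposable in .NET 2.0):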

CharEnumerator iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = iterator.Current;
        // Body here
    }
}
finally
{
    iterator.Dispose();
}

What really happens when text is declared as string

(This is the bit that surprised me.)

So far, I’ve been assuming that the C# compiler doesn’t have any special knowledge about strings when it comes to iteration. I knew it had special handling for arrays, but I thought that was all. The actual result – under the C# 5 compiler, at least – is to use the Length property and the indexer directly:

int index = 0;
while (index < text.Length)
{
    char ch = text[index];
    index++;
    // Body here
}

There’s no heap allocation, and no Dispose call. If the variable in question can change its value within the loop (e.g. if it’s a field, or a captured variable, or there’s an assignment to it within the body) then a copy is made of the variable value (just a reference, of course) first, so that all member access is performed on the same object.
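
The effect of that copy is easy to observe. In this sketch (ReassignmentDemo is a made-up type), the field is reassigned inside the loop body, but the loop carries on iterating over the original string:

```csharp
using System.Text;

public sealed class ReassignmentDemo
{
    private string text = "abc";

    public string Run()
    {
        StringBuilder seen = new StringBuilder();
        foreach (char ch in text)
        {
            // Changing the field doesn't affect the iteration:
            // the loop uses the reference it read at the start.
            text = "a much longer replacement";
            seen.Append(ch);
        }
        return seen.ToString();
    }
}
```

Calling Run() on a new ReassignmentDemo should return "abc", regardless of which compilation strategy the compiler picks.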

Conclusion

So, there we go. There’s nothing particularly mind-blowing here – certainly nothing to affect your programming style, unless you were deliberately avoiding using foreach on strings "because it’s slow." It’s still a good lesson in not assuming you know what the compiler is going to do though… so long as the results are as expected, I’m very happy for the compiler team to put extra smarts in, even if it does mean having to change my C# in Depth sample code a bit…