Iterate, damn you!

Do you know the hardest thing about presenting code with surprising results? It’s hard to do so without effectively inviting readers to look for the trick. Not that that’s always enough – I failed the last of Neal and Eric’s C# puzzlers at NDC, for example. (If you haven’t already watched the video, please do so now. It’s far more entertaining than this blog post.) Anyway, this one may be obvious to some of you, but there are some interesting aspects even when you’ve got the twist, as it were.

What does the following code print?

using System;
using System.Collections.Generic;

public class WeirdIterators
{
    static void ShowNext(IEnumerator<int> iterator)
    {
        if (iterator.MoveNext())
        {
            Console.WriteLine(iterator.Current);
        }
        else
        {
            Console.WriteLine("Done");
        }
    }
    
    static void Main()
    {
        List<int> values = new List<int> { 1, 2 };
        using (var iterator = values.GetEnumerator())
        {
            ShowNext(iterator);
            ShowNext(iterator);
            ShowNext(iterator);
        }
    }
}

If you guessed "1, 2, Done" despite the title of the post and the hints that it was surprising, then you’re at least brave and firm in your beliefs. I suspect most readers will correctly guess that it prints "1, 1, 1" – but I also suspect some of you won’t have worked out why.

Let’s look at the signature of List<T>.GetEnumerator(). We’d expect it to be

public IEnumerator<T> GetEnumerator()

right? That’s what IEnumerable<T> says it’ll look like. Well, no. List<T> uses explicit interface implementation for IEnumerable<T>. The signature actually looks like this:

public List<T>.Enumerator GetEnumerator()

Hmm… that’s an unfamiliar type. Let’s have another look in MSDN

[SerializableAttribute]
public struct Enumerator : IEnumerator<T>, 
    IDisposable, IEnumerator

(It’s nested in List<T> of course.) Now that’s odd… it’s a struct. You don’t see many custom structs around, beyond the familiar ones in the System namespace. And hang on, don’t iterators fundamentally have to be mutable.

Ah. "Mutable value type" – a phrase to strike terror into the hearts of right-headed .NET developers everywhere.

So what’s going on? If we’re losing all the changes to the value, why is it printing "1, 1, 1" instead of throwing an exception due to printing out Current without first moving?

Well, we’re fetching the iterator into a variable of type List<int>.Enumerator, and then calling ShowNext() three times. On each call, the value is boxed (creating a copy), and the reference to the box is passed to ShowNext().

Within ShowNext(), the value within the box changes when we call MoveNext() – which is how it’s able to get the real first element with Current. So that mutation isn’t lost… until we return from the method. The box is now eligible for garbage collection, and no change has been made to the iterator variable’s value. On the next call to ShowNext(), a new box is created and we see the first item again…

How can we fix it?

There are various things we can do to fix the code – or at least, to make it display "1, 2, Done". We can then find other ways of breaking it again :)

Change the type of the values variable

How does the compiler work out the type of the iterator variable? Why, it looks at the return type of values.GetEnumerator(). And how does it find that? It looks at the type of the values variable, and then finds the GetEnumerator() method. In this case it finds List<int>.GetEnumerator(), so it makes the iterator variable type List<int>.Enumerator.

If suppose just change values to be of type IList<int> (or IEnumerable<int>, or ICollection<int>):

IList<int> values = new List<int> { 1, 2 };

The compiler uses the interface implementation of GetEnumerator() on List<T>. Now that could return a different type entirely – but it actually returns a boxed List<T>.Enumerator. We can see that by just printing out iterator.GetType().

So if it’s just returning the same value as before, why does it work?

Well, this time we’re boxing once – the iterator gets boxed on its way out of the GetEnumerator() method, and the same box is used for all three calls to ShowNext(). No extra copies are created, and the changes within the box don’t get lost.

Change the type of the iterator variable

This is exactly the same as the previous fix – except we don’t need to change the type of values. We can just explicitly state the type of iterator:

using (IEnumerator<int> iterator = values.GetEnumerator())

The reason this works is the same as before – we box once, and the changes within the box don’t get lost.

Pass the iterator variable by reference

The initial problem was due to the mutations involved in ShowNext() getting lost due to repeated boxing. We’ve seen how to solve it by reducing the number of boxing operations down to one, but can we remove them entirely?

Well yes, we can. If we want changes to the value of the parameter in ShowNext() to be propagated back to the caller, we just need to pass the variable by reference. When passing by reference the parameter and argument types have to match exactly of course, so we can’t leave the iterator variable being type List<T>.Enumerator without changing the parameter type. Now we could explicitly change the type of the parameter to List<T>.Enumerator – but that would tie our implementation down rather a lot, wouldn’t it? Let’s use generics instead:

static void ShowNext<T>(ref T iterator)
    where T : IEnumerator<int>

Now we can pass iterator by reference and the compiler will infer the type. The interface members (MoveNext() and Current) will be called using constrained calls, so there’s no boxing involved…

… except that when you try to just change the method calls to use ref, it doesn’t work – because apparently you can’t pass a "using variable" by reference. I’d never come across that rule before. Interesting. Fortunately, we can (roughly) expand out the using statement ourselves, like this:

var iterator = values.GetEnumerator();
try
{
    ShowNext(ref iterator);
    ShowNext(ref iterator);
    ShowNext(ref iterator);
}
finally
{
    iterator.Dispose();
}

Again, this fixes the problem – and this time there’s no boxing involved.

Let’s quickly look at one more example of it not working, before I finish…

Dynamic typing to the anti-rescue

What happens if we change the type of iterator to dynamic (and set everything else back the way it was)? I’ll readily admit, I really didn’t know what was going to happen here. There are two competing forces:

  • The dynamic type is often really just object behind the scenes… so it will be boxed once, right? That means the changes within the box won’t get lost. (This would give "1, 2, Done")
  • The dynamic type is in many ways meant to act as if you’d declared a variable of the type which it actually turns out to be at execution time – so in this case it should work as if the variable was of type List<int>.Enumerator, just like our original code. (This would give "1, 1, 1")

What actually happens? I believe it actually boxes the value returned from GetEnumerator() – and then the C# binder DLR makes sure that the value type behaviour is preserved by copying the box before passing it to ShowNext(). In other words, both bits of intuition are right, but the second effectively overrules the first. Wow. (See the comment below from Chris Burrows for more information about this. I’m sure he’s right that it’s the only design that makes sense. This is a pretty pathological example in various ways.)

Conclusion

Just say "no" to mutable value types. They do weird things.

(Fortunately the vast majority of the time this particular one won’t be a problem – it’s rare to use iterators explicitly in the first place, and when you do you very rarely pass them to another method.)

19 thoughts on “Iterate, damn you!”

  1. Now if they’d put escape analysis in the jit we wouldn’t need this stuff.

    Perhaps a bit tricky for targets like TNA but there you go. Better in terms of compatibility would be the compiler spotting this sort of thing and hitching at you

    Like

  2. About variables declared in using statement, they’re considered ‘const’, a bit like the const C++ keywork that doesn’t exist in C#.
    If you could pass the variable by ref to a method, this method could change the variable to another instance, and the Dispose method would be called on this new instance instead of the one declared… weird !

    To enable mutable variables in using, the compiler should copy the original value to a hidden variable (but it would not work for structs anyway, the copy would be disposed, but not the original one)

    Like

  3. @thinkbeforecoding: Interesting – I was sure you could change the value of a “using” statement variable, but that the original value would still be disposed. I thought it copied it. Apparently not though :)

    Like

  4. IMHO, not only a good example of why to avoid mutable value types (why in the world did List not follow that rule? are there really common scenarios where this produces a useful performance improvement?), but also a good example of why to avoid “var”. After all, when you call GetEnumerator(), you _do_ expect it to return IEnumerator and will write code that makes that assumption. Using “var” allows violated assumptions to bite you in the butt.

    Now, granted…I would say this is really a problem in the implementation of List. It’s not like List.Enumerator has special features beyond the interface implementations. Not only is the mutable value type design oddball and likely to lead to problems like this, the fact that the implementation type is exposed at all seems wrong to me.

    On top of all that, as the new “dynamic” example shows (i.e. “new” as in, you couldn’t do this when List was originally designed), even if the List.GetEnumerator() method returned IEnumerator instead of List.Enumerator, you could still run into the mutable value type issue (because “dynamic” is doing that extra level of ensuring value type semantics).

    But even so, it seems to me that “var” is overly permissive and should be used only when it’s either necessary (because one is dealing with anonymous types) or the actual type used is absolutely clear from context (i.e. there’s an actual constructor in the assignment expression so that the type is named in the assignment itself).

    Even ignoring the other issues, I also find the idea of a value type implementing IDisposable (implied by IEnumerator) to be a bad idea. If the disposable resources are allocated after construction, one might get a copy of the value type with the disposable resources, but fail to call Dispose() on that copy. Even if the resources are allocated at construction and never modified, copies of the value type can’t be updated when the original Dispose() method is called to indicate disposal. So either the resources themselves have to be able to indicate that they are disposed, or the value type has no reliable way of throwing the proper ObjectDisposedException if misused. And on top of all that, without keeping track of all the copies of the value type that were in use, there’s no good way for the code to ensure all of the copies are disposed properly.

    I suppose that having an IDisposable value type is a subset of the more general mutable value type genre that should be avoided. But it seems like it just adds that much more problems. Maybe we need a corollary rule “avoid disposable value types” to go along with the more general one. :)

    And in any case, I don’t like List.GetEnumerator(). It just seems wrong to me. A glaring wart in a framework that is otherwise one that I like quite a lot.

    Like

  5. Wow, that’s scary… If I had stumbled upon a bug caused by this, I don’t know how long it would have taken me to find the cause

    Do you have any idea why the BCL designers implemented this enumerator as a struct ? It seems rather unnatural, but there must be a good reason for it…

    Like

  6. @Thomas: I believe it’s implemented that way for efficiency. In the case where you’re using foreach over a List, it means no extra objects need to be created on the heap.

    Liked by 1 person

  7. Premature optimization is the root of all etc. etc….

    In this case, the BCL team decided to be clever. They knew that most of the time, the only code calling GetEnumerator() was generated by compiler interpretation of foreach loops. They knew that a tiny amount of garbage disposal could be eliminated by having their enumerator be a struct so it could keep all its state in the stackframe. And they thought they could help everybody out by ‘optimizing’ this very common scenario of foreaching over a List.

    The thing is, this clever trick only gets to come into play if the foreach loop is given an explicitly typed List (not an IList, an IList, an ICollection, or an IEnumerable).

    Of course they also reckoned without var and dynamic coming along and making things complicated…

    It’s possible the language designers behind foreach were complicit in this too, of course – they were probably very pleased with themselves for specifying it in such a way that it could be micro-optimized by having a GetEnumerator method that returns a value type.

    It’s hard to imagine a way of preventing the BCL team from having written this method with this crazy return type, though. Unless they’d explicitly said it was an error if the value passed to foreach returned a value type from its GetEnumerator method. Even if they allowed value types but always boxed them, someone might have come up with a reason to try and return a value type anyway.

    I guess what’s more annoying is that you can’t possibly take advantage of a similar capability through the IEnumerable interface, because of autoboxing of interface return types.

    Like

  8. Oh, the anti-rescue! Haha.

    It is not the DLR that’s doing the dynamic bit, it’s the C# language binder. Before the binder does any work binding the operation itself (in this case, the method call), it makes sure to treat dynamic arguments as the best type that it can, in this case that’s List.Enumerator. So the rule we generate has an expression tree that is going to perform the unboxing conversion, and then ultimately call the method that way.

    The expression tree’s debug view looks like this:
    .Call WeirdIterators.ShowNext((System.Collections.Generic.IEnumerator`1[System.Int32])((System.Collections.Generic.List`1+Enumerator[System.Int32])$$arg1))

    I hope you’ll agree that although it doesn’t get you what you were looking for in this case, it _is_ the correct thing and the only design we could shipped with.

    If it were the case that the value-type enumerator were not accessible to your code, then we could never have unboxed it, but that’s OK because someone would have boxed it to get it to you anyway, … (and furthermore, it’s a good thing that MoveNext and Current are not explicitly implemented. Whew.)

    Btw, the read-onlyness of using statement declarations is broken, or in the best case the spec is a little unclear about what is intended.

    public class Program
    {
    struct MyStruct : IDisposable
    {
    public int x;
    public void Mutate() { ++x; }
    public void Dispose() { }
    }

    static readonly MyStruct fieldS = default(MyStruct);

    static void Main()
    {
    // A regular MyStruct

    MyStruct localS = default(MyStruct);
    Console.WriteLine(localS.x);
    localS.Mutate();
    Console.WriteLine(localS.x);

    // A readonly MyStruct

    Console.WriteLine(fieldS.x);
    fieldS.Mutate();
    Console.WriteLine(fieldS.x);

    // Ok, what now?

    using (MyStruct usingS = default(MyStruct))
    {
    Console.WriteLine(usingS.x);
    usingS.Mutate();
    Console.WriteLine(usingS.x);
    }
    }
    }

    chris

    Like

  9. @Chris: Thanks for the details. I’ll edit the post to correct this (and refer to your comment).

    Eric was mentioning the oddities involved in having readonly fields in structs just recently (and how it didn’t mean what it looked like). I’m not sure what the spec could do here, to be honest – other than possibly not try to prohibit it at all.

    Like

  10. Actually this is scary. It is not 100% analogous too vector in C++, but close. Premature optimization.

    But roots of this code lie elsewhere — this is just an example how badly C# is designed. It shouldn’t be up to class/struct whether it is passed by reference or by value, but it should be up to method/function. Like in C++.

    In case of C# such decision make it harder to write flexible functions — because you could expect class (reference) and you get value, because somebody, somewhere decided to “optimize” something.

    Don’t get me wrong, I love lambdas, Linq in C#, but at the same time I faint when I see all the mistakes done again, only in other places than before. And unfortunately this rotten design is exactly the core of the language.

    Like

  11. @Macas: I’m afraid I disagree completely. I like the reference type / value type distinction in C#, and wouldn’t trade it for C++ at all. Without making the value type mutable, this wouldn’t have been a problem… and it’s not like there aren’t *plenty* of traps to be aware of in C++ too.

    Like

  12. I wrote a couple of blog posts on this a few weeks back (http://www.simple-talk.com/community/blogs/simonc/archive/2010/05/19/91401.aspx and http://www.simple-talk.com/community/blogs/simonc/archive/2010/05/20/91415.aspx), although it looked at it from a slightly different angle. I got a response from Sasha Goldshtein at microsoft, trying to justify it: http://blogs.microsoft.co.il/blogs/sasha/archive/2010/07/05/enumerator-structs-optimizing-for-the-common-case.aspx

    This is still crazy behaviour, unfortunately, and it can’t be changed now. Unfortunately, things like var make this problem more acute in recent versions of C#

    Like

  13. Oh, and why foreach works on list of structs? It also is using the GetEnumerator method, even when interface IEnumerable is not implemented.

    Like

  14. Awesome post, Jon! I guessed correctly – but I didn’t want to believe that the .NET BCL devs actually used a mutable value type… Yuck.

    Quite entertaining though :).

    @Lonli-Lokli: In Jon’s example, the IEnumerator is getting boxed on each call to ShowNext(), creating a snap shot of the enumerator’s state. In a foreach, there’s absolutely no reason for the generated IL to box the enumerator prior to calling MoveNext() each time; therefore, it works as expected.

    -Charles

    Like

  15. I feel myself so stupid. I gave wrong answer. I just cannot understand this solution. I thought it is good practise to use return IEnumerable type instead struct.

    Like

    1. I’m not sure what you’re trying to say here. Note that it’s only due to var that this is causing a problem – basically, this was a design choice in the framework that it mostly useful, but has occasional downsides.

      Like

Leave a comment