Non-iterable collection initializers

Yesterday on Stack Overflow, I mentioned that sometimes I make a type implement IEnumerable just so that I can use collection initializers with it. In such a situation, I use explicit interface implementation (despite not really needing to – I’m not implementing IEnumerable<T>) and leave it throwing a NotImplementedException. (EDIT: As noted in the comments, throwing NotSupportedException would probably be more appropriate. In many cases it would actually be pretty easy to implement this in some sort of meaningful fashion… although I quite like throwing an exception to indicate that it’s not really intended to be treated as a sequence.)

Why would I do such a crazy thing? Because sometimes it’s helpful to be able to construct a "collection" of items easily, even if you only want the class itself to really treat it as a collection. As an example, in a benchmarking system you might want to be able to add a load of tests individually, but you never want to ask the "test collection" what tests are in it… you just want to run the tests. The only iteration is done internally.
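To make the pattern concrete, here is a minimal sketch of the benchmarking case. The BenchmarkSuite type, its members and the test names are all hypothetical, invented for illustration; the only parts taken from the discussion above are the explicit IEnumerable implementation and the NotSupportedException.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical benchmark "collection": usable with a collection initializer,
// but deliberately not enumerable by callers.
public sealed class BenchmarkSuite : IEnumerable
{
    private readonly List<KeyValuePair<string, Action>> tests =
        new List<KeyValuePair<string, Action>>();

    // The compiler turns each element of the initializer into a call to Add.
    public void Add(string name, Action test)
    {
        tests.Add(new KeyValuePair<string, Action>(name, test));
    }

    // The only iteration happens here, inside the class.
    // Returns the number of tests that were run.
    public int RunAll()
    {
        foreach (var entry in tests)
        {
            entry.Value();
        }
        return tests.Count;
    }

    // Explicit implementation: present only so the compiler accepts the initializer.
    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotSupportedException("BenchmarkSuite is not intended to be enumerated.");
    }
}

public static class Program
{
    public static void Main()
    {
        int counter = 0;
        var suite = new BenchmarkSuite
        {
            { "increment", () => counter++ },
            { "add two", () => counter += 2 },
        };
        int run = suite.RunAll();
        Console.WriteLine(run + " tests run, counter = " + counter);
        // prints "2 tests run, counter = 3"
    }
}
```

Because the interface is implemented explicitly, GetEnumerator doesn't show up on the type itself; a caller has to cast to IEnumerable (or use foreach) before hitting the exception.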

Now, there’s an alternative to collection initializers here: parameter arrays. You can add a "params string[]" or whatever as the final constructor parameter, and simply use the constructor. That works fine in many cases, but it falls down in others:

  • If you want to be able to add different types of values, without just using "params object[]". For example, suppose we wanted to restrict our values to int, string, DateTime and Guid… you can’t do that in a compile-time-safe way using a parameter array.
  • If you want to be able to construct composite values from two or more parts, without having to explicitly construct that composite value each time. Think about the difference in readability between using a collection initializer for a dictionary and explicitly constructing a KeyValuePair<TKey, TValue> for each entry.
  • If you want to be able to use generics to force aspects of type safety. The Add method can be generic, so you could, for example, force two parameters for a single entry to both be of T, but different entries could have different types. This is pretty unusual, but I needed it just the other day :)
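As a rough illustration of the first and third points, the sketch below uses overloaded Add methods to restrict single values to a fixed set of types, plus a generic two-parameter Add. The member names and the way values are stored are assumptions made for the example, not code from a real project.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: only int, string, DateTime and Guid are accepted as single values.
public sealed class RestrictedValues : IEnumerable
{
    private readonly List<string> descriptions = new List<string>();

    public int Count { get { return descriptions.Count; } }

    // One overload per permitted type; anything else is a compile-time error.
    public void Add(int value) { descriptions.Add("int"); }
    public void Add(string value) { descriptions.Add("string"); }
    public void Add(DateTime value) { descriptions.Add("DateTime"); }
    public void Add(Guid value) { descriptions.Add("Guid"); }

    // Generic overload: both parts of one entry must be inferable as the same T,
    // but different entries can use different type arguments.
    public void Add<T>(T first, T second)
    {
        descriptions.Add("pair of " + typeof(T).Name);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

public static class Program
{
    public static void Main()
    {
        var values = new RestrictedValues
        {
            3,
            "value",
            Guid.Empty,
            { 1, 2 },       // calls Add<int>(1, 2)
            { "a", "b" },   // calls Add<string>("a", "b")
        };
        Console.WriteLine(values.Count); // prints 5

        // Neither of these would compile:
        // new RestrictedValues { 3.5 };        // no Add(double) overload
        // new RestrictedValues { { 1, "b" } }; // T cannot be inferred from int and string
    }
}
```

Note that type inference still allows implicit conversions, so an entry such as { 1, 2.0 } would compile as Add<double>; the generic overload forces a common type rather than identical argument types.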

Now, it’s a bit of a hack to have to "not quite implement" IEnumerable. I’ve come up with two alternative options. These have the additional benefit of not requiring the method to always be called Add any more. I suspect it still would be in most cases, but flexibility is a bonus.

Option 1: Class level attribute

Instead of just relying on IEnumerable, the compiler could detect an attribute applied to the class, specifying the single method name for all collection initializer methods:

[CollectionInitializerMethod("AddValue")]
public class RestrictedValues
{
    public void AddValue(int x) { … }

    public void AddValue(string y) { … }
}

var values = new RestrictedValues
{
    3, "value", 10
};

Option 2: Method level attributes

In this case, each method eligible for use in a collection initializer would be decorated with the attribute:

public class RestrictedValues
{
    [CollectionInitializerMethod]
    public void AddInt32(int x) { … }

    [CollectionInitializerMethod]
    public void AddString(string y) { … }
}

var values = new RestrictedValues
{
    3, "value", 10
};

This has the disadvantage that the compiler would need to look at every method in the target class when it found a collection initializer.

Obviously both of these can be made backward compatible very easily: the presence of an implementation of IEnumerable with no attributes present would just fall back to using Add.

Option 3: Compiler and language simplicity

(I’ve added this in response to Peter’s comment.)

Okay, let’s stick with the Add method. All we need is another way of indicating that you should be able to use collection initializers with a type:

[AllowCollectionInitializers]
public class RestrictedValues
{
    public void Add(int x) { … }

    public void Add(string y) { … }
}

At this point, the changes required to the compiler (and language spec) are really minimal. In the bit of code which detects whether or not you can use a collection initializer, you just need to change from "does this type implement IEnumerable" to "does this type implement IEnumerable or have the relevant attribute defined". I can’t think of many possible language changes which would be more localized than that.

And another thing…

One final point. I’d still like the ability for collection initializers to return a value, and for that value to be used for subsequent elements of the initialization – with the final return value being the last-returned value. Any methods with a void return value would be treated as if they returned "this". This would allow you to build immutable collections easily.
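Since that behaviour doesn't exist in the language, the closest thing I can show is the code such an initializer would presumably expand to. In this sketch ImmutableBag<T> is a hypothetical type whose Add method returns a new instance, and the chained calls are written out by hand.

```csharp
using System;

// Hypothetical immutable collection: Add never modifies the instance it is called on.
public sealed class ImmutableBag<T>
{
    public static readonly ImmutableBag<T> Empty = new ImmutableBag<T>(new T[0]);

    private readonly T[] items;

    private ImmutableBag(T[] items)
    {
        this.items = items;
    }

    public int Count { get { return items.Length; } }

    // Returns a new bag containing the existing items followed by the new one.
    public ImmutableBag<T> Add(T item)
    {
        var copy = new T[items.Length + 1];
        Array.Copy(items, copy, items.Length);
        copy[items.Length] = item;
        return new ImmutableBag<T>(copy);
    }
}

public static class Program
{
    public static void Main()
    {
        // With the suggested feature, something like "{ 1, 2, 3 }" could be used;
        // each Add would be called on the value returned by the previous one,
        // and the last returned value would be the result. Written by hand:
        ImmutableBag<int> bag = ImmutableBag<int>.Empty.Add(1).Add(2).Add(3);

        Console.WriteLine(bag.Count);                     // prints 3
        Console.WriteLine(ImmutableBag<int>.Empty.Count); // prints 0
    }
}
```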

Likewise you could decorate methods with a [PropertyInitializerMethod] to allow initialization of immutable types with "WithXyz" methods. Admittedly there’s less need for this now that we have optional parameters and named arguments in C# 4 – a lot of the benefits of object initializers are available just with constructors.
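For comparison, here is a small sketch of what C# 4 already allows for an immutable type. The Contact class and its members are made up for the example; the WithAge method is the kind of "WithXyz" method the attribute idea would target.

```csharp
using System;

// Hypothetical immutable type, initialized via optional parameters and named arguments.
public sealed class Contact
{
    private readonly string name;
    private readonly string town;
    private readonly int age;

    public Contact(string name = "", string town = "", int age = 0)
    {
        this.name = name;
        this.town = town;
        this.age = age;
    }

    public string Name { get { return name; } }
    public string Town { get { return town; } }
    public int Age { get { return age; } }

    // "WithXyz" style: returns a modified copy and leaves this instance untouched.
    public Contact WithAge(int newAge)
    {
        return new Contact(name, town, newAge);
    }
}

public static class Program
{
    public static void Main()
    {
        // Named arguments read much like an object initializer, and town is simply omitted.
        var original = new Contact(name: "Sam", age: 30);
        var older = original.WithAge(31);

        Console.WriteLine(original.Age); // prints 30
        Console.WriteLine(older.Age);    // prints 31
    }
}
```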

Anyway, just a few slightly odd thoughts around initialization for you to ponder over the weekend…

15 thoughts on “Non-iterable collection initializers”

  1. Well, at least you tagged this “Wacky Ideas”. :)

    I see where you’re coming from. But I’m not convinced it meets the cost/benefit bar. For one, while more reliable, code attributes seem much more “heavy weight” to me.

    The other question I have is, if we apply this reasoning to collection initializers, why not to LINQ implementation methods too? After all, they have a similar sort of “magic name” status as the Add() method for collection initializers.

    Then we get into your request for support for immutable collections. While I found Eric Lippert’s series on immutable collections fascinating and I do see the practical benefits, adding this support into this particular feature seems like a lot of extra cost.

    As it is now, collection initializers can be implemented by the compiler in a VERY simple way. Straight mechanical translation of the initialization code, and then just run the result through the old compiler logic to handle overload resolution.

    I think that simplicity is likely one of the reasons the syntax was even allowed into the language. The benefits are relatively minor, so the cost has to be similarly minor. Truth is, for collections of any significant size, you don’t really want those translated into a series of method calls anyway. And for short collections, hand-writing the code in the more elaborate scenarios where the built-in syntax doesn’t quite hack it shouldn’t be that big of a deal.

  2. @Peter: I’ve added a third option for you, to be as minimal a change as possible. Would you still be against that change? I’m not sure what you see as “heavy” about attributes, mind you. They’ve been used this way before – think about extension methods, and the DefaultPropertyAttribute as two examples.

  3. Not such a good idea to throw a NotImplementedException on IEnumerable instances.

    Often we have to deal with generic or dynamic values, and thus there’s much code like this in the wild:

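    string GiveMeAProperString(object obj)
    {
        if (obj == null) return String.Empty;

        var str = obj as string;
        if (str != null) return str;

        // Cast<object>() (from System.Linq) makes sure the elements are joined;
        // passing the non-generic IEnumerable directly would just call col.ToString().
        var col = obj as IEnumerable;
        if (col != null) return String.Join(", ", col.Cast<object>());

        // … some more cases

        return obj.ToString();
    }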

    Thus I’d vote for option 2 or 3. Alternatively, I think it should be possible in most cases to implement the IEnumerable on classes like RestrictedValues anyway.

  4. To be clear, I don’t feel strongly “against” any “improvements” in this area at all. I’m just skeptical that, were I the language designer, I would find the benefits worth the effort.

    Personally, it’s not even clear to me why the type needs to implement IEnumerable. It’s not like the compiler really _needs_ the type to implement IEnumerable in order to do the work, nor does implementing IEnumerable suffice to enable the initializer syntax.

    I’d rather just see the requirement to implement IEnumerable removed altogether. Then, you’d just get the same error you get today if the necessary Add() method wasn’t present: an instance of CS0117 for each item in the initializer list. That’s no worse than implementing IEnumerable today, but not having the appropriate Add() method.

    My concern about using attributes involves at least two different issues: first, that it’s just one more thing for the compiler to check (and keep track of, if you’re using per-method attributes); and second, that it’s essentially enabling custom syntax within the language.

    On the first point, one of the things I really like about C# is how fast the compiler is. I don’t know all the fine details as to why the C# compiler is orders of magnitude faster than the C++ compiler for similar code (even for code compiled without optimizations), but it is. It can probably afford to slow down a little to handle stuff like this, but I would still prefer to not add the weight without some significant benefit.

    On the second point, this is in fact related to the question of macros in the language. One of the abuses of macros in C++ is when programmers use them to basically restructure the syntax to look like something decidedly not C++. Now granted, supporting custom initializer method names isn’t going to enable nearly the degree of syntax rewriting that macros in C++ can. But it still does it a little.

    I think there’s some value in a programmer being able to take a quick look at the type, see the Add() method (and the IEnumerable, if that continues to be a requirement) and know what’s going on. Yes, the attributes are there in code, but it’s just one more thing for the programmer to have to parse while reading the code, slowing comprehension. And that information doesn’t show up in Intellisense at all.

    Anyway, I don’t mean to beat up on your suggestions as much as it probably seems like I am. I don’t think they are _bad_ suggestions per se. As I mentioned, I do understand the motivation behind them. I just think that the benefit is relatively minimal (seems like a minor added convenience to me), and that there are in fact some non-trivial costs to consider.

  5. @herzmeister der welten: The types I’m thinking of simply wouldn’t be used in that sort of context. Not *every* type needs to be applicable in *every* situation.

    As it happens, the types I’ve been using this pattern for have typically been in test code rather than production code.

  6. I’d just like to say that throwing a NotImplementedException is bad practice. NotImplementedException implies that you will eventually implement it. It’s placeholder code.

    The right exception to throw is NotSupportedException. Exactly what it says on the tin. ‘Enumerating this object is not a supported operation’.

    (By the way, InvalidOperationException is the one to throw when the operation could be valid for a class, just isn’t valid for the current state of the current instance)

    Keeping this distinction intact is useful, because then if you see a NotSupportedException you know that the caller has done something wrong, while if you see a NotImplementedException then you know that the code you’re calling is not finished.

  7. @Joren: Yes, I agree – NotSupportedException (with an appropriate message) would be better. Will edit the post appropriately to indicate this.

  8. @Ryan: The test runner will contain more than the list of tests. It may use something like a List internally, but also know about how to run the tests, and a TestFormatter to write the results to or whatever.

  9. Wouldn’t this be a clean way to say what you’re describing?

    TestExecutionHarness myHardness = new TestExecutionHarness(new List<Test>() {test1, test2, test3});

  10. @Ryan: Sometimes, but not always. For one thing, you’ve instantly introduced extra fluff. For another, you might have multiple overloads for different types which are *converted* to tests… think of different delegate types you might support, for example.

    Just to be clear, I’m not saying I use this all over the place – just that it’s happened often enough in sensible places (IMO) to make it worth at least considering.

  11. That’s one of the only places where the implicit cast operator shines. I wouldn’t normally use it this way, but you can perhaps have the classes this way

    public class Test {

    public static implicit operator Test(Action);
    public static implicit operator Test(Action);
    }

    public class TestHarness {
    public TestHarness(params Test[] tests);
    }

  12. @configurator: That’s not going to help if you’re using lambda expressions, I don’t think… I’ve just tried your code:

    Test.cs(23,19): error CS1660: Cannot convert lambda expression to type ‘Test’ because it is not a delegate type

  13. That’s a shame. I thought I had a good idea there!

    It would still work for other data, e.g. an initializer that accepts ints or strings.

  14. How about:

    interface ICollectionInitializableWith<T>
    {
        void AddValue(T value);
    }

    public class RestrictedValues : ICollectionInitializableWith<int>, ICollectionInitializableWith<string>
    {
        public void AddValue(int x) { … }

        public void AddValue(string y) { … }
    }

    var values = new RestrictedValues
    {
        3, "value", 10
    };
