Nasty generics restrictions

So, I caved and finally downloaded the LINQ preview. Obviously it’s fairly heavily genericised (if that’s even a word) and I decided to push it a little. Nothing particularly heavy – just an interesting bit of functional programming.

It’s easy to do a query which returns a string property from an object:

var items = new[] 
{
    new {Data = "First"},
    new {Data = "Second"},
    new {Data = "Third"}
};

IEnumerable<string> query = from item in items
                            select item.Data;

foreach (string s in query)
{
    Console.WriteLine (s);
}

The above prints out:

First
Second
Third

It’s not particularly hard to return a string from an object, having performed an operation on that string, where the operation is also specified by the object:

var items = new[] 
{
    new {Data = "First", Operation = (Func<string,string>) (s => s+s)},
    new {Data = "Second", Operation = (Func<string,string>) (s => s.Substring(1))},
    new {Data = "Third", Operation = (Func<string,string>) (s => s)}
};

IEnumerable<string> query = from item in items
                            select item.Operation(item.Data);

foreach (string s in query)
{
    Console.WriteLine (s);
}

The first operation is plain concatenation; the second takes the first letter off the string, and the third is the identity operator. The above prints out:

FirstFirst
econd
Third

So, the next thing I wanted to try was performing the operation twice, using the output of the first as the input of the second. I could have just used item.Operation(item.Operation(item.Data)) but where would the fun be in that? Instead, I wanted to have an operator whose parameters were a transformation from one type to the same type and an initial value, returning the same type. I’d hoped it would be as simple as Func<Func<T,T>,T,T> doItTwice = (op, input) => op(op(input));. After all, the implementation we’ve given doesn’t care in the slightest what the type involved is. Unfortunately, .NET generics don’t allow that kind of thing as far as I can tell – because it doesn’t know whether T is going to be a reference type or a value type, it can’t create the appropriate concrete implementation. Changing the declaration to use string instead of T everywhere works fine, but it’s much less elegant.
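For comparison, here is a minimal sketch of the same idea with the type parameter declared on a method rather than on a variable. It assumes a current C# compiler (not the LINQ preview), and the names Twice and Apply are made up for illustration.

```csharp
using System;
using System.Linq;

static class Twice
{
    // Hypothetical helper: T belongs to the method, so each call site picks its own T.
    public static T Apply<T>(Func<T, T> op, T input) => op(op(input));
}

static class Program
{
    static void Main()
    {
        var items = new[]
        {
            new { Data = "First",  Operation = (Func<string, string>)(s => s + s) },
            new { Data = "Second", Operation = (Func<string, string>)(s => s.Substring(1)) },
            new { Data = "Third",  Operation = (Func<string, string>)(s => s) }
        };

        // T is inferred as string from the arguments.
        var query = from item in items
                    select Twice.Apply(item.Operation, item.Data);

        foreach (string s in query)
        {
            Console.WriteLine(s);
        }

        // The same method works with a value type; here T is inferred as int.
        Console.WriteLine(Twice.Apply(x => x * 2, 3));
    }
}
```

This prints FirstFirstFirstFirst, cond, Third and 12. It isn’t the generic local variable I was after, but it does keep the implementation independent of the type involved.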

The interesting thing is that (lambda functions aside) I believe that would be possible in Java. Java generics are much weaker in terms of implementation, but allow some things to be expressed which you just can’t do in .NET. (The reverse is true too, of course, partly because .NET generics can involve value types and Java generics can’t.) So, to put on my best whiny voice – why can’t we have the best of both worlds? I understand that it probably makes things a bit harder in terms of implementation, but come on, that’s the kind of thing which only has to be worked out once, by one team – whereas hundreds of thousands of developers are going to be actually using it.

The good news is that playing with lambda functions is still fun, just like I expected it to be.

5 thoughts on “Nasty generics restrictions”

  1. You state:

    that’s the kind of thing which only has to be worked out once, by one team – whereas hundreds of thousands of developers are going to be actually using it.

    The omission here is that because you are going to have hundreds of thousands of developers using it, it has to be done just right.

    It reminds me of the reason that the constraint system in Generics is not much more robust out of the box. Anders wanted to err on the side of caution, seeing what the most prominent use cases would be, before committing to something that could not be changed in the future.

    Otherwise, I would reluctantly agree it would be nice if I could access elements of the contract (the Generic which has partial or no instantiation because types were not passed) which were not dependent on the type parameters.

  2. The bottom of this issue is that you’d like a parameterized storage location for your data. You can accomplish that by using a field inside a parameterized type. Alternatively, you can of course just return it from a parameterized method. C# type inferencing *should* be able to flow the correct value of T from the callsite to your implementation.

    I say *should* because evidently the compiler doesn’t. This is unfortunate, because it has all the information it needs to make this conclusion, but perhaps it’s a bug in the early 3.0 compiler which will get fixed. For example,

    using System;
    using System.Collections.Generic;
    using System.Query;

    class Program {
        static Func<Func<T,T>,T,T> doItTwice<T>() { return (op, input) => op(op(input)); }

        static void Main() {
            var items = new[]
            {
                new {Data = "First", Operation = (Func<string,string>) (s => s+s)},
                new {Data = "Second", Operation = (Func<string,string>) (s => s.Substring(1))},
                new {Data = "Third", Operation = (Func<string,string>) (s => s)}
            };

            var query = from item in items
                        select doItTwice()(item.Operation, item.Data);

            foreach (string s in query)
                Console.WriteLine (s);
        }
    }

    This code currently fails with an error "cannot be inferred from usage," which is crap. Alternatively, you can place doItTwice inside a type, e.g.

    struct P<T> {
        static Func<Func<T,T>,T,T> doItTwice = (op, input) => op(op(input));
    }

    In both cases the C# compiler currently fails to correctly flow the T.

    But I must point out that this isn’t a byproduct of the CLR (as you stated), but rather the language itself.
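
    A minimal sketch of a workaround for both forms, assuming the type argument is written out by hand instead of being inferred. Holder is a hypothetical stand-in for P, with the field made public so it can be reached from outside the type:

    ```csharp
    using System;

    static class Holder<T>
    {
        // Static field in a generic type: one delegate instance per constructed T.
        public static readonly Func<Func<T, T>, T, T> DoItTwice =
            (op, input) => op(op(input));
    }

    static class Program
    {
        // Generic method returning the delegate. With an empty argument list
        // there is nothing to infer T from, so callers must supply it.
        public static Func<Func<T, T>, T, T> DoItTwice<T>() =>
            (op, input) => op(op(input));

        static void Main()
        {
            Func<string, string> dropFirst = s => s.Substring(1);

            // Explicit type argument on the method call.
            Console.WriteLine(DoItTwice<string>()(dropFirst, "Second"));

            // Explicit type argument on the containing type.
            Console.WriteLine(Holder<string>.DoItTwice(dropFirst, "Second"));
        }
    }
    ```

    Both calls print cond. Neither gets the compiler to flow T on its own, but it does show the delegate itself can be expressed generically.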

  3. You’re right about the compiler failing to do things which it should be able to, but the first statement you made (i.e. that I want a parameterized storage location) is the crucial one, I think – if such a thing doesn’t exist as far as the CLR is concerned, then what I *specifically* wanted originally (i.e. an appropriately parameterized local variable) is a CLR issue, isn’t it?

  4. We do provide a way to have a parameterized storage location, e.g. by placing it inside a class. Classes can be nested arbitrarily, so using class scoping rather than a new field-level scope will work.

    Languages could surface this scoping in whatever way they wanted, however–e.g. C# could turn this:

    static Func<Func<T,T>,T,T> doItTwice = …;

    into

    struct __$0foobar<T> {
    static Func<Func<T,T>,T,T> doItTwice = …;
    }

    and mask a lot of the complexity. In fact, I’m not sure what F# does, but presumably something similar.

    I do agree, on the other hand, that having this in the CLR would be cool. It would make it easier for languages to expose such functionality. Additionally, my above suggestion would make it strange when sharing types with languages that don’t understand the compiler hackery. Whenever we find something like that, it usually hints at a new runtime feature.

    I still would not say its absence indicates a "limitation" of the CLR’s type system, however. You can express what you want to using existing constructs–the C# language isn’t helping you though.
