Category Archives: C#

C#, CSharpDev, CSharpDevCenter, Google, Protocol Buffers

Lessons learned from Protocol Buffers, part 3: generic type relationships

August 29, 2008 jonskeet 1 Comment

In part 2 of this series we saw how the message and builder interfaces were self-referential in order to allow the implementation types to be part of the API. That’s one sort of relationship, but in this post we’ll see how the two interfaces relate to each other. If you remember from part 1, every generated message type has a corresponding builder type. As it happens, this is implemented with a nested type, so if you had a Person message, the generated types would be Person and Person.Builder (in a specified namespace, of course).

Without any interfaces involved, this would be very simple. The types would just look like this (with more members, of course):

public class Person
{
    public static Builder CreateBuilder() { … }

    public Builder CreateBuilderForType() { … }

    public class Builder
    {
        public Builder() { … }

        public Person Build() { … }
    }
}

You may well be wondering why there are two methods for creating a builder. The static method is convenient for code which knows it’s dealing with the Person message. The instance method ends up being part of the message interface, which makes it useful for code which can work with any message. In addition, the constructor for Person.Builder is accessible in the C# version. In the original Java code the only way of creating a builder is via the methods in the message class; I decided to remove this restriction for the sake of making the oh-so-readable object initializer syntax available in C# 3.

Redesigning the interfaces to refer to each other

In part 2 we created self-referential interfaces for the message and builder interfaces which looked like this:

public interface IMessage<TMessage> where TMessage : IMessage<TMessage>
{
    …
}

public interface IBuilder<TBuilder> where TBuilder : IBuilder<TBuilder>
{
    …
}

The constraints on the type parameters allow us to make the API very specific, and we can use the same trick again when we relate the builder and message types together. The step where we introduce a new type parameter to each of them is straightforward:

public interface IMessage<TMessage, TBuilder> where TMessage : IMessage<TMessage, TBuilder>
{
    …
}

public interface IBuilder<TMessage, TBuilder> where TBuilder : IBuilder<TMessage, TBuilder>
{
    …
}

Unfortunately without any restrictions on the “foreign” type parameter in each interface, we don’t get enough information to make everything work. We need to tie the two types together more tightly, like this:

public interface IMessage<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    …
}

public interface IBuilder<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    …
}

To make this concrete for Person and Person.Builder we end up with implementations like this:

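// The nested type must be qualified as Person.Builder in the base list,
// because the class body (and so the name Builder) isn't in scope there.
public class Person : IMessage<Person, Person.Builder>
{
    public static Builder CreateBuilder() { … }

    public Builder CreateBuilderForType() { … }

    public class Builder : IBuilder<Person, Builder>
    {
        public Builder() { … }

        public Person Build() { … }
    }
}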

This works, but it’s really ugly. Any generic methods wanting to take a TMessage type parameter implementing IMessage<TMessage, TBuilder> have to also have a TBuilder type parameter, and the two constraints need to be expressed each time. It’s a real pain. In fact, I’ve got an IMessage<TMessage> interface which contains almost nothing in it (and which the more generic interface extends). This allows me to get hold of the message type (and use it in the API), inferring the builder type by reflection. That’s a pain too, frankly. It’s a particular nuisance because when I do infer the builder type, I haven’t actually got any compile-time constraint that lets any other code know that it’s the right builder type for the message type. In one specific case it’s led to this horrific method (in a type generic in TMessage):

private static TMessage BuildImpl<TMessage2, TBuilder> (Func<TBuilder> builderBuilder,
                                                        CodedInputStream input,
                                                        ExtensionRegistry registry)
    where TBuilder : IBuilder<TMessage2, TBuilder>
    where TMessage2 : TMessage, IMessage<TMessage2, TBuilder>
{
    TBuilder builder = builderBuilder();
    input.ReadMessage(builder, registry);
    return builder.Build();
}

Fortunately this is hidden from public view – and the only reason to do it at all is to enable a pleasant API of MessageStreamIterator<TMessage> : IEnumerable<TMessage> where TMessage : IMessage<TMessage>. The result of the evil method above is exactly what the caller is likely to want, otherwise I wouldn’t put up with it. However, that sort of excuse has been coming up far too much in the PB implementation, so I’ve had a quick think about what could be done about it.
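To see the verbosity concretely, here is a minimal sketch of the doubly-generic interfaces with one member each, plus a generic helper. The member choice, the Name property and the MessageUtil.BuildEmpty helper are all hypothetical, invented for illustration; they are not the real Protocol Buffers API.

```csharp
using System;

public interface IMessage<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    TBuilder CreateBuilderForType();
}

public interface IBuilder<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    TMessage Build();
}

public sealed class Person : IMessage<Person, Person.Builder>
{
    private string name = "";

    public string Name { get { return name; } }

    public Builder CreateBuilderForType() { return new Builder(); }

    public sealed class Builder : IBuilder<Person, Builder>
    {
        // The nested builder can write to the message's private field.
        private Person result = new Person();

        public Builder SetName(string value) { result.name = value; return this; }

        public Person Build()
        {
            Person built = result;
            result = null; // this sketch allows a single Build() per builder
            return built;
        }
    }
}

public static class MessageUtil
{
    // Hypothetical helper: it only really cares about the message type, but
    // both type parameters and both constraints have to be spelled out.
    public static TMessage BuildEmpty<TMessage, TBuilder>(TMessage prototype)
        where TMessage : IMessage<TMessage, TBuilder>
        where TBuilder : IBuilder<TMessage, TBuilder>
    {
        return prototype.CreateBuilderForType().Build();
    }
}
```

A caller has to write MessageUtil.BuildEmpty<Person, Person.Builder>(new Person()). The compiler can’t infer TBuilder from the argument alone, because constraints don’t take part in type inference, so both type arguments have to be given explicitly.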

Contemplating a more expressive language

I should really prefix this section by saying that I’m not actually suggesting this as a way forward for C# or .NET. (I suspect it would take more work in the CLR as well as just in the language; I don’t know enough about CLR generics to say for sure, but I’d be surprised if this were feasible.) I haven’t encountered many situations where I’ve wanted anything like this, and the extra complexity in the language would be quite high, I suspect. Suppose an interface could contain extra type parameters, including constraints, in the body of the interface:

// Purely imaginary syntax!
public interface IMessage<TMessage> where TMessage : IMessage<TMessage>
{
    <TBuilder> where TBuilder : IBuilder<TBuilder>, TBuilder.TMessage : TMessage

    // Normal methods, which could use TBuilder
}

public interface IBuilder<TBuilder> where TBuilder : IBuilder<TBuilder>
{
    <TMessage> where TMessage : IMessage<TMessage>, TMessage.TBuilder : TBuilder

    // Normal methods, which could use TMessage
}

There are various ways in which the interface implementation could indicate the type of TBuilder. The syntax itself isn’t particularly interesting – it’s the extra information which is conveyed which is the important bit. I’ve dithered between this being a step forward and it not. At first glance it looks no better than having both type parameters in the interface declaration, but I believe it would genuinely make a difference. For instance, the above evil method could be written as:

private static TMessage BuildImpl(Func<TMessage.TBuilder> builderBuilder,
                                  CodedInputStream input,
                                  ExtensionRegistry registry)
{
    TMessage.TBuilder builder = builderBuilder();
    input.ReadMessage(builder, registry);
    return builder.Build();
}

This time there’s no need for the method to be generic, because the type is already generic in the message type. Furthermore, we can call this method with no reflection. All other APIs which have previously had to specify two type parameters can now just specify the one. Apart from anything else, this leaves more scope for type inference in generic methods – passing either a message or a builder to a generic method happens occasionally, but it’s very rare to pass in both.

We’ve essentially expressed the relationship between the message type and the builder type a little more explicitly, so that we can guarantee it exists (and use it) at compile time. That’s at the heart of the problem to start with – without a second type parameter in the initial interface declaration, in the current language there’s no way of expressing a close relationship with another type.

Conclusion

I don’t think it would be fair to say that C# really lets us down here – it happens not to support a pretty rare scenario, and that’s fair enough. I’d be interested to know whether any other languages allow the same sort of concepts to be expressed more pleasantly. The ugly solution I’ve presented here does at least work, and it’s nearly invisible to most users, who are likely to just reference the concrete generated types. I’m not happy with the verbosity which has become necessary in many places, but it’s in a good cause. It’s interesting to note that the Java API doesn’t use this sort of doubly-generic relationship: again, covariant return types allow the concrete message and builder types to express their APIs directly and still implement a more general interface at the same time.

In the next part I’ll look at another possibility which would make interfaces and generics a more powerful combination: static interface methods.

C#, CSharpDev, CSharpDevCenter, Protocol Buffers

Lessons learned from Protocol Buffers, part 2: self-referential generic types

August 20, 2008 jonskeet 3 Comments

In the first part of this series we saw that a message type and its builder are closely related. The tricky bit comes when we want to define an interface describing messages and builders. Although some members clearly depend on the data being built (the first and last name in the person example above, for instance) others apply to all messages or all builders. For instance, a message can always provide you with a suitable builder, and a builder always allows you to build it to create the actual message. Likewise the message and builder types also have methods which return other instances of themselves – you can ask any message for the default message of the same type, or clone a builder. Many common builder methods effectively return this (i.e. the same builder) – but the declared return type needs to be the concrete type involved, not just the interface, otherwise you couldn’t then use the returned builder to set properties without casting.

(Aside: some of the members of the common interface would be more pleasant if they could be declared statically. We’ll look at that later in the series.)

We have two slightly different issues here: defining the interface to allow members to return the concrete types, and tying builders and messages together. This post will just talk about the first of these issues. Enjoy the luxury of only having to think about one type parameter at a time – it won’t last long.

First encounters of the self-referential kind

I first came across a generic constraint which referred to itself back in the early days of Java 5. Here’s the declaration for java.lang.Enum:

public abstract class Enum<E extends Enum<E>>

Assuming you’re more comfortable in C#, I’ll translate that into C# syntax:

public abstract class Enum<T> where T : Enum<T>

The constraint is more easily read than understood. Any concrete, constructed class deriving from this will be an “enum of something” where something is itself an “enum of something”.

Now, Java puts additional restrictions on the Enum class (it’s like System.Delegate in C# – you can’t explicitly derive from it yourself; you have to let the compiler do it for you). However, the syntax is perfectly valid in “normal” code. Typically when you encounter this kind of type constraint, you satisfy it in new classes by using the same class as the type argument for T. So, in the Enum example we might have:

public sealed class Currency : Enum<Currency>
{
    // Code
}

public sealed class Status : Enum<Status>
{
    // Code
}

There’s nothing to actually stop you from declaring class Status : Enum<Currency> – it’s just not normally useful. Likewise you can leave the derived type as a generic one, but again that’s atypical. I don’t know any way to enforce the usual implementation – short of building it into the language, as Java did – but it’s generally not a problem.
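As a rough illustration of both the usual pattern and the odd one, here is a sketch using a made-up SelfEnum<T> base class (the name and its members are assumptions, chosen to avoid clashing with System.Enum). The constraint is satisfied in both cases, so the compiler accepts the “wrong” declaration too.

```csharp
using System;

public abstract class SelfEnum<T> where T : SelfEnum<T>
{
    private readonly string name;

    protected SelfEnum(string name) { this.name = name; }

    public string Name { get { return name; } }

    // The parameter is typed as T, so in the usual pattern callers can
    // only compare values belonging to the same "enum".
    public bool IsSameValue(T other) { return ReferenceEquals(this, other); }
}

// The usual pattern: the class passes itself as the type argument.
public sealed class Currency : SelfEnum<Currency>
{
    public static readonly Currency Euro = new Currency("Euro");
    public static readonly Currency Dollar = new Currency("Dollar");

    private Currency(string name) : base(name) { }
}

// Legal but odd: the constraint only demands that Currency derives from
// SelfEnum<Currency>, which it does, so this declaration compiles.
public sealed class Status : SelfEnum<Currency>
{
    public static readonly Status Active = new Status("Active");

    private Status(string name) : base(name) { }
}
```

With these declarations, Status.Active.IsSameValue(Currency.Euro) compiles even though comparing a status with a currency makes no sense, which is exactly why the odd form isn’t normally useful.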

Back to Protocol Buffers

So why is this useful? Well, moving on from enums let’s look at the builder interface in Protocol Buffers. Here’s part of it – somewhat simplified, admittedly:

public interface IBuilder<TBuilder> where TBuilder : IBuilder<TBuilder>
{
    TBuilder Clear();
    TBuilder Clone();
    TBuilder ClearField(FieldDescriptor field);
    TBuilder AddRepeatedField(FieldDescriptor field, object value);
    TBuilder SetUnknownFields(UnknownFieldSet unknownFields);
    TBuilder MergeUnknownFields(UnknownFieldSet unknownFields);
    TBuilder MergeFrom(ByteString data);
    TBuilder MergeFrom(CodedInputStream input);
    TBuilder MergeFrom(CodedInputStream input, ExtensionRegistry registry);
}

None of those methods mention the actual message directly – for that we need another type parameter, as we’ll see in the next post – but all of them return a TBuilder. As it happens, the interface documentation requires that all the methods return the same reference back, just as StringBuilder methods do, but you could equally create an interface around immutable types, expecting each operation to return a new value. For instance, you could create an IArithmetic<T> interface such that int could implement IArithmetic<int>, double could implement IArithmetic<double> etc. You can then chain multiple operations together, e.g. 5.Add(10).Multiply(2) and know that you’re always within the world of integers.
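The IArithmetic<T> idea can be sketched as follows. We can’t actually make int implement a new interface, so this sketch assumes a small wrapper struct, IntValue, instead; that type and the ArithmeticUtil helper are hypothetical names used only for illustration.

```csharp
using System;

public interface IArithmetic<T> where T : IArithmetic<T>
{
    T Add(T other);
    T Multiply(T other);
}

// Immutable wrapper around int: each operation returns a new value.
public struct IntValue : IArithmetic<IntValue>
{
    private readonly int value;

    public IntValue(int value) { this.value = value; }

    public int Value { get { return value; } }

    public IntValue Add(IntValue other) { return new IntValue(value + other.value); }

    public IntValue Multiply(IntValue other) { return new IntValue(value * other.value); }
}

public static class ArithmeticUtil
{
    // Generic code only knows about the interface, but because Add returns T
    // (not IArithmetic<T>) the chain stays strongly typed all the way through.
    public static T SumThenScale<T>(T left, T right, T factor) where T : IArithmetic<T>
    {
        return left.Add(right).Multiply(factor);
    }
}
```

Calling ArithmeticUtil.SumThenScale(new IntValue(5), new IntValue(10), new IntValue(2)) returns an IntValue whose Value is 30, the wrapper equivalent of 5.Add(10).Multiply(2).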

It’s important to note that the return type of each of the methods in our builder interface is TBuilder, not IBuilder<TBuilder>. I point this out mostly because the latter is what I originally had. After all, it’s often best to expose fairly general return types. That works fine while you’re only using operations within the interface, but often clients know more detail about the concrete type and want to use that information. For instance, you might want to be able to write:

Person.Builder builder = …; // Get a builder from somewhere
builder = builder.Clear()
                 .SetFirstName("Fred")
                 .SetLastName("Jones")
                 .Clone();

Here SetFirstName() and SetLastName() aren’t members of the interface, but Clear() and Clone() are. We can mix and match like this (and finally reassign the builder variable) because the interface is as strongly typed as it is. Code which only knows about the interface can still do whatever it likes, because it knows that TBuilder implements IBuilder<TBuilder>. In particular, that means it’s fine for some of the interface to be implemented by an abstract class – in Protocol Buffers there can be quite a deep inheritance tree for messages and builders, and a lot of the methods (particularly the merging ones) can be written in terms of the others. (Yes, that suggests that an extension method might be appropriate – but leaving it in the interface allows for particular implementations to override the general one, which can be important for optimisation. There’s also the matter of making the whole thing play nicely for people who are still stuck with .NET 2.0 and Visual Studio 2005.)
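Here is a sketch of how an abstract class might implement part of such an interface in terms of the other members while still returning the concrete builder type. It is heavily simplified: it merges plain strings rather than ByteString or CodedInputStream, and AbstractBuilder, ThisBuilder and NameListBuilder are names assumed for this example rather than taken from the real port.

```csharp
using System;
using System.Collections.Generic;

public interface IBuilder<TBuilder> where TBuilder : IBuilder<TBuilder>
{
    TBuilder Clear();
    TBuilder MergeFrom(string data);
    TBuilder MergeFrom(string[] data);
}

public abstract class AbstractBuilder<TBuilder> : IBuilder<TBuilder>
    where TBuilder : AbstractBuilder<TBuilder>
{
    // The abstract class can't return "this" as a TBuilder directly,
    // so the concrete class supplies itself through this property.
    protected abstract TBuilder ThisBuilder { get; }

    public abstract TBuilder Clear();
    public abstract TBuilder MergeFrom(string data);

    // General implementation written in terms of the other members.
    // It is virtual so a concrete builder can supply an optimised version.
    public virtual TBuilder MergeFrom(string[] data)
    {
        foreach (string item in data)
        {
            MergeFrom(item);
        }
        return ThisBuilder;
    }
}

public sealed class NameListBuilder : AbstractBuilder<NameListBuilder>
{
    private readonly List<string> names = new List<string>();

    protected override NameListBuilder ThisBuilder { get { return this; } }

    public int Count { get { return names.Count; } }

    public override NameListBuilder Clear() { names.Clear(); return this; }

    public override NameListBuilder MergeFrom(string data) { names.Add(data); return this; }

    // Not part of the interface: only available on the concrete type.
    public NameListBuilder AddUpperCase(string name) { names.Add(name.ToUpper()); return this; }
}
```

A chain such as new NameListBuilder().Clear().MergeFrom(new string[] { "Fred", "Jones" }).AddUpperCase("smith") mixes interface members, an inherited general implementation and a concrete-only method without any casts.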

A small diversion via Java

It’s interesting to note that while my C# port is largely a port of the Java code, there are significant differences around how generics are used. This is understandable given how different generics are in .NET and Java. (My preference being heavily towards the .NET side – but there are moments when Java has its advantages.) However, one aspect of Java which is used to great effect is covariant return types. In the Java protocol buffers, the Message and Builder interfaces aren’t generic at all. For instance, the equivalent of the earlier part of the builder interface is just this:

public interface Builder
{
    Builder clear();
    Builder clone();
    Builder clearField(FieldDescriptor field);
    Builder addRepeatedField(FieldDescriptor field, Object value);
    Builder setUnknownFields(UnknownFieldSet unknownFields);
    Builder mergeUnknownFields(UnknownFieldSet unknownFields);
    Builder mergeFrom(ByteString data);
    Builder mergeFrom(CodedInputStream input);
    Builder mergeFrom(CodedInputStream input, ExtensionRegistry registry);
}

Does that mean we can’t chain operations together any more, mixing and matching “concrete-type-specific” methods (such as the setters for first name and last name) with the interface methods? Not at all – because where Person.Builder implements (say) clear() it can do it like this:

Person.Builder clear()
{
    // Implementation
    return this;
}

At that point, everything which knows at compile-time that it’s calling Person.Builder.clear() knows that it returns a Person.Builder – whereas code which only knows that it’s calling the interface method only knows that it will return some implementation of the interface. (Apologies for the naming here – it’s unfortunate from a clarity standpoint that both the interface and the implementation are called Builder, but I thought it would be worth being faithful to the real code on this point.)

It’s just about possible to do this in C# as well, with explicit interface implementation. Again, I went that way to start with – and it was a disaster. In the intermediate abstract classes I was having to cast to the interface sometimes, not cast at other times, declare new abstract protected methods of ClearImpl etc. It was simply awful. I’ve gone back to my school of thought which is that explicit interface implementation is handy when it’s absolutely required (or where you deliberately want to make it hard to call certain members), but should be largely avoided.

In fact, I do have a non-generic interface for both messages and builders, but where types would be involved I’ve renamed the methods to things like WeakClear and WeakBuild. These Weak* methods are only defined in terms of the non-generic interfaces, and are mostly used in cases where we really don’t know at compile time what kind of message we’re dealing with, even in a generic sense. Life would, however, be much simpler if only we had covariant return types in C#.
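For reference, the explicit interface implementation approach looks roughly like this in the simplest possible case. This is a sketch with a hypothetical non-generic IBuilder and a standalone PersonBuilder; it deliberately leaves out the intermediate abstract classes, which is where the approach became unmanageable.

```csharp
using System;

public interface IBuilder
{
    IBuilder Clear();
}

public sealed class PersonBuilder : IBuilder
{
    private string firstName = "";

    public string FirstName { get { return firstName; } }

    // Strongly typed version, seen by callers who know the concrete type.
    public PersonBuilder Clear() { firstName = ""; return this; }

    public PersonBuilder SetFirstName(string value) { firstName = value; return this; }

    // Explicit interface implementation: only reachable through an IBuilder
    // reference, and it simply forwards to the strongly typed method.
    IBuilder IBuilder.Clear() { return Clear(); }
}
```

Code holding a PersonBuilder can write builder.Clear().SetFirstName("Fred"), while code holding only an IBuilder gets an IBuilder back from Clear(). Every member needs this pair of declarations, which hints at how quickly the duplication grows.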

Conclusion

Self-referential generic types shouldn’t be used more widely than they really need to be – they can be tough to get your head round. However, they can be useful when you want to maintain a strongly typed API which needs to talk in terms of itself. One redeeming feature of the complexity in Protocol Buffers is that most of it is in the implementation: users of Person and Person.Builder really don’t need to know or care about the interfaces for most of the time. So long as they use a strongly typed expression to start with, they’ll keep that strong typing and be presented with appropriate members to call as if the interfaces and intermediate abstract classes didn’t even exist. It’s an API which gets out of your way when you’re not interested in it, which is always a nice sign.

While trying a number of schemes I’ve learned that there can often be a lot of subtly different options available, and their benefits and drawbacks aren’t always obvious until you try them. Oh, and covariant return types would be very welcome, and explicit interface implementation should generally be avoided where possible :)

Next time I’ll reveal a bit more about the real interfaces in my PB port. Bear in mind that messages need to know about their builders, and vice versa…

C#, CSharpDev, CSharpDevCenter, Protocol Buffers

Lessons learned from Protocol Buffers, part 1: messages, builders and immutability

August 20, 2008 jonskeet 10 Comments

My port of the Protocol Buffers project has proved pretty interesting. I thought I’d share some of the lessons I’ve learned along the way, as well as some of the frustrations at concepts I still can’t express in C#.

This was originally all going to be in one post, but I’m becoming acutely aware of how long some posts can grow. I don’t know about you, but I find very long blog posts quite intimidating, so I’ve decided to split them up into individual topics. You’ll still probably need to read the posts in order to understand them though – and this introductory post is the most important one in that respect.

Messages and Builders

The Protocol Buffers project (or PB for short) is basically another serialization technology, putting emphasis on efficiency, platform neutrality, and backward/forward compatibility. The normal set of steps in using PB is something like this:

Write a .proto file describing your data in terms of messages.
Run protoc to generate C# (and Java/C++ if you so wish).
In your application, use the builder associated with the message type to create an instance of a message.
Serialize the data to a stream.
At some other point in the application (or a different app) deserialize the data.

The idea is that builders are mutable, while the messages they build are immutable. You can use builders either with Set* methods which return the same builder again, or properties which can be used within object initializers. For example:

// Syntax available in C# 2
Person john = new Person.Builder()
    .SetFirstName("John")
    .SetLastName("Doe")
    .Build();

// Using an object initializer
Person jane = new Person.Builder
    { FirstName="Jane", LastName="Doe" }
    .Build();

Of course, you don’t have to do all the building in one expression, it’s just a handy option in many cases.

As you can see, the builder is generated as a nested type of the message. That’s handy, as it means the builder has access to the private members of the message. To avoid lots of data copying we employ popsicle immutability – the builder directly manipulates the message until it’s built, at which point it makes sure that nothing will change it afterwards. If that makes you uncomfortable in terms of it not being “true” immutability, I sympathise – but I also give String as a counterexample; StringBuilder works in exactly this way, modifying a string directly until it exposes it to the outside world.

Other than the copying – and the fact that all the code exists explicitly, and the caller has to know about the builder – this is quite similar to the suggestion I made about C# immutability a while ago. One point which makes it all simpler is that every data type in Protocol Buffers is itself immutable – so we don’t need to worry about deep copies and the like.
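The popsicle immutability described above can be sketched like this. It is a hand-written approximation under my own assumptions (the Current helper and the exception on reuse are invented for the example), not the code that protoc actually generates.

```csharp
using System;

public sealed class Person
{
    private string firstName = "";
    private string lastName = "";

    // Only the nested builder can create instances.
    private Person() { }

    public string FirstName { get { return firstName; } }
    public string LastName { get { return lastName; } }

    public sealed class Builder
    {
        // The message under construction; null once Build() has handed it out.
        private Person result = new Person();

        private Person Current
        {
            get
            {
                if (result == null)
                {
                    throw new InvalidOperationException("Build() has already been called");
                }
                return result;
            }
        }

        // Properties make the builder usable with object initializers.
        public string FirstName
        {
            get { return Current.firstName; }
            set { Current.firstName = value; }
        }

        public string LastName
        {
            get { return Current.lastName; }
            set { Current.lastName = value; }
        }

        // Set* methods return the builder so that calls can be chained.
        public Builder SetFirstName(string value) { FirstName = value; return this; }
        public Builder SetLastName(string value) { LastName = value; return this; }

        public Person Build()
        {
            Person built = Current;
            result = null; // freeze: the builder can no longer touch the message
            return built;
        }
    }
}
```

No data is copied in Build(): the builder writes straight into the message’s private fields and then drops its reference, so nothing can change the message once it has been exposed.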

Unfortunately the current implementation doesn’t support collection initializers – if you have a repeated field in your message, you have to call Add* to populate it. The Add* methods return the builder just like the Set* methods, so you can still do it all in one expression, but it’s not terribly neat. Using a collection initializer compiles, but fails at execution time because the properties for repeated fields always return immutable lists. This is by design, to stop callers from creating a builder, fetching the list property, calling Build and then adding to the list. A better solution (and one which I plan to implement soon) is to have a PopsicleList<T> which is initially mutable but which will become immutable at the appropriate time (i.e. when Build() is called). At that point we’ll be able to write:

Person jane = new Person.Builder
    { FirstName="Jane", LastName="Doe",
      Friends = { "Tom", "Dick", "Harry" } }
    .Build();
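One possible shape for such a list is sketched below. This is purely my guess at a design, assuming a MakeReadOnly method that the builder calls from Build(); the FriendListBuilder type exists only to demonstrate the collection initializer.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class PopsicleList<T> : IEnumerable<T>
{
    private readonly List<T> items = new List<T>();
    private bool frozen;

    public int Count { get { return items.Count; } }

    public T this[int index] { get { return items[index]; } }

    // Collection initializers call Add, so it must work before freezing.
    public void Add(T item)
    {
        if (frozen)
        {
            throw new NotSupportedException("The list is read-only");
        }
        items.Add(item);
    }

    // Called when the owning builder builds its message.
    public void MakeReadOnly() { frozen = true; }

    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

// Hypothetical builder exposing the list through a read-only property.
public sealed class FriendListBuilder
{
    private readonly PopsicleList<string> friends = new PopsicleList<string>();

    public PopsicleList<string> Friends { get { return friends; } }

    public PopsicleList<string> Build()
    {
        friends.MakeReadOnly();
        return friends;
    }
}
```

With this in place, new FriendListBuilder { Friends = { "Tom", "Dick", "Harry" } }.Build() works, because the collection initializer calls Add on the list returned by the property getter. Any later call to Add on the built list throws NotSupportedException, which keeps the “fetch the list, build, then mutate” trick from working.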

There’s quite a lot more to messages and builders than this – things like the reflection-like API to query properties of the message based on fields in the message descriptor – but what I’ve described so far ought to be enough for most of what I want to talk about, most of which relates to generics. In the next part, I’ll talk about self-referential generic types.

Pre-Copenhagen interview

August 19, 2008 jonskeet Leave a comment

Brian Rasmussen has just posted an interview we did by email, as a sort of precursor to my talk in Copenhagen. It’s nice to occasionally write down “where I am” in terms of my feelings about Java, C# and my own career. There’s a bit of technical content, but it’s mostly stuff about me as a person, just to dampen expectations suitably.

I’m really, really looking forward to giving the talk now. Nearly two and a half months is a long time to wait…

Update: If you tried to get to the link earlier on and failed, try again – it’s back up.

Speaking in Copenhagen, October 30th

August 13, 2008 jonskeet 2 Comments

I should have announced this earlier, but I’m delighted to report that on October 30th I’ll be speaking at a C# event in Copenhagen. Brian Rasmussen has organised a one day seminar which basically consists of me talking about C# all day and fielding questions.

That sounds like more fun for me than anyone else, but apparently enough people disagree that the event is already fully booked. Still, if you want to sign up in case anyone drops out, the registration page has the details.

My plan is to make it a verbal edition of C# in Depth, with as much interaction as possible. I’ll try to tackle C# 2 in the morning and C# 3 in the afternoon, possibly with some fun using Push LINQ and Protocol Buffers at the end, just to show how flexible LINQ is (even in-process LINQ). However, I’m hoping that my agenda will be derailed by the audience asking lots of questions and leading me down interesting alleys. That’s usually been the way of things in the past, and it makes the whole experience much more fun.

If you’re coming, please mail me with the kind of topics you’d like covered. The more input I get, the better the event is likely to be. I’m looking forward to it a lot…

Visual Studio 2008 SP1 and .NET 3.5 SP1 both out now

August 11, 2008 jonskeet 2 Comments

I suspect this will be pretty widely advertised fairly quickly, but both Visual Studio 2008 SP1 and .NET 3.5 SP1 are available for download. Personally I’ve had problems signing into the MSDN subscriptions site and going to the downloads page, but the direct links work fine. Both are fairly small files which then download more stuff when you execute them. The .NET 3.5 SP1 download doesn’t require you to have .NET 3.5 installed beforehand.

Update (13th August): Patrick Smacchia has a great post showing the differences (in terms of numbers rather than features) between 3.5 and 3.5SP1.

C# refcard now available (free) from DZone

August 11, 2008 jonskeet 3 Comments

Just a quick announcement. I’ve been working on a C# “refcard” with DZone, and it’s now available. It’s free to download after registration, and covers (briefly):

String escape sequences
Delegate and event syntax
Nullable value types
Syntax for generics
Extension methods
Query expressions

Obviously this isn’t meant to be a comprehensive guide to C#, but I picked topics which have elements of syntax which are easily forgotten. Hopefully you’ll find it useful – and a refcard on “Core .NET” will be following fairly soon.

DZone also has a discount on Manning books, including C# in Depth.

Thank you, JetBrains: dotTrace rocks

August 9, 2008 jonskeet 3 Comments

One thing I failed to mention in my post about making reflection perform better is how I optimised the rest of the code. It was always pretty obvious that the reflection side would start off as a bottleneck – but for the rest of the code, I’ve relied heavily on dotTrace. It’s from JetBrains, the same people who make ReSharper (without which I’d be considerably more frustrated with Visual Studio).

While there are certainly elements of dotTrace which I haven’t explored (and occasionally some results which have mystified me – such as a claim that the CPU spent 111 seconds in one particular method, when the whole trace was only 99 seconds long – I suspect some double-counting of recursion) it’s been really useful, and incredibly easy to get started with.

JetBrains gives MVPs free licences for both ReSharper and dotTrace, which obviously makes me predisposed towards warm, fuzzy feelings for them. Both of them are highly recommended though, and have saved me a lot of time.

C#, CSharpDev

Making reflection fly and exploring delegates

August 9, 2008 jonskeet 42 Comments

Background

I’ve recently been doing some optimisation work which has proved quite interesting in terms of working with reflection. My efforts porting Google’s Protocol Buffers are now pretty complete in terms of functionality, so I’ve been looking at improving the performance. The basic idea is that you specify your data types in a .proto file, and then generate C# from that. The generated C# allows you to manipulate the data, and serialize/deserialize it. When you generate the code, it can be optimised either for size or speed. The “small” code can end up being much smaller than the “fast” code – but it’s also significantly slower as it uses reflection when serializing and deserializing. My first rough-and-ready benchmark results (using a 130K data file based on Northwind) were slightly terrifying:

Operation	Time (ms)
Deserialize (fast)	5.18
Serialize (fast)	3.96
Deserialize (slow)	429.49
Serialize (slow)	103.67

Far from all of this difference was due to reflection, but it was a significant chunk – and provided the most interesting and challenging optimisation. This post doesn’t show the actual Protocol Buffer code, but demonstrates the three steps I required to radically improve the performance of reflection. The examples I’ve used are chosen just for simplicity.

Converting MethodInfo into a delegate instance

There are lots of things you can do with reflection, obviously – but I’m primarily interested in calling methods, using the associated MethodInfo. This includes setting properties, using the results of the GetGetMethod and GetSetMethod methods of PropertyInfo. We’ll use String.IndexOf(char) as our initial example.

Normally when you’re calling methods with reflection, you call MethodInfo.Invoke. Unfortunately, this proves to be quite slow. If you know the signature of the method at compile-time, you can convert the method into a delegate with that signature using Delegate.CreateDelegate(Type, object, MethodInfo). You simply pass in the delegate type you want to create an instance of, the target of the call (i.e. what the method will be called on), and the method you want to call. It would be nice if there were a generic version of this call to avoid casting the result, but never mind. Here’s a complete example demonstrating how it works:

using System;
using System.Reflection;
public class Test
{
    static void Main()
    {
        MethodInfo method = typeof(string).GetMethod("IndexOf", new Type[] { typeof(char) });

        Func<char, int> converted = (Func<char, int>)
            Delegate.CreateDelegate(typeof(Func<char, int>), "Hello", method);

        Console.WriteLine(converted('l'));
        Console.WriteLine(converted('o'));
        Console.WriteLine(converted('x'));
    }
}

This prints out 2, 4, and -1; exactly what we’d get if we’d called "Hello".IndexOf(...) directly. Now let’s see what the speed differences are…

We’re mostly interested in the time taken to go from the main calling code to the method being called, whether that’s with a direct method call, MethodInfo.Invoke or the delegate. To make IndexOf itself take as little time as possible, I tested it by passing in 'H' so it would return 0 immediately. As normal, the test was rough and ready, but here are the results:

Invocation type	Stopwatch ticks per invocation
Direct	0.18
Reflection	120
Delegate	0.20

One important point is that I created a new parameter array for each invocation of the MethodInfo – obviously this is slightly costly in itself, but it mirrors real world usage. The exact numbers don’t matter, but the relative sizes are the important point: using a delegate invocation is only about 10% slower than direct invocation, whereas using reflection takes over 600 times as long. Of course these figures will depend on the method being called – if the direct invocation can be inlined, I’d expect that to make a significant difference in some cases. However, the benefit in converting reflection calls into delegate calls is obvious.
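A benchmark along these lines might look like the following sketch. It is not the original test code; the InvocationBenchmark class and its Run method are assumptions, and the timings it reports will vary with machine, runtime and JIT behaviour.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class InvocationBenchmark
{
    // Returns Stopwatch ticks per invocation for:
    // [0] direct call, [1] MethodInfo.Invoke, [2] delegate call.
    public static double[] Run(int iterations)
    {
        string target = "Hello";
        MethodInfo method = typeof(string).GetMethod("IndexOf", new Type[] { typeof(char) });
        Func<char, int> viaDelegate = (Func<char, int>)
            Delegate.CreateDelegate(typeof(Func<char, int>), target, method);

        int sink = 0; // accumulate results so the calls aren't trivially dead code

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += target.IndexOf('H');
        }
        long directTicks = stopwatch.ElapsedTicks;

        stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // A new argument array per call, mirroring real-world usage.
            sink += (int) method.Invoke(target, new object[] { 'H' });
        }
        long reflectionTicks = stopwatch.ElapsedTicks;

        stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += viaDelegate('H');
        }
        long delegateTicks = stopwatch.ElapsedTicks;

        // 'H' is at index 0, so every call should have contributed 0.
        if (sink != 0)
        {
            throw new InvalidOperationException("Unexpected IndexOf result");
        }

        return new double[]
        {
            directTicks / (double) iterations,
            reflectionTicks / (double) iterations,
            delegateTicks / (double) iterations
        };
    }
}
```

Calling InvocationBenchmark.Run(1000000) and printing the three values gives a rough comparison. As with the figures above, only the relative sizes are worth reading anything into.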

Now, what about if we wanted to vary the string we were calling IndexOf on?

Interlude: open and closed delegates

When you create a delegate directly in C# using a method group conversion, you (almost) always create an open delegate for static methods and a closed delegate for instance methods. To explain the difference between open and closed delegates, it’s best to start thinking of all methods as being static – but with instance methods having an extra parameter at the start to represent this. In fact, extension methods use exactly this model. Reality is more complicated than that due to polymorphism, but we’ll leave that to one side for the moment.

Going back to our String.IndexOf example, we can start thinking of the signature as being:

static int IndexOf(string target, char c)

At this point it’s easy to explain the difference between open and closed delegates: a closed delegate has a value which it implicitly passes in as the first argument, whereas with an open delegate you specify all the arguments when you invoke the delegate. The implicit first argument is represented by the Delegate.Target property. It’s null for open delegates – which is usually the case when you create a delegate directly in C#. Here’s a short program to demonstrate the difference when you create delegate instances using C# directly:

using System;
public class Test
{
    readonly string name;

    public Test(string name)
    {
        this.name = name;
    }

    public void Display()
    {
        Console.WriteLine("Test; name = {0}", name);
    }

    static void StaticMethod()
    {
        Console.WriteLine("Static method");
    }

    static void Main()
    {
        Test foo = new Test("foo");

        Action closed = foo.Display; // closed.Target == foo
        Action open = StaticMethod;  // open.Target == null

        closed();
        open();
    }
}

Before we go back to reflection, I’ll clarify the “almost” I used earlier on. You can’t currently create an open delegate referring to an instance method in C# using method group conversions – but you can create a closed delegate referring to a static method, if it’s an extension method. This makes sense, as extension methods are a strange sort of half-way house between static methods and instance methods – they’re truly static methods which can be used as if they were instance methods. I’ve got an example on my C# in Depth site.
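As a minimal sketch of that extension method case (this is not the example from the C# in Depth site; the Shout method and the class names are hypothetical), a method group conversion using extension method syntax gives a delegate which captures its first argument, even though the underlying method is static:

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical extension method, used only for this illustration
    public static void Shout(this string text)
    {
        Console.WriteLine(text.ToUpperInvariant() + "!");
    }
}

public class ExtensionDelegateDemo
{
    // Closed delegate: 'text' is captured as the first argument of the static method
    public static Action CreateClosed(string text)
    {
        return text.Shout;
    }

    public static void Main()
    {
        Action closed = CreateClosed("hello");
        // Open delegate: the same static method, with the argument supplied on invocation
        Action<string> open = StringExtensions.Shout;

        Console.WriteLine(closed.Method.IsStatic); // True
        closed();        // HELLO!
        open("goodbye"); // GOODBYE!
    }
}
```

Both delegates refer to the same static method; the only difference is whether the first argument is bound when the delegate is created or supplied when it is invoked.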

Creating open delegates with reflection

Even though C# doesn’t support all the possible combinations of static/instance methods and open/closed delegates directly, Delegate.CreateDelegate has overloads to let you do just that. The signature we used earlier (with parameters Type, object, MethodInfo) always creates a closed delegate. There’s another overload without the middle parameter – and that always creates an open delegate. We can easily modify our earlier example to let us call String.IndexOf(char) varying both the needle and the haystack, so to speak:

using System;
using System.Reflection;
public class Test
{
    static void Main()
    {
        MethodInfo method = typeof(string).GetMethod("IndexOf", new Type[] { typeof(char) });

        Func<string, char, int> converted = (Func<string, char, int>)
            Delegate.CreateDelegate(typeof(Func<string, char, int>), method);

        Console.WriteLine(converted("Hello", 'l'));
        Console.WriteLine(converted("Jon", 'o'));
        Console.WriteLine(converted("Hello", 'n'));
    }
}

This prints 2, 1, -1, as if we’d called "Hello".IndexOf('l'), "Jon".IndexOf('o') and "Hello".IndexOf('n').

This can be a very powerful tool – in particular it’s crucial for my Protocol Buffers port: for a particular type, I can create a delegate which will set a property. I can keep that information around forever, and use the same delegate to set the property to different values on different instances of the type.
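To make the property case concrete, here is a small sketch. The Customer type is a hypothetical stand-in, not one of the generated Protocol Buffers classes, and the real port may well be structured differently. The setter is fetched once as a MethodInfo, turned into an open delegate, and then reused across instances:

```csharp
using System;
using System.Reflection;

// Hypothetical type standing in for a generated class
public class Customer
{
    public string Name { get; set; }
}

public static class PropertySetterDemo
{
    // Creates an open delegate for the Name setter: target first, then the value
    public static Action<Customer, string> CreateNameSetter()
    {
        PropertyInfo property = typeof(Customer).GetProperty("Name");
        MethodInfo setter = property.GetSetMethod();
        return (Action<Customer, string>)
            Delegate.CreateDelegate(typeof(Action<Customer, string>), setter);
    }

    public static void Main()
    {
        Action<Customer, string> setName = CreateNameSetter();

        Customer first = new Customer();
        Customer second = new Customer();
        setName(first, "Jon");
        setName(second, "Holly");

        Console.WriteLine(first.Name);  // Jon
        Console.WriteLine(second.Name); // Holly
    }
}
```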

There’s just one more problem to overcome – and unfortunately this is where things get a little weird.

Adapting delegates for parameter and return types

Due to the way that the Protocol Buffer library works, I often need to call methods or set properties without knowing at compile-time what the parameter types are, or indeed the return type of the method. I can be confident that I’ll always call it with appropriate parameters, but I just don’t know what they’ll be ahead of time. Things are slightly better in terms of the type declaring the method – I know that at compile-time, although only as a generic type parameter. What I do know with confidence is the number of parameters (I’ll just specify a single parameter for our example), and whether or not the method will return a value (we’ll use an example which always returns a value).

What I need is a generic method which has a type parameter T representing the type which implements the method, and which returns a Func<T, object, object> – a delegate instance which lets me pass the target and the argument value, and which will call the method and then return the value in a weakly typed manner. So we’d like this kind of program to work:

using System;
using System.Reflection;
using System.Text;
public class Test
{
    static void Main()
    {
        MethodInfo indexOf = typeof(string).GetMethod("IndexOf", new Type[] { typeof(char) });
        MethodInfo getByteCount = typeof(Encoding).GetMethod("GetByteCount", new Type[] { typeof(string) });

        Func<string, object, object> indexOfFunc = MagicMethod<string>(indexOf);
        Func<Encoding, object, object> getByteCountFunc = MagicMethod<Encoding>(getByteCount);

        Console.WriteLine(indexOfFunc("Hello", 'e'));
        Console.WriteLine(getByteCountFunc(Encoding.UTF8, "Euro sign: \u20ac"));
    }

    static Func<T, object, object> MagicMethod<T>(MethodInfo method)
    {
        // TODO: Implement this method!
        throw new NotImplementedException();
    }
}

Note: I was going to demonstrate this by calling DateTime.AddDays, but for value type instance methods the implicit first parameter is passed by reference, so we’d need a delegate type with a signature of DateTime Foo(ref DateTime original, double days) to call CreateDelegate. It’s feasible, but a bit of a faff. In particular, you can’t use Func as that doesn’t have any by-reference parameters.
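For completeness, a sketch of what that would look like, assuming a custom delegate type (AddDaysDelegate and the surrounding class are hypothetical names) whose first parameter is a ref DateTime:

```csharp
using System;
using System.Globalization;
using System.Reflection;

public static class ValueTypeDelegateDemo
{
    // Hypothetical delegate type: the implicit "this" of a value type is passed by reference
    public delegate DateTime AddDaysDelegate(ref DateTime original, double days);

    // Creates an open delegate for the instance method DateTime.AddDays(double)
    public static AddDaysDelegate CreateAddDays()
    {
        MethodInfo method = typeof(DateTime).GetMethod("AddDays", new Type[] { typeof(double) });
        return (AddDaysDelegate) Delegate.CreateDelegate(typeof(AddDaysDelegate), method);
    }

    public static void Main()
    {
        AddDaysDelegate addDays = CreateAddDays();
        DateTime start = new DateTime(2008, 8, 1);
        DateTime result = addDays(ref start, 5);
        Console.WriteLine(result.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

Having to declare a delegate type per signature is what makes this awkward to generalise, which is why the rest of this post sticks to reference types.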

Make sure you understand what we’re aiming for here. Notice that we’re not really type-safe – just like we wouldn’t be if we were calling MethodInfo.Invoke. Of course we’d normally want type safety, but in this case it would make the calling code much more complicated, and in some places it might effectively be impossible. So, with the goal in place, we know we need to implement MagicMethod. (It’s not called MagicMethod in the real source code, of course – but frankly it’s quite a tricky method to name sensibly, and at this stage it really does feel like magic.)

The first obvious attempt at implementing MagicMethod would be to use CreateDelegate as we’ve done before, like this:

// Warning: doesn't actually work!
static Func<T, object, object> MagicMethod<T>(MethodInfo method)
{
    return (Func<T, object, object>)
        Delegate.CreateDelegate(typeof(Func<T, object, object>), method);
}

Unfortunately, that fails – the call to CreateDelegate fails with an ArgumentException because the delegate type isn’t right for the method that we’re trying to call. The delegate types don’t have to be exactly right, just compatible (as of .NET 2.0) – but we need an explicit conversion from object to the right parameter type, and a potentially boxing conversion of the return value. We still want to call CreateDelegate though… so somewhere we’re going to have to create a Func<TTarget, TParam, TReturn> where TTarget is a type parameter representing the type of object we’re going to call the method on, TParam is the type of the single parameter the method accepts, and TReturn is the return type of the method.
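To see where the line between "compatible" and "not compatible" falls, here is a small sketch. The Describe method and the class name are hypothetical, introduced only for this illustration. Reference conversions on the parameter and return types are accepted, whereas anything needing a cast or boxing is rejected:

```csharp
using System;
using System.Reflection;

public static class DelegateCompatibilityDemo
{
    // Hypothetical static method: takes object, returns string
    public static string Describe(object value)
    {
        return "Value: " + value;
    }

    // Compatible: string -> object on the way in, string -> object on the way out
    public static Func<string, object> CreateRelaxed()
    {
        MethodInfo describe = typeof(DelegateCompatibilityDemo).GetMethod("Describe");
        return (Func<string, object>)
            Delegate.CreateDelegate(typeof(Func<string, object>), describe);
    }

    // Not compatible: object -> char needs a cast, and int -> object needs boxing
    public static bool TryBindIndexOfWeakly()
    {
        MethodInfo indexOf = typeof(string).GetMethod("IndexOf", new Type[] { typeof(char) });
        try
        {
            Delegate.CreateDelegate(typeof(Func<string, object, object>), indexOf);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CreateRelaxed()("hello"));  // Value: hello
        Console.WriteLine(TryBindIndexOfWeakly());    // False
    }
}
```

With direct binding to Func<T, object, object> ruled out, the remaining question is how to create the strongly typed Func<TTarget, TParam, TReturn> when TParam and TReturn are only known at execution time.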

We could do that directly with reflection, using typeof(Func<,,>) to get the open type (not to be confused with an open delegate!), then calling Type.MakeGenericType to create the right constructed type. We’ll need to do something like that anyway, but it’s actually easier to write another generic method with the right type parameters for this part. That will let us convert the MethodInfo into a delegate, but then what are we going to do with it? How can we convert a Func<TTarget, TParam, TReturn> into a Func<TTarget, object, object>? Well, we need to cast the parameter from object to TParam, and then convert the result from TReturn to object, which may involve boxing. If we were writing a method to do this, it would look something like this:

static object CallAndConvert<TTarget, TParam, TReturn>
    (Func<TTarget, TParam, TReturn> func, TTarget target, object param)
{
    // Conversion from TReturn to object is implicit
    return func(target, (TParam) param);
}

We don’t want to execute that code at the moment – we want to create a delegate which will execute it later. The easiest way to do that is to move the code into a lambda expression within a normal method which already has a reference to the Func. That lambda expression will then be converted into a delegate of the type we really want. It may feel like we’re just adding layer upon layer of indirection (and indeed we are) but we’re genuinely making progress. Honest. Here’s the new generic method:

static Func<TTarget, object, object> MagicMethodHelper<TTarget, TParam, TReturn>(MethodInfo method)
{
    // Convert the slow MethodInfo into a fast, strongly typed, open delegate
    Func<TTarget, TParam, TReturn> func = (Func<TTarget, TParam, TReturn>)
        Delegate.CreateDelegate(typeof(Func<TTarget, TParam, TReturn>), method);
    // Now create a more weakly typed delegate which will call the strongly typed one
    Func<TTarget, object, object> ret = (TTarget target, object param) => func(target, (TParam) param);
    return ret;
}

(We could return the lambda expression directly – the ret variable is only present as an attempt to add some clarity.)

We’re now just one step away from having a working program – we need to implement MagicMethod by calling MagicMethodHelper. There’s one obvious problem though – we need three type arguments to call MagicMethodHelper, and we’ve only got one of them in MagicMethod. We know the other two at execution time, based on the parameter type and return type of the MethodInfo we’ve been passed. The fact that we only know them at execution time suggests the next step – we need to use reflection to invoke MagicMethodHelper. We need to fetch the generic method and then supply the type arguments. It’s easier to show this than to describe it:

static Func<T, object, object> MagicMethod<T>(MethodInfo method) where T : class
{
    // First fetch the generic form
    MethodInfo genericHelper = typeof(Test).GetMethod(
        "MagicMethodHelper", BindingFlags.Static | BindingFlags.NonPublic);
    // Now supply the type arguments
    MethodInfo constructedHelper = genericHelper.MakeGenericMethod(
        typeof(T), method.GetParameters()[0].ParameterType, method.ReturnType);

    // Now call it. The null argument is because it’s a static method.
    object ret = constructedHelper.Invoke(null, new object[] { method });

    // Cast the result to the right kind of delegate and return it
    return (Func<T, object, object>) ret;
}

I’ve added the where T : class constraint to make sure (at compile-time) that we don’t run into the problem I mentioned earlier around calling value type methods. It may seem slightly odd that we’re using reflection to call MagicMethodHelper when the whole point of the exercise was to avoid invoking methods by reflection – but we only need to invoke the method once, and we can use the returned delegate many times. Here’s the complete program, ready to compile and run:

using System;
using System.Reflection;
using System.Text;

public class Test
{
    static void Main()
    {
        MethodInfo indexOf = typeof(string).GetMethod("IndexOf", new Type[]{typeof(char)});
        MethodInfo getByteCount = typeof(Encoding).GetMethod("GetByteCount", new Type[]{typeof(string)});

        Func<string, object, object> indexOfFunc = MagicMethod<string>(indexOf);
        Func<Encoding, object, object> getByteCountFunc = MagicMethod<Encoding>(getByteCount);

        Console.WriteLine(indexOfFunc("Hello", 'e'));
        Console.WriteLine(getByteCountFunc(Encoding.UTF8, "Euro sign: \u20ac"));
    }

    static Func<T, object, object> MagicMethod<T>(MethodInfo method) where T : class
    {
        // First fetch the generic form
        MethodInfo genericHelper = typeof(Test).GetMethod("MagicMethodHelper", 
            BindingFlags.Static | BindingFlags.NonPublic);

        // Now supply the type arguments
        MethodInfo constructedHelper = genericHelper.MakeGenericMethod
            (typeof(T), method.GetParameters()[0].ParameterType, method.ReturnType);

        // Now call it. The null argument is because it's a static method.
        object ret = constructedHelper.Invoke(null, new object[] {method});

        // Cast the result to the right kind of delegate and return it
        return (Func<T, object, object>) ret;
    }    

    static Func<TTarget, object, object> MagicMethodHelper<TTarget, TParam, TReturn>(MethodInfo method)
        where TTarget : class
    {
        // Convert the slow MethodInfo into a fast, strongly typed, open delegate
        Func<TTarget, TParam, TReturn> func = (Func<TTarget, TParam, TReturn>)Delegate.CreateDelegate
            (typeof(Func<TTarget, TParam, TReturn>), method);

        // Now create a more weakly typed delegate which will call the strongly typed one
        Func<TTarget, object, object> ret = (TTarget target, object param) => func(target, (TParam) param);
        return ret;
    }
}
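Since the reflection cost is paid once per method, one natural way to use this is to cache the resulting delegates. The following is only a sketch of that idea, not the code used in the Protocol Buffers port: DelegateCache<T> is a hypothetical name, and it folds the MagicMethod logic into a generic class so that T is fixed per cache.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical cache: one dictionary per target type T
public static class DelegateCache<T> where T : class
{
    static readonly Dictionary<MethodInfo, Func<T, object, object>> cache =
        new Dictionary<MethodInfo, Func<T, object, object>>();
    static readonly object padlock = new object();

    // Returns the cached delegate for 'method', building it on first use
    public static Func<T, object, object> Get(MethodInfo method)
    {
        lock (padlock)
        {
            Func<T, object, object> func;
            if (!cache.TryGetValue(method, out func))
            {
                func = MagicMethod(method);
                cache[method] = func;
            }
            return func;
        }
    }

    // The slow part: one reflection call per method, as in the program above
    static Func<T, object, object> MagicMethod(MethodInfo method)
    {
        MethodInfo genericHelper = typeof(DelegateCache<T>).GetMethod(
            "MagicMethodHelper", BindingFlags.Static | BindingFlags.NonPublic);
        MethodInfo constructedHelper = genericHelper.MakeGenericMethod(
            method.GetParameters()[0].ParameterType, method.ReturnType);
        return (Func<T, object, object>)
            constructedHelper.Invoke(null, new object[] { method });
    }

    static Func<T, object, object> MagicMethodHelper<TParam, TReturn>(MethodInfo method)
    {
        Func<T, TParam, TReturn> func = (Func<T, TParam, TReturn>)
            Delegate.CreateDelegate(typeof(Func<T, TParam, TReturn>), method);
        return (target, param) => func(target, (TParam) param);
    }
}

public class CacheDemo
{
    public static void Main()
    {
        MethodInfo indexOf = typeof(string).GetMethod("IndexOf", new Type[] { typeof(char) });
        Func<string, object, object> first = DelegateCache<string>.Get(indexOf);
        Func<string, object, object> second = DelegateCache<string>.Get(indexOf);

        Console.WriteLine(first("Hello", 'e'));            // 1
        Console.WriteLine(ReferenceEquals(first, second)); // True
    }
}
```

The lock keeps the sketch simple and safe; whether that is the right trade-off depends on how contended the cache would be in practice.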

Conclusion

This isn’t the kind of thing which I enjoy having in production code. It’s frightfully complicated – we’re finding a method via reflection, invoking a different (and generic) method via reflection in order to turn the first method into a delegate and then return a different delegate which calls it. While I don’t like having “clever” code like this in production, I take immense pleasure from getting it to work in the first place. This is one of the rare occasions where the result makes all the cleverness worth it, too – combined with the other optimisations, my Protocol Buffers port is now much, much faster – the reflection invocations are no longer a bottleneck. (We lose a little bit of efficiency by having one delegate call another, but it’s still massively quicker than using reflection.)

Regardless of the complexity involved later on, the simpler parts of this post (calling Delegate.CreateDelegate where you already know the signature, and the possibility of creating open delegates) are likely to be more widely applicable. By using a delegate instead of MethodInfo, not only are there significant performance improvements, but also a strongly typed way of calling the method. From now on, I’ll certainly be considering whether or not it might be worth using a delegate any time I use reflection.

C#, CSharpDev, CSharpDevCenter

Making the most of generic type inference

August 6, 2008 jonskeet 2 Comments

Introduction

Specifying type arguments for generic types and methods can be a pain, especially when there are multiple type parameters involved. For instance, imagine having to explicitly specify TOuter, TInner, TKey and TResult for a call to Enumerable.Join! Fortunately the compiler can work out the type arguments most of the time – but only for generic methods. It doesn’t do anything for generic types. However, all is not lost…

Overloading type names by number of type parameters

One feature of C# and .NET which isn’t used terribly often (in my experience) is the ability to use the same type name for different types – so long as they have different numbers of type parameters. This is how System.Nullable and System.Nullable<T> coexist, for example. Like all language features this is open to massive abuse if you have several types with very different semantic meanings, but when applied with discretion it can be very helpful.

We can use this to our advantage when we want to call a constructor or static method of a generic type without specifying the type parameters. The basic idea is that you create a nongeneric type (or just one with fewer type parameters) and then put a generic method in that class. The generic method in the nongeneric class then calls a member in the generic class, using the method’s type parameters as the type arguments for the generic class. (Try getting all of that right after a few drinks!) Now that you’ve got a generic method, you can use type inference to avoid having to explicitly state the type arguments. Fortunately it’s a lot simpler than it sounds…
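Reduced to its bare bones, the pattern looks something like the sketch below. Pair and Pair<TFirst, TSecond> are hypothetical types made up for this illustration; they are not part of MiscUtil.

```csharp
using System;

// Generic type: callers of the constructor must spell out both type arguments
public class Pair<TFirst, TSecond>
{
    public TFirst First { get; private set; }
    public TSecond Second { get; private set; }

    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }
}

// Nongeneric type with the same name: its generic method lets the compiler infer them
public static class Pair
{
    public static Pair<TFirst, TSecond> Of<TFirst, TSecond>(TFirst first, TSecond second)
    {
        return new Pair<TFirst, TSecond>(first, second);
    }
}

public class PairDemo
{
    public static void Main()
    {
        Pair<string, int> explicitPair = new Pair<string, int>("answer", 42);
        var inferredPair = Pair.Of("answer", 42);

        Console.WriteLine(inferredPair.GetType() == explicitPair.GetType()); // True
    }
}
```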

To show you what I mean, let’s look at a bit of code from my MiscUtil library. (This code isn’t in the latest release drop, but will be in the next one – and this post provides all the important code anyway.)

Projection comparisons

I’ve found the OrderBy method in LINQ very useful, and I wanted to be able to use the same “compare using a projection” idea elsewhere. The IComparer<T> interface is used in various places in the .NET API (List<T>.Sort being an obvious example) but implementing it can be a bit tedious – even though it’s a single method. So, let’s build a ProjectionComparer type which knows how to compare two objects by applying the same projection to both of them, and then using another comparer to compare the results.

There are two types involved – the source of the projection, and the key we’re projecting it to. This naturally suggests a type with two type parameters, TSource and TKey. For instance, when projecting from a Person type to their name, we might have TSource=Person and TKey=string.

The most obvious piece of information we need to create a projection comparer is the projection itself. A delegate is the obvious way of representing this – a Func<TSource, TKey> which can be applied to each item we try to compare. We then need to know how to compare the names (e.g. case-insensitive, ordinal etc) – functionality which is provided by StringComparer in this example, and IComparer<TKey> in general. The Comparer<T>.Default property comes in handy to let us get away without specifying a comparer in many situations.

With those few design decisions, we can implement ProjectionComparer<TSource, TKey> pretty simply:

using System;
using System.Collections.Generic;

public class ProjectionComparer<TSource, TKey> : IComparer<TSource>
{
    private readonly Func<TSource, TKey> projection;
    private readonly IComparer<TKey> comparer;

    public ProjectionComparer(Func<TSource, TKey> projection)
        : this (projection, null)
    {
    }

    public ProjectionComparer(Func<TSource, TKey> projection, IComparer<TKey> comparer)
    {
        if (projection==null)
        {
            throw new ArgumentNullException("projection");
        }
        this.comparer = comparer ?? Comparer<TKey>.Default;
        this.projection = projection;
    }

    public int Compare(TSource x, TSource y)
    {
        // Don’t want to project from nullity
        if (x==null && y==null)
        {
            return 0;
        }
        if (x==null)
        {
            return -1;
        }
        if (y==null)
        {
            return 1;
        }
        return comparer.Compare(projection(x), projection(y));
    }
}

That’s functionally complete, but it’s a bit of a pain to create instances of it. Our previous example would require something like this:

var nameComparer = new ProjectionComparer<Person, string>(person => person.Name);

It’s not bad, but we can do better.
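Before improving on it, here is a sketch of the comparer in use with List<T>.Sort. The Person class and the SortDemo wrapper are hypothetical, and ProjectionComparer is repeated in condensed form (with only the two-parameter constructor) purely so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical source type for the example
public class Person
{
    public string Name { get; set; }
}

// Condensed copy of the class shown above
public class ProjectionComparer<TSource, TKey> : IComparer<TSource>
{
    private readonly Func<TSource, TKey> projection;
    private readonly IComparer<TKey> comparer;

    public ProjectionComparer(Func<TSource, TKey> projection, IComparer<TKey> comparer)
    {
        if (projection == null)
        {
            throw new ArgumentNullException("projection");
        }
        this.projection = projection;
        this.comparer = comparer ?? Comparer<TKey>.Default;
    }

    public int Compare(TSource x, TSource y)
    {
        if (x == null && y == null) return 0;
        if (x == null) return -1;
        if (y == null) return 1;
        return comparer.Compare(projection(x), projection(y));
    }
}

public class SortDemo
{
    // Sorts people in place by name, ignoring case, and returns the names in order
    public static List<string> SortByName(List<Person> people)
    {
        people.Sort(new ProjectionComparer<Person, string>(
            person => person.Name, StringComparer.OrdinalIgnoreCase));
        return people.ConvertAll(person => person.Name);
    }

    public static void Main()
    {
        List<Person> people = new List<Person>
        {
            new Person { Name = "tom" },
            new Person { Name = "Holly" },
            new Person { Name = "jon" }
        };
        Console.WriteLine(string.Join(", ", SortByName(people).ToArray())); // Holly, jon, tom
    }
}
```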

Introducing the nongeneric ProjectionComparer type

The next step is almost as simple as imagining how we want to create instances. We don’t have to use a nongeneric type with the same name as the generic type, but it keeps things consistent, and forms a simple pattern to follow at other times. So, let’s imagine being able to write this:

var nameComparer = ProjectionComparer.Create(person => person.Name);

Unfortunately we can’t quite achieve that. There’s no way for the compiler to know the type of the parameter in the lambda expression. However, we have three options we can use:

// Explicitly type the lambda expression’s parameter
var option1 = ProjectionComparer.Create((Person person) => person.Name);

// Pass in a dummy parameter of the right type
var option2 = ProjectionComparer.Create(dummyPerson, person => person.Name);

// Use a class with one generic type parameter, and infer the other
var option3 = ProjectionComparer<Person>.Create(person => person.Name);

Each of these options is just a way of telling the compiler what TSource should be. The first two are implemented in a totally nongeneric class. The third is implemented in a generic class with a type parameter for TSource but letting the compiler infer TKey. Note that we have to make this split because you can’t explicitly specify some type arguments and let the compiler infer the others. The actual code for these methods is very straightforward indeed. I haven’t included overloads where the comparer is explicitly specified, but it’s very simple to do so if required.

public static class ProjectionComparer
{
    // For option 1
    public static ProjectionComparer<TSource, TKey> Create<TSource, TKey>(Func<TSource, TKey> projection)
    {
        return new ProjectionComparer<TSource, TKey>(projection);
    }

    // For option 2
    public static ProjectionComparer<TSource, TKey> Create<TSource, TKey>(TSource ignored, Func<TSource, TKey> projection)
    {
        return new ProjectionComparer<TSource, TKey>(projection);
    }

}

// For option 3
public static class ProjectionComparer<TSource>
{
    public static ProjectionComparer<TSource, TKey> Create<TKey>(Func<TSource, TKey> projection)
    {
        return new ProjectionComparer<TSource, TKey>(projection);
    }
}
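The overloads taking an explicit comparer might look something like the sketch below. This is an assumption about their shape rather than the actual MiscUtil code; the Person class is hypothetical, and ProjectionComparer<TSource, TKey> is again condensed so that the sample is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical source type for the example
public class Person
{
    public string Name { get; set; }
}

// Condensed copy of the two-type-parameter class
public class ProjectionComparer<TSource, TKey> : IComparer<TSource>
{
    private readonly Func<TSource, TKey> projection;
    private readonly IComparer<TKey> comparer;

    public ProjectionComparer(Func<TSource, TKey> projection, IComparer<TKey> comparer)
    {
        if (projection == null)
        {
            throw new ArgumentNullException("projection");
        }
        this.projection = projection;
        this.comparer = comparer ?? Comparer<TKey>.Default;
    }

    public int Compare(TSource x, TSource y)
    {
        if (x == null && y == null) return 0;
        if (x == null) return -1;
        if (y == null) return 1;
        return comparer.Compare(projection(x), projection(y));
    }
}

public static class ProjectionComparer
{
    // Option 1 with an explicit comparer: both type arguments are inferred
    public static ProjectionComparer<TSource, TKey> Create<TSource, TKey>(
        Func<TSource, TKey> projection, IComparer<TKey> comparer)
    {
        return new ProjectionComparer<TSource, TKey>(projection, comparer);
    }
}

public static class ProjectionComparer<TSource>
{
    // Option 3 with an explicit comparer: TSource is specified, TKey is inferred
    public static ProjectionComparer<TSource, TKey> Create<TKey>(
        Func<TSource, TKey> projection, IComparer<TKey> comparer)
    {
        return new ProjectionComparer<TSource, TKey>(projection, comparer);
    }
}

public class OverloadDemo
{
    public static void Main()
    {
        var option1 = ProjectionComparer.Create(
            (Person person) => person.Name, StringComparer.OrdinalIgnoreCase);
        var option3 = ProjectionComparer<Person>.Create(
            person => person.Name, StringComparer.OrdinalIgnoreCase);

        Person lower = new Person { Name = "jon" };
        Person upper = new Person { Name = "JON" };
        Console.WriteLine(option1.Compare(lower, upper)); // 0
        Console.WriteLine(option3.Compare(lower, upper)); // 0
    }
}
```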

Conclusion

There’s nothing particularly difficult in this post, but it’s sometimes easy to forget that the C# compiler can help you out when it comes to filling in type arguments. Of course it only helps when you’re already providing enough information to the compiler with normal method parameters, but it’s still a nice little trick to have up your sleeve when you’re trying to make your APIs that bit more pleasant to use.