As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn’t care. Relatively few people write production code which is worth micro-optimizing. Please don’t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes.
It took me a surprisingly long time to find the problem described in the previous blog post, and almost no time at all to fix it. I understood why it was happening. This next problem took a while to identify at all, but even when I’d found a workaround I had no idea why it worked. Furthermore, I couldn’t reproduce it in a test case… because I was looking for the wrong set of triggers. I’ve now found at least some of the problem though.
This time the situation in Noda Time is harder to describe, although it concerns the same area of code. In various places I need to create new delegates containing parsing steps and add them to the list of steps required for a full parse. I can always use lambda expressions, but in many cases I’ve got the same logic repeatedly… so I decided to pull it out into a method. Bang – suddenly the code runs far slower. (In reality, I’d performed this refactoring first, and "unrefactored" it to speed things up.)
I think the problem comes down to method group conversions with generic methods and a type argument which is a reference type. The CLR isn’t very good at them, and the C# compiler uses them more than it needs to.
Show me the benchmark!
The complete benchmark code is available of course, but fundamentally I’m doing the same thing in each test case: creating a delegate of type Action which does nothing, and then checking that the delegate reference is non-null (just to avoid the JIT optimizing it away). In each case this is done in a generic method with a single type parameter. I call each method in two ways: once with int as the type argument, and once with string as the type argument. Here are the different cases involved:
- Use a lambda expression: Action foo = () => {};
- Fake what I expected the compiler to do: keep a separate generic cache class with a static variable for the delegate; populate the cache once if necessary, and thereafter use the cache field
- Fake what the compiler is actually doing with the lambda expression: write a separate generic method and perform a method group conversion to it
- Do what the compiler could do: write a separate non-generic method and perform a method group conversion to it
- Use a method group conversion to a static (non-generic) method on a generic type
- Use a method group conversion to an instance (non-generic) method on a generic type, via a generic cache class with a single field in referring to an instance of the generic class
(Yes, the last one is a bit convoluted – but the line in the method itself is simple: Action foo = ClassHolder<T>.SampleInstance.NoOpInstance;
Remember, we’re doing each of these in a generic method, and calling that generic method using a type argument of either int or string. (I’ve run a few tests, and the exact type isn’t important – all that matters is that int is a value type, and string is a reference type.)
Importantly, we’re not capturing any variables, and the type parameter is not involved in either the delegate type or any part of the implementation body.
Benchmark results
Again, times are in milliseconds – but this time I didn’t want to run it for 100 million iterations, as the "slow" versions would have taken far too long. I’ve run this on the x64 JIT as well and seen the same effect, but I haven’t included the figures here.
Times in milliseconds for 10 million iterations
Test | TestCase<int> | TestCase<string> |
Lambda expression | 180 | 29684 |
Generic cache class | 90 | 288 |
Generic method group conversion | 184 | 30017 |
Non-generic method group conversion | 178 | 189 |
Static method on generic type | 180 | 29276 |
Instance method on generic type | 202 | 299 |
Yes, it’s about 150 times slower to create a delegate from a generic method with a reference type as the type argument than with a value type… and yet this is the first I’ve heard of this. (I wouldn’t be surprised if there were a post from the CLR team about it somewhere, but I don’t think it’s common knowledge by any means.)
Conclusion
One of the tricky things is that it’s hard to know exactly what the C# compiler is going to do with any given lambda expression. In fact, the method which was causing me grief earlier on isn’t generic, but it’s in a generic type and captures some variables which use the type parameters – so perhaps that’s causing a generic method group conversion somewhere along the way.
Noda Time is a relatively extreme case, but if you’re using delegates in any performance-critical spots, you should really be aware of this issue. I’m going to ping Microsoft (first informally, and then via a Connect report if that would be deemed useful) to see if there’s an awareness of this internally as potential "gotcha", and whether there’s anything that can be done. Normal trade-offs of work required vs benefit apply, of course. It’s possible that this really is an edge case… but with lambdas flying everywhere these days, I’m not sure that it is.
Maybe tomorrow I’ll actually be able to finish getting Noda Time moved onto the new system… all of this performance work has been a fun if surprising distraction from the main job of shipping working code…