Initial thoughts on C# 5’s async support

As many of you will have seen, the major new feature of C# 5 was announced yesterday: support for asynchronous operations with new contextual keywords of "async" and "await". I haven’t had a lot of time to play with them yet, but I have a few initial thoughts. Note that these won’t make any sense to you unless you’ve also watched the talk Anders gave, or read Eric’s blog post, or something similar. Links are at the bottom of this post rather than peppered throughout.

Asynchronous != parallel

Do you remember when LINQ was first announced, and there was a huge fear that it was effectively putting SQL into C#, when in fact the truth was more general? Well, it’s the same here. The language feature allows a method to execute asynchronously, but it won’t create new threads and the like automatically.

I suspect there’s going to be a lot of confusion about which parts of the asynchronous model are provided by the language and which by the framework. Obviously I’ll hope to clarify things where I can, but it’s important to understand that the language isn’t going to automatically start extra threads or anything like that.

In the Channel9 PDC interview, Anders stressed this point pretty hard – which suggests he thinks it’s going to cause confusion too. I became a lot more comfortable with what’s going on after reading the draft spec – which is part of the CTP.

Language and framework interaction

It looks like the language changes have been designed to be pattern-based, a little like LINQ and foreach. The async modifier is present more for the sake of developers than the compiler, and the await contextual keyword simply requires a GetAwaiter() method to be available (which can be an extension method); that method has to return something for which BeginAwait(Action) and EndAwait() are valid – and again, these can be extension methods.

One of the tests I performed yesterday was to try to use this myself, with a custom "awaiter factory" – it worked fine, and it’s definitely an interesting way of exploring what’s going on. Expect a blog post along those lines some time next week.
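
To give a flavour of it, here’s a minimal sketch of the kind of thing I tried. The type and member names are mine, and it’s based on my reading of the CTP draft spec, so treat it as illustrative rather than definitive:

using System;

// An "awaitable" value which has already completed. GetAwaiter() could
// equally well be provided as an extension method.
public sealed class AlreadyCompleted<T>
{
    private readonly T value;

    public AlreadyCompleted(T value) { this.value = value; }

    public AlreadyCompleted<T> GetAwaiter() { return this; }

    // In the CTP pattern, returning false means "the result is already
    // available", so the generated code calls EndAwait immediately
    // instead of waiting for the continuation to be invoked.
    public bool BeginAwait(Action continuation) { return false; }

    public T EndAwait() { return value; }
}

// Usage, inside an async method:
// int result = await new AlreadyCompleted<int>(10);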

So far, so framework-neutral… the only slight fly in the ointment is the reliance on Task and Task<T>. Just as an iterator block with yield return 10; would usually be declared to return IEnumerable<int>, so an async method ending in return 10; would have a return type of Task<int>. I don’t think that’s actually too bad – but I’ll have to dig into exactly what the language relies on before I’m completely comfortable with it. Are there any aspects of tasks which are usually configured by their factories, for example? If so, what does the compiler do?
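
To make the parallel concrete, here’s my own example (using the CTP syntax):

using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    // An iterator block, declared to return IEnumerable<int>...
    static IEnumerable<int> Iterate()
    {
        yield return 10;
    }

    // ...and an async method, declared to return Task<int>, even though
    // the return statement itself just supplies an int.
    static async Task<int> ComputeAsync()
    {
        return 10;
    }
}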

My gut feeling is that the language is still keeping the framework at an appropriate distance as far as it can, but that it’s got to have something like Task<T> as a way of representing an asynchronous operation.

But is it any use?

Again, I’m really going on gut feelings so far. I think it’s going to be very useful – and I wish I had something similar in Java when writing internal services at work. I like the fact that it’s relatively simple – it feels much simpler than dynamic typing, for example – so I think I’ll be able to get my head round it.

It’s important to note that it’s not a free lunch, and doesn’t try to be. It removes much of the error-prone mechanical drudgery of writing asynchronous code, but it doesn’t attempt to magically parallelize everything you do. You still need to think about what makes sense to do asynchronously, and how to introduce parallelism where appropriate. That’s a really good thing, in my view: it’s about managing complexity rather than hiding it.

Conclusion

I think C# 5 is going to be a lot of fun… but I also think you’re going to need to understand the Task Parallel Library to really make the most of it. Keep the two separate in your head, and understand how they complement each other. If ever there was a good time to learn TPL thoroughly, it’s now. Learning the C# 5 features themselves won’t take long. Mastering the combination of the two in a practical way could be a much bigger task.

Links

Obviously there are lots of resources flooding out of Microsoft (and other bloggers) at the moment. Here are a few to start with:

  • The Visual Studio async home – probably the best site to bookmark, as I’m sure it’ll get updated over time. You can download the CTP from here, as well as watch various videos.
  • PDC 2010 player – as this was announced at PDC, there are lots of talks there, including the session where Anders introduces the feature.
  • Eric Lippert’s first blog post on async – with more to come, of course. Follow the Async or C# 5.0 tags.
  • A blog post by Michael Crump with a CTP walkthrough

Overloading and generic constraints

A recent Stack Overflow question piqued my interest. It looked like some inappropriate methods were being considered in overload resolution, when I thought they should have been rejected earlier. Yet again, I thought I’d found a compiler bug, but it turned out to be a case of misreading the spec. I’m arrogant enough to think that if I misread the spec, someone else might too, so I thought I’d dig into it a bit.

It turns out that Eric Lippert has written about the same area of the spec, but in the more complicated scenario where type inference is involved – in other words, where the compiler is working out what generic type arguments should be supplied to a method. While that’s certainly interesting, type inference introduces a whole extra bit of the spec which I prefer to avoid for reasons of health. We can still explore the same key process without involving type inference. I’m effectively revisiting one small part of Eric’s blog post – as much to make sure that I understand it as anything else.

The relevant area of the C# 4 spec is section 7.6.5.1, "Method invocations". Broadly speaking, this defines the process for choosing which method to invoke as:

  1. Find a set of candidate methods
  2. Remove methods which aren’t in the most derived type. This is one of the strange areas of overloading which can easily cause surprise – there’s a short example of it just after this list. I’ve talked about it elsewhere, particularly in my article on overloading.
  3. If we haven’t found anything, resort to extension methods
  4. Pick the best method out of the set
  5. Validate that the best method is actually sane
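
Here’s a quick illustration of the surprise mentioned in step 2 – my own example, not from the original question:

using System;

class Base
{
    public void Foo(int x) { Console.WriteLine("Base.Foo(int)"); }
}

class Derived : Base
{
    public void Foo(double x) { Console.WriteLine("Derived.Foo(double)"); }
}

class Program
{
    static void Main()
    {
        // Prints "Derived.Foo(double)": because Derived declares an
        // applicable method, the int overload in Base isn't even
        // considered, despite being a better match for the argument.
        new Derived().Foo(5);
    }
}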

Let’s look at this in three different contexts.

Example 1: Ambiguity

Now, for the purposes of this post I’m only interested in where generic type parameter constraints are checked, and the effects of that on the overall process. Let’s look at a concrete example:

using System;

class Test
{
    static void Foo<T>(string x) where T : struct
    {}
    
    static void Foo<T>(Exception e) where T : class
    {}
    
    static void Main()
    { 
        Foo<object>(null);
    }
}

Okay, so we’ve got two methods called Foo with different constraints. I’ve given them both parameters of different reference types. Without the parameters, we couldn’t even declare the two methods, as they would have the same signature: the declarations only differ in generic constraints, and constraints aren’t part of the signature. Both parameter types are reference types, so the argument we’re passing (null) is valid for both parameters.

So, what happens when we try to call Foo<object>(null)? Intuitively we might expect it to call the second overload, with the Exception parameter – because that has a constraint on T which is satisfied by the type argument, whereas the first overload has a value type constraint on T, which clearly isn’t satisfied by object. But what happens in the spec?

Well, as per the list above, we first need to find the set of candidate methods. This is where I got confused. Working out the method group is easy – both methods are in it. We’ve supplied type arguments, and both methods are generic, so this bit of the spec applies:

If F is generic and M includes a type argument list, F is a candidate when:

  • F has the same number of method type parameters as were supplied in the type argument list, and
  • Once the type arguments are substituted for the corresponding method type parameters, all constructed types in the parameter list of F satisfy their constraints (§4.4.4), and the parameter list of F is applicable with respect to A (§7.5.3.1).

Note that here, "F" is the method under consideration, and "M" is the invocation expression.

The first bullet is fine: we have a single type argument, and both methods have a single type parameter. So what about the second? It talks about types satisfying constraints, so at first glance you might expect this to rule out the first overload. That’s certainly what I thought initially. But look carefully… it’s talking about the types in the parameter list – checking that they satisfy their constraints. It’s not talking about the type arguments for the method itself, or the constraints there.

Now our two parameters (string and Exception) are both nongeneric, so they can’t even have constraints. Both end up being candidate methods, and because neither is better than the other with respect to our invocation, the call ends up being ambiguous:

Test.cs(13,9): error CS0121: The call is ambiguous between the following methods or properties: ‘Test.Foo<object>(string)’ and ‘Test.Foo<object>(System.Exception)’

The next obvious question is in what situations the constraint checking here does have an effect. Let’s look at a very slightly different example.

Example 2: Weeding out invalid constructed parameter types

This time I’m going to introduce an extra generic type. It’s called Factory, and TItem is constrained to have a parameterless constructor. I’ve called it TItem rather than just T for the sake of clarity later on. We’re not going to give it any behaviour, as we’re just interested in the type declaration. Here’s the complete code for the example:

using System; 

class Factory<TItem> where TItem : new()
{}
    
class Test
{
    static void Foo<T>(Factory<T> factory) where T : struct
    {} 
     
    static void Foo<T>(Exception e) where T : class 
    {} 
     
    static void Main() 
    {  
        Foo<string>(null);
    }
}

Now, the first thing to note is that just to declare the first method with a parameter of Factory<T>, we have to ensure that T will have a parameterless constructor. That’s covered by the constraint of T : struct, fortunately.

Now let’s look at the method invocation. This time we’re trying to use string as the type argument. The second method is obviously a candidate – it’s fine in every way. But what about the first?

After type argument substitution, the parameter list for the first method looks like this:

(Factory<string> factory)

Now the spec says that each type in the parameter list has to be valid with respect to its constraints. In this case, it isn’t: string doesn’t have a parameterless constructor, so the type Factory<string> is invalid.

So, the first method is rejected, the second is fine, so it compiles with no problems. The important thing to note is that the compiler hasn’t taken any notice of the where T : struct part of the first method’s declaration… it’s only rejecting the first method on the basis of the constraint on Factory<T>. We can demonstrate that by changing the method invocation:

Foo<object>(null);

Now object does have a parameterless constructor, so the method is still part of the candidate set, and we’re back to the same ambiguous position of our first example.

Now we haven’t yet managed to trip the compiler up on the final part of our list – validating the constraints of the method after it’s been chosen.

Example 3: Final validation

I could demonstrate this with a single method, so that overload resolution had no choices to make. However, we’ll have a little bit more fun. Here’s our final sample code:

using System; 

class Test
{
    static void Foo<T>(object x) where T : struct
    {} 
     
    static void Foo<T>(string y) where T : class 
    {} 
     
    static void Main() 
    {  
        Foo<int>(null);
    }
}

Gone is Factory<T> – we’re back to simple parameters. The big difference between this example and the first one, however, is that the parameters are object and string instead of string and Exception. That means that although both methods are in the candidate set, the second one is deemed "better" than the first because the conversion from null to string is better (more specific) than the conversion from null to object.

So, the compiler resolves everything about the method invocation without paying any attention to the generic constraints involved until the very last moment – where it notices that the chosen method (the second one) isn’t valid when T is int, and fails with this error:

Test.cs(13,9): error CS0452: The type ‘int’ must be a reference type in order to use it as parameter ‘T’ in the generic type or method ‘Test.Foo<T>(string)’

There’s no mention of ambiguity, because by this point there isn’t any ambiguity. The cruel irony is that if only the compiler had applied the constraints earlier, it would have found that calling the first method is perfectly valid in every way.

Conclusion

I’m not going to try to argue the pros and cons of the way the language has been designed – Eric has some thoughts on that in his blog post, and obviously he has much more insight into the design process than I do. However, next time you find yourself trying to understand why the compiler is behaving in a particular way, hopefully one of these examples may help you.

The curious case of the publicity-seeking interface and the shy abstract class

Noda Time has a guilty secret, and I’m not just talking about the fact that there’s been very little progress on it recently. (It’s not dead as a project – I have high hopes, when I can put some quality time into it.) This secret is called LocalInstant, and it’s a pain in the neck.

One of the nice things about giving talks about an API you’re currently writing is that you can see which concepts make sense to people, and which don’t – as well as seeing which concepts you’re able to explain and which you can’t. LocalInstant has been an awkward type to explain right from day 1, and I don’t think it’s improved much since then. For the purpose of this blog post, you don’t actually need to know what it means, but if you’re really interested, imagine that it’s like a time-zone-less date and time (such as "10:58 on July 2nd 2015") but also missing a calendar system, so you can’t really tell what the month is, etc. The important point is that it’s not just time-zone-less, but it’s actually local – so it doesn’t represent a single instant in time. Unlike every other concept in Noda Time, I haven’t thought of any good analogy between LocalInstant and the real world.

Now, I don’t like having types I can’t describe easily, and I’d love to just get rid of it completely… but it’s actually an incredibly powerful concept to have in the library. Not for users of course, but for the implementation. It’s spattered all over the place. Okay, the next best step to removing it is to hide it away from consumers: let’s make it internal. Unfortunately, that doesn’t work either, because it’s referred to in interfaces all the time too. For example, almost every member of ICalendarSystem has LocalInstant as one of its parameters.

The rules around interfaces

Just to recap, every member of an interface – even an internal interface – is implicitly public. That causes some interesting restrictions. Firstly, every type referred to in a public interface must be public. So this would be invalid:

internal struct LocalInstant {}

// Doesn’t compile: Inconsistent accessibility
public interface ICalendarSystem
{
    LocalInstant GetLocalInstant(int year, int month, int day);
}

So far, so good. It’s entirely reasonable that a public member’s declaration shouldn’t refer to an internal type. Calling code wouldn’t understand what LocalInstant was, so how could it possibly use ICalendarSystem sensibly? But suppose we only wanted to declare the interface internally. That should be okay, right? Indeed, the compiler allows the following code:

internal struct LocalInstant {}

// Compiles with no problems
internal interface ICalendarSystem
{
    LocalInstant GetLocalInstant(int year, int month, int day);
}

But hang on… isn’t GetLocalInstant public? That’s what I said earlier, right? So we’re declaring a public member using an internal type… which we thought wasn’t allowed. Is this a compiler bug?

Well, no. My earlier claim that "a public member’s declaration shouldn’t refer to an internal type" isn’t nearly precise enough. The important aspect isn’t just whether the member is declared public – but its accessibility domain. In this case, the accessibility domain of ICalendarSystem.GetLocalInstant is only the assembly, which is why it’s a valid declaration.

However, life becomes fun when we try to implement ICalendarSystem in a public class. It’s perfectly valid for a public class to implement an internal interface, but we have some problems declaring the method implementing GetLocalInstant. We can’t make it a public method, because at that point its accessibility domain would be anything referring to the assembly, but the accessibility domain of LocalInstant itself would still only be the assembly. We can’t make it internal, because it’s implementing an interface member, which is public.
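
To spell out the two dead ends in code (just a sketch – each attempt fails to compile on its own):

public class GregorianCalendarSystem : ICalendarSystem
{
    // Attempt 1 - doesn't compile: the method would be callable from
    // outside the assembly, but its return type LocalInstant wouldn't
    // be visible there (inconsistent accessibility).
    public LocalInstant GetLocalInstant(int year, int month, int day) { … }

    // Attempt 2 (instead of attempt 1) - doesn't compile either: a method
    // implementing an interface member implicitly has to be public.
    // internal LocalInstant GetLocalInstant(int year, int month, int day) { … }
}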

There is an alternative though: explicit interface implementation. That comes with all kinds of other interesting points, but it does at least compile:

internal struct LocalInstant {}

internal interface ICalendarSystem
{
    LocalInstant GetLocalInstant(int year, int month, int day);
}

public class GregorianCalendarSystem : ICalendarSystem
{
    // Has to be implemented explicitly
    LocalInstant ICalendarSystem.GetLocalInstant(int year, int month, int day)
    {
        // Implementation
    }
}

So, we’ve got somewhere at this point. We’ve managed to make a type used within an interface internal, but at the cost of making the interface itself internal, and requiring explicit interface implementation within any public classes implementing the interface.

That could potentially be useful in Noda Time, but it doesn’t solve our real LocalInstant / ICalendarSystem problem. We need ICalendarSystem to be public, because consumers need to be able to specify a calendar when they create an instance of ZonedDateTime or something similar. Interfaces are just too demanding in terms of publicity.

Fortunately, we have another option up our sleeves…

Abstract classes to the rescue!

I should come clean at this point and say that generally speaking, I’m an interface weenie. Whenever I need a reusable and testable abstraction, I reach for interfaces by default. I have a general bias against concrete inheritance, including abstract classes. I’m probably a little too harsh on them though… particularly as in this case they do everything I need them to.

In Noda Time, I definitely don’t need the ability to implement ICalendarSystem and derive from another concrete class… so making it a purely abstract class will be okay in those terms. Let’s see what happens when we try:

internal struct LocalInstant {} 

public abstract class CalendarSystem
{
    internal abstract LocalInstant GetLocalInstant(int year, int month, int day);
}

internal class GregorianCalendarSystem : CalendarSystem
{  
    internal override LocalInstant GetLocalInstant(int year, int month, int day)
    { 
        // Implementation
    } 
}

Hoorah! Now we’ve hidden away LocalInstant but left CalendarSystem public, just as we wanted to. We could make GregorianCalendarSystem public or not, as we felt like it. If we want to make any of CalendarSystem’s abstract methods public, then we can do so provided they don’t require any internal types. There’s one interesting point though: types outside the assembly can’t derive from CalendarSystem. It’s a little bit as if the class only provided an internal constructor, but with more of an air of mystery… you can override every method you can actually see, and still get a compile-time error message like this:

OutsideCalendar.cs(1,14): error CS0534: ‘OutsideCalendar’ does not implement inherited abstract member
        ‘CalendarSystem.GetLocalInstant(int, int, int)’

I can just imagine the author of the other assembly thinking, "But I can’t even see that method! What is it? Where is it coming from?" Certainly a case where the documentation needs to be clear. Whereas it’s impossible to create an interface which is visible to the outside world but can’t be implemented externally, that’s precisely the situation we’ve reached here.

The abstract class is a little bit like an authentication token given by a single-sign-on system. From the outside, it’s an opaque item: you don’t know what’s in it or how it does its job… all you know is that you need to obtain it, and then you can use it to do other things. On the inside, it’s much richer – full of useful data and members.

Conclusion

Until recently, I hadn’t thought of using abstract classes like this. It would possibly be nice if we could use interfaces in the same way, effectively limiting the implementation to be in the declaring assembly, but letting the interface itself (and some members) be visible externally.

A bigger question is whether this is a good idea in terms of design anyway. If I do make LocalInstant internal, there will be a lot of interfaces which go the same way… or become completely internal. For example, the whole "fields" API of Noda Time could become an implementation detail, with suitable helper methods to fetch things like "how many days are there in the given month." The fields API is an elegant overall design, but it’s quite complicated considering the very limited situations in which most callers will use it.

I suspect I will try to go for this "reduced API" for v1, knowing that we can always make things more public later on… that way we give ourselves a bit more flexibility in terms of not having to get everything right first time within those APIs, too.

Part of me still feels uncomfortable with the level of hiding involved – I know other developers I respect deeply who hide as little as possible, for maximum flexibility – but I do like the idea of an API which is really simple to browse.

Aside from the concrete use case of Noda Time, this has proved an interesting exercise in terms of revisiting accessibility and the rules on what C# allows.

Book Review: Effective C# (2nd edition) by Bill Wagner

Disclaimer

Just in case you’re unaware, I’m the author of another C# book, C# in Depth. Although Effective C# is somewhat different to my book, they certainly share a target audience. To that extent, Bill and I are competitors. I try hard to stay unbiased in reviews, but it’s probably impossible. Bear this in mind while reading. I should also note that I didn’t buy my copy of Effective C#; it was kindly sent to me by Pearson, for the purpose of reviewing.

Content and target audience

Effective C# is a style guide for C# developers – but not at the low level of "put your braces here, use PascalCase for method names;" instead, it’s at the design level. As far as I can tell, the aim isn’t to be complete, just the most important aspects of style. (Hey, otherwise there wouldn’t be any need for More Effective C#, right?) There are 50 mostly-self-contained items, totalling about 300 pages to digest – which is a nice size of book, in my opinion. It’s not daunting, and the items can definitely be bitten off one at a time.

Looking down the table of contents, the items are divided into six categories: "C# language idioms", ".NET resource management", "Expressing Designs in C#", "Working with the Framework", "Dynamic Programming in C#", and "Miscellaneous". Broadly speaking these contain the sorts of thing you’d expect – although it’s worth pointing out that a significant chunk of "Working with the Framework" is given over to Parallel Extensions, which may not be obvious from the title. (It’s a really good source of information on PFX, by the way.)

This is not a tutorial on C#. If you don’t know C# reasonably well already (including generics, lambda expressions and so on) you should read another book first, and then come back to Effective C# in order to get the most out of it.

Comment from Bill: generics and lambda expressions (and LINQ) are covered in some detail in More Effective C#. It’s a bit strange that as of the 2nd edition, Effective C# covers a newer version of the language than More Effective C#. I tried hard to make sure neither book expects a reader to have read the other, but the organization of both books as a whole does show the hazards of hitting a moving target.

That’s not to say that there’s no explanation of C# – for example, Bill goes into a few details about the "dynamic" type from C# 4, as well as overloading and how optional parameters work. But these are meant to just cover some poorly-understood (or very new) aspects of the language, rather than teaching you from the beginning. The balance here feels just right to me – I believe most professional C# developers will learn details of C# they weren’t aware of before, but won’t be confused by the basics that Bill left out.

Accuracy, opinion and explanation

My copy of Effective C# has plenty of ink annotations now. They broadly fall into five categories:

  • "Ooh, I’d never thought of that" – aspects of C# which were genuinely new to me
  • "Hell, yes!" – things I agree with 100%, and which will help developers a lot
  • "Um, I disagree" – points where Bill and I would go probably different routes, presumably due to different experiences and having worked in different contexts. (It’s possible that when put in the same context, we’d do the same thing, of course.)
  • "No, that’s technically incorrect" – a few areas which are outright wrong, or were correct for previous versions of the framework/CLR, but aren’t correct now
  • "That’s not what that term means" (or "that’s not the right term for the concept you’re trying to get across") – it should come as no surprise to regular readers that I’m a bit of a pedant when it comes to terminology

The majority of my annotations are of the third category – disagreements. That’s not because I disagree with most of the book; it’s just that the second category is reserved for vehement agreement. I haven’t bothered to note every sentence that I’m just fine with.

The good news is that in areas where we disagree, Bill does an admirable job of stating his case. I disagree with some of his arguments – or I can give counter-examples, or merely place different value on some of the pros and cons – but the important thing is that the reasoning is there. If it doesn’t apply to your context, evaluate the advice accordingly.

It’s entirely reasonable for there to be quite a bit of disagreement, as much of the book is opinion. It’s obviously founded in a great deal of experience (and I should note that Bill has spent a lot more time as a professional C# developer than I have), but it’s still opinion. I rather wish that the book was a wiki, so that these items could be debated, amended etc, as per my dream book – I think that would make it even more valuable.

There are relatively few absolutely incorrect statements, and even on the terminology front there are really just two things which have bugged me repeatedly. Bill uses "implicit properties" for "automatically implemented properties"; I’ve usually heard developers use the abbreviated form "automatic properties", but "implicit" is new to me. Likewise the book talks about "overriding ==" instead of "overloading ==" reasonably frequently. It’s a shame I was too busy with my own book to participate in the technical review for Effective C#, as I suspect that on at least some of these points, Bill would have been happy to amend the text accordingly. I shall, of course, transcribe my comments and send them to him.

Comment from Bill: I’ll make those corrections for the subsequent printings.

What’s missing?

There are some areas which I wish Bill had touched on or emphasized more. Topics such as numbers, text and chronological values could have been given some space as they frequently confuse folks (and are full of pitfalls; see Humanity: Epic Fail for more of my thoughts on this). I would personally have placed more importance on the mantra of "value types should be immutable" – it’s certainly talked about, but in the context of "preferring" atomic, immutable value types – and preferring value types over reference types in rather more situations than I’d personally use. In terms of importance in avoiding shooting yourself in the foot, making sure all structs are immutable comes near the top of the list in my view.

"More Effective C#" doesn’t cover those areas as far as I can tell from the table of contents, but it does go into details about generics and various aspects of C# 3 and LINQ, which are clearly part of any modern C# developer’s toolkit. I certainly intend to get hold of the book to see what else I have to learn from Bill.

I think it might have been nice to have a few sections at an even higher level than the specific items in Effective C#. Topics such as:

  • Don’t trust me: I don’t know your context. Even the smartest folks can only give advice in fairly general terms in books. Don’t apply this advice blindly; weigh up the arguments presented and work out how they apply to your actual code.
  • The importance of testing. It’s possible that this was mentioned, but I don’t recall it. Perhaps it’s a little on the opinionated side (see previous point…) but for significant code bases, testing should be deeply ingrained in whatever process you’re using. Note that although it’s worth trying to keep high standards in test code, it often has a very different look and feel to production code, and different "best practices" may apply.
  • Encouraging "working with the language" – if you find yourself fighting the language, you may discover you can keep winning battles but losing the war. Typically changing your design to use more idiomatic C# will make life more pleasant for everyone.
  • Performance: how you might decide when and how much to care.

Very few of these would be C#-specific, of course – which may be why Bill left them out. You could easily fill a whole book like that, and it would probably be horrible to read – full of platitudes rather than concrete advice. I personally think there’s room for some discussion of this kind though.

Comment from Bill: The ultimate goal was to have the book be that ‘nice size’ you mention above. I agree that all of those concepts are important, but I felt that many of them were not C# specific (or even .NET specific), and were better covered elsewhere. However, "working with the language" is one area I do feel the book covers: there are only a small number of negative titles (e.g. "avoid" something or "do not" do something), and in those cases I tried to recommend alternatives where you would find yourself working with the language.

Conclusion

I like Effective C# a lot. I like the fact that I’ve disagreed with a number of the points raised, and in disagreeing I’ve found myself thinking about why I disagree and in what situations each point of view may be appropriate. I worry a little about inexperienced readers who may be tempted to treat this (or any other book) as an ultimate truth to be quoted out of context and used to beat other developers into inappropriate solutions… but hopefully most readers won’t be like that.

Comment from Bill: I also hope that most readers avoid that. Thank you for pointing out that I’ve tried very hard to explain when my advice applies, and when it doesn’t.  That is critical.

It’s definitely encouraged me to try to write a somewhat similar book at some point… possibly not with the same organization, and probably dealing with some very different topics – but it’s good to see that it can work. Whether I can pull it off as well as Bill remains to be seen, of course.

I’ll look forward to reading More Effective C# – although my pile of books to review is groaning somewhat, and I should probably go through another of those first :)

Non-iterable collection initializers

Yesterday on Stack Overflow, I mentioned that sometimes I make a type implement IEnumerable just so that I can use collection initializers with it. In such a situation, I use explicit interface implementation (despite not really needing to – I’m not implementing IEnumerable<T>) and leave it throwing a NotImplementedException. (EDIT: As noted in the comments, throwing NotSupportedException would probably be more appropriate. In many cases it would actually be pretty easy to implement this in some sort of meaningful fashion… although I quite like throwing an exception to indicate that it’s not really intended to be treated as a sequence.)

Why would I do such a crazy thing? Because sometimes it’s helpful to be able to construct a "collection" of items easily, even if you only want the class itself to really treat it as a collection. As an example, in a benchmarking system you might want to be able to add a load of tests individually, but you never want to ask the "test collection" what tests are in it… you just want to run the tests. The only iteration is done internally.
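
Here’s a minimal sketch of what I mean – the names are mine, not from any real benchmarking library:

using System;
using System.Collections;
using System.Collections.Generic;

public sealed class BenchmarkSuite : IEnumerable
{
    private readonly List<Action> tests = new List<Action>();

    // This is the method the collection initializer calls.
    public void Add(Action test)
    {
        tests.Add(test);
    }

    // The only iteration happens in here, internally.
    public void RunAll()
    {
        foreach (Action test in tests)
        {
            test();
        }
    }

    // Never meant to be called: this exists purely so that collection
    // initializers are allowed on the type.
    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotSupportedException("Not really a sequence");
    }
}

// Usage, where FirstTest and SecondTest are methods convertible to Action:
// var suite = new BenchmarkSuite { FirstTest, SecondTest };
// suite.RunAll();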

Now, there’s an alternative to collection initializers here: parameter arrays. You can add a "params string[]" or whatever as the final constructor parameter, and simply use the constructor. That works fine in many cases, but it falls down in others:

  • If you want to be able to add different types of values, without just using "params object[]". For example, suppose we wanted to restrict our values to int, string, DateTime and Guid… you can’t do that in a compile-time-safe way using a parameter array.
  • If you want to be able to construct composite values from two or more parts, without having to explicitly construct that composite value each time. Think about the difference in readability between using a collection initializer for a dictionary and explicitly constructing a KeyValuePair<TKey, TValue> for each entry. (There’s an example of this just after the list.)
  • If you want to be able to use generics to force aspects of type safety. The Add method can be generic, so you could, for example, force two parameters for a single entry to both be of T, but different entries could have different types. This is pretty unusual, but I needed it just the other day :)
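
As an illustration of the second bullet, compare these two ways of building up the same entries – all standard C#:

var withInitializer = new Dictionary<string, int>
{
    { "one", 1 },   // calls Add("one", 1)
    { "two", 2 }
};

// Without collection initializer support, each composite value has to
// be constructed explicitly:
var pairs = new List<KeyValuePair<string, int>>();
pairs.Add(new KeyValuePair<string, int>("one", 1));
pairs.Add(new KeyValuePair<string, int>("two", 2));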

Now, it’s a bit of a hack to have to "not quite implement" IEnumerable. I’ve come up with two alternative options. These have the additional benefit of not requiring the method to always be called Add any more. I suspect it still would be in most cases, but flexibility is a bonus.

Option 1: Class level attribute

Instead of just relying on IEnumerable, the compiler could detect an attribute applied to the class, specifying the single method name for all collection initializer methods:

[CollectionInitializerMethod("AddValue")]
public class RestrictedValues
{
    public void AddValue(int x) { … }

    public void AddValue(string y) { … }
}

var values = new RestrictedValues
{
    3, "value", 10
};

Option 2: Method level attributes

In this case, each method eligible for use in a collection initializer would be decorated with the attribute:

public class RestrictedValues
{
    [CollectionInitializerMethod]
    public void AddInt32(int x) { … }

    [CollectionInitializerMethod]
    public void AddString(string y) { … }
}

var values = new RestrictedValues
{
    3, "value", 10
};

This has the disadvantage that the compiler would need to look at every method in the target class when it found a collection initializer.

Obviously both of these can be made backwardly compatible very easily: the presence of an implementation of IEnumerable with no attributes present would just fall back to using Add.

Option 3: Compiler and language simplicity

(I’ve added this in response to Peter’s comment.)

Okay, let’s stick with the Add method. All we need is another way of indicating that you should be able to use collection initializers with a type:

[AllowCollectionInitializers] 
public class RestrictedValues
{
    public void Add(int x) { … } 

    public void Add(string y) { … } 
}

At this point, the changes required to the compiler (and language spec) are really minimal. In the bit of code which detects whether or not you can use a collection initializer, you just need to change from "does this type implement IEnumerable" to "does this type implement IEnumerable or have the relevant attribute defined". I can’t think of many possible language changes which would be more localized than that.

And another thing…

One final point. I’d still like the ability for the methods used by collection initializers to return a value, with that value being used for subsequent elements of the initialization – and the overall result being the last-returned value. Any methods with a void return value would be treated as if they returned "this". This would allow you to build immutable collections easily.
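
Here’s roughly what I have in mind, as a hypothetical translation (ImmutableSet<T> is invented for the sake of the example):

// What you'd write, assuming ImmutableSet<T>.Add returns a new collection:
var set = new ImmutableSet<int> { 1, 2, 3 };

// ...and how the compiler would expand it:
// var tmp = new ImmutableSet<int>();
// tmp = tmp.Add(1);
// tmp = tmp.Add(2);
// tmp = tmp.Add(3);
// var set = tmp;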

Likewise you could decorate methods with a [PropertyInitializerMethod] to allow initialization of immutable types with "WithXyz" methods. Admittedly there’s less need for this now that we have optional parameters and named arguments in C# 4 – a lot of the benefits of object initializers are available just with constructors.

Anyway, just a few slightly odd thoughts around initialization for you to ponder over the weekend…

You are all individuals! (I’m not…)

I’ve been meaning to post this for a while, but recently a couple of events have coincided, reminding me about the issue.

First, Joe Duffy blogged in defence of premature optimization. Second, I started reading Bill Wagner’s Effective C#, 2nd edition, which contains advice such as "make almost all your types serializable". Now, let’s be clear: I have a great deal of respect for both of these gentlemen… but in both cases I think there’s a problem: to some extent they’re assuming a certain type of development.

In some cases, you really, really want to understand the nuts and bolts of every bit of performance. If, for example, you’re writing a parallelization library to be part of the .NET framework. For Noda Time I’m pretty obsessed with performance, too – I really want it to be very fast indeed. And to be clear, Joe does give a certain amount of balance in the article… but I think it’s probably still biased due to his background of working on libraries where it really, really matters. For many developers, it’s vastly preferable to have the in-house HR web app used by 100 people take a little bit more time to process each request than to take an extra few days of developer work (cumulative) making sure that every little bit of it is as fast as possible. And many of the questions I’ve seen on Stack Overflow are asking for micro-optimizations which are really, really unlikely to matter. (EDIT: Just to be clear, there’s a lot of stuff I agree with in Joe’s post, but I think enough of us find correctness hard enough to start with, without having to consider every possible performance hit of every statement. At least some of the time.)

Likewise for a certain class of development, it probably does make sense to make most types serializable. If most of your objects are modelling data, serialization really will be a major factor. For other people, it won’t be. Most of my working life has been spent writing code which really doesn’t need to serialize anything… or which uses Protocol Buffers for serialization, in order to preserve portability, compactness and flexible versioning. Very few of my types should really be serializable using the platform-default binary serialization (whether in Java or .NET). Relatively few of them need to be serializable at all.

Finally, I’ll mention another example I’ve probably been guilty of: the assumption that a "public API" really can’t be changed without major hassle. An example of this is making a "public const" in C#, and later wanting to change the value of it. "No," I hear you cry… "Make it a public static readonly field instead, to avoid callers baking the value into their compiled code." Absolutely. If you’re in a situation where you may well not know all of your callers, or can’t recompile them all on every deployment, that’s great advice. But I suspect a lot of developers work in environments where they can recompile everything – where the only code which calls their code is written within the same company, and deployed all in one go.
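
For anyone who hasn’t run into it, here’s the difference in a nutshell (standard C# behaviour, with invented names):

public static class Config
{
    // The literal 3 is baked into every *calling* assembly at compile
    // time, so changing it means recompiling all the callers.
    public const int MaxRetries = 3;

    // This value is fetched from this assembly at execution time, so
    // redeploying this assembly alone is enough to change it.
    public static readonly int DefaultTimeoutSeconds = 30;
}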

In short, we’re not all writing system libraries. We’re not all writing data-driven business apps. We’re not all writing the same kind of code at all. Good "one size fits all" advice is pretty rare, and "we" (the community preaching best practices etc) should take that into account more often. I absolutely include myself in that chastisement, too.

Reply to a reply… tweaking refactoring

This is a reply to Ben Alabaster’s blog post, which is itself a reply. You can follow the trail yourself. I’ll assume you’ve read the post – I’m not going to go over anything already written there, other than my comments.

I took issue with three aspects of Ben’s refactoring:

  • The use of float for currency
  • The fact that "BaseRate" effectively doesn’t have a well-defined unit, which smells: in some cases it’s "dollars per hour" and in others it’s "dollars per pay cheque, regardless of hours"
  • The use of properties for the pay functions

I’ll tackle the last one first, because I was stupid. I suggested using public static readonly fields instead of properties. This was dumb. All we need is a simple static class with public static methods – we can use them with exactly the same source code as before for the Employee construction, but without all the fancy lambda expressions:

public static class PayCalculations
{
    public static float BasicWithoutOvertime(float hours, float baseRate)
    {
        return hours * baseRate;
    }

    public static float BasicWithOvertime(float hours, float baseRate)
    {
        if (hours < 40) return hours * baseRate;
        return ((hours - 40f) * 1.5f + 40f) * baseRate;
    }

    public static float Salary(float hours, float baseRate)
    {
        return baseRate;
    }

    public static float Contractor(float hours, float baseRate)
    {
        /* Base rate */
        float subtotal = Math.Min(hours, 40) * baseRate;
        hours -= Math.Min(hours, 40);
        /* Time plus a half */
        if (hours > 0) subtotal += 1.5f * Math.Min(hours, 20) * baseRate;
        hours -= Math.Min(hours, 20);
        /* Double time */
        if (hours > 0) subtotal += 2.0f * Math.Min(hours, 20) * baseRate;
        hours -= Math.Min(hours, 20);
        /* Double time plus a half */
        if (hours > 0) subtotal += 2.5f * hours * baseRate;

        return subtotal;
    }
}

Less code, less nesting, less use of fancy C# 3 features… generally nicer. The construction code remains the same, because it uses method group conversions to build the delegates.

Fixing the "float should be decimal" problem is easy, of course. Let’s move on to the units and "wrong" values. The problem is that the BaseRate property means different things for different employees, and in some cases it’s not even needed at all. That’s a reasonably strong indicator that it’s in the wrong place. Let’s accept that all employees’ pay may depend on the number of hours they’ve worked that week, but that’s all. Everything else depends on the particular contract that the employee is using, and that can vary. So let’s put the variance into what creates the function – so we can build a "salaried employee on $2935 week" function, a "per hour worker on $40.25 basic without overtime" etc. This is effectively like creating an IPayContract interface and multiple implementations, then creating instances of those implementations which have specific values. Except we’re using delegates… so having ripped out the lambda expressions, I’m going to put them back in :) But this time we’re just going to use a Func<decimal, decimal> as we only to know how much to pay given a certain number of hours worked. (The first decimal here could potentially be float or double instead, but if anyone ever did claim to work 3.1 hours, they’d probably want pay that reflected it.)

Here are the pay calculations:

public static class PayCalculations
{
    public static Func<decimal, decimal> BasicWithoutOvertime(decimal dollarsPerHour)
    {
        return hours => dollarsPerHour * hours;
    }

    public static Func<decimal, decimal> BasicWithOvertime(decimal dollarsPerHour)
    {
        // Use an alternative approach just for LOLs
        return hours => {
            decimal basicHours = Math.Min(hours, 40);
            decimal overtimeHours = Math.Max(hours - 40, 0);
            return (basicHours * dollarsPerHour) + (overtimeHours * dollarsPerHour * 1.5m);
        };
    }

    public static Func<decimal, decimal> Salary(decimal dollarsPerWeek)
    {
        // This *looks* like the units are wrong… but it’s okay, see text.
        return hours => dollarsPerWeek;
    }

    public static Func<decimal, decimal> Contractor(decimal baseRate)
    {
        return hours => {
            // 0-40 hours
            decimal basicHours = Math.Min(hours, 40);
            // 40-60 hours
            decimal overtime = Math.Min(Math.Max(hours - 40, 0), 20);
            // 60-80 hours
            decimal doubleTime = Math.Min(Math.Max(hours - 60, 0), 20);
            // 80+ hours
            decimal chargingThroughTheNoseTime = Math.Max(hours - 80, 0);

            return (basicHours * baseRate)
                 + (overtime * baseRate * 1.5m)
                 + (doubleTime * baseRate * 2m)
                 + (chargingThroughTheNoseTime * baseRate * 2.5m);
        };
    }
}

And now, when we construct the employees, we don’t have to specify a base rate which was only meaningful in some cases – instead, we give that value to the pay calculator:

List<Employee> employees = new List<Employee>
{
    new Employee("John", "MacIntyre", PayCalculations.BasicWithoutOvertime(40.25m)),
    new Employee("Ben", "Alabaster", PayCalculations.BasicWithOvertime(40.25m)),
    new Employee("Cory", "Fowler", PayCalculations.Salary(2935m)),
    new Employee("John", "Doe", PayCalculations.Contractor(150m)),
    new Employee("Jane", "Doe", hours => 3500m),
    new Employee("Joe", "Bloggs", hours => 34.25m * Math.Max(hours, 15))
};

Now, look at the Salary method and the comment in it… I’m still concerned about the units. We’re meant to be returning a simple dollar value (and in another system I’d probably bake that information into the types used) but we’ve still got dollarsPerWeek. What’s wrong here? Well, it all boils down to an assumption: we’re running this once a week. We’ve got a constant of 1 week… so we could rewrite the method like this:

public static Func<decimal, decimal> Salary(decimal dollarsPerWeek)
{
    decimal weeksWorked = 1m; // We run payroll once per week
    return hours => dollarsPerWeek * weeksWorked;
}

Now the units work… although it looks a bit silly. Of course, it makes our assumption very explicit – and easy to challenge. Maybe we actually run payroll once per month… in which case the fact that we’re expressing the salary in dollars per week is wrong – but very obviously wrong, which is a good thing.

Conclusion

It doesn’t feel right to have a blog post with no conclusion. Lessons applied here:

  • Remember all the tools available, not just the shiny ones. Using method group conversions made the initial "constant function" approach simpler to understand.
  • Units are important! If the same field effectively represents different units for different instances, there’s something wrong
  • If a field is only relevant for some instances of a type, it’s probably in the wrong place
  • Don’t use floats for currency. See any number of Stack Overflow questions for reasons why :)

EDIT: As noted by Barry Dorrans, there’s a lot of scope for introducing constants in here, for further goodness. I’m not going to bother updating the code just for Barry though. That way madness lies.

Query expression syntax: continuations

In this Stack Overflow question, I used a query continuation from a select clause, and one commenter expressed surprise, being unaware of what "select … into" meant. He asked for any references beyond the MSDN "into" page, and I didn’t know of any specific ones. So, here’s a very quick guide.

When "into" is used after either a "group x by y" or "select x" clause, it’s called a query continuation. (Note that "join … into" clauses are not query continuations; they’re very different.) A query continuation effectively says, "I’ve finished one query, and I want to do another one with the results… but all in one expression." This query:

var query = from x in y
            // other query clauses here
            select x.SomeProperty into z
            // other query clauses here (involving z)
            select z.Result;

has *exactly* the same behaviour as this (leaving aside the visible local variable):

var tmp = from x in y
          // other query clauses here
          select x.SomeProperty;

var query = from z in tmp
            // other query clauses here (involving z)
            select z.Result;

Indeed the specification is written in terms of a transformation a bit like that. Note that the query continuation starts a clean slate in terms of range variables – after the "into z" part, x is not in scope.

Personally I usually split a query up into two statements with a local variable (i.e. the second form) rather than using a "select … into" query continuation, but it’s useful to know about them. I find I use "group … into" much more often, and I suspect others do too – it’s relatively rare to see a "group by" clause ending a LINQ query on its own.
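
For example, a typical "group … into" continuation might look like this (my own illustration):

var summary = from word in words
              group word by word.Length into g
              orderby g.Key
              select new { Length = g.Key, Count = g.Count() };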

Mini-post: abstractions vs repetition; a driving analogy

Driving Tom back from a children’s party this afternoon, I was thinking about Noda Time.

I’ve been planning to rework the parsing/formatting API, so that each chronological type (ZonedDateTime, LocalDateTime, LocalDate, LocalTime) has its own formatter and parser pair. I suspect this will involve quite a bit of similar code between the various classes… but code which is easy to understand and easy to test in its simple form.

The question which is hard to answer before the first implementation is whether it will be worth trying to abstract out that similar code to avoid repetition. In my experience, quite often something that sounds like a good idea in terms of abstraction ends up becoming significantly more complex than the "just implement each class separately" approach… but we’ll see.

Around this time, I ended up stuck behind a car which was going at 20mph between speed bumps, decreasing to 10-15mph near the speed bumps, of which there were quite a few. I could have tried overtaking, but visibility wasn’t great, and I wasn’t far from home anyway. I regard overtaking as a somewhat extreme measure when I’m not on a dual carriageway. It was frustrating, but I knew I wouldn’t be losing very much. The risks involved in overtaking (combined with the possibility that I’d end up stuck behind someone else) weren’t worth the small benefit of getting past the car in front.

It struck me that the two situations were somewhat similar. I know from experience that trying to extract a general purpose abstraction to avoid repetition can be risky: corner cases where the abstraction just doesn’t work can hide themselves until quite late in the proceedings, or you may find that you end up with nearly as much repetition anyway due to something else. The repetitive code is a burden, but one which is more easily reckoned with.

I suspect I won’t try too hard to abstract out lots of the code for formatting and parsing in Noda Time. I’m sure there’ll be quite a lot which can be easily factored out, but anything which is non-obvious to start with can wait for another time. For the moment, I’ll drive slowly but steadily, without any flashy moves.

Very quick interlude: I know that CAPTCHAs aren’t working on this blog if you’re using Chrome

This is just a quick post to hopefully stem the tide of people commenting (in blog comments and Twitter messages) saying that this blog is broken when it comes to CAPTCHAs and Chrome. I know, but there’s nothing I can do about it – I don’t run the software on the blog. I haven’t looked into why it’s not working… if you open the image in a new tab, it displays that fine. Weird, but such is life… I’m hoping that an update to either Community Server or Chrome will fix it eventually.