Book Review: Effective C# (2nd edition) by Bill Wagner

Resources:

Disclaimer

Just in case you’re unaware, I’m the author of another C# book, C# in Depth. Although Effective C# is somewhat different to my book, they certainly share a target audience. To that extent, Bill and I are competitors. I try hard to stay unbiased in reviews, but it’s probably impossible. Bear this in mind while reading. I should also note that I didn’t buy my copy of Effective C#; it was kindly sent to me by Pearson, for the purpose of reviewing.

Content and target audience

Effective C# is a style guide for C# developers – but not at the low level of "put your braces here, use PascalCase for method names;" instead, it’s at the design level. As far as I can tell, the aim isn’t to be complete, just the most important aspects of style. (Hey, otherwise there wouldn’t be any need for More Effective C#, right?) There are 50 mostly-self-contained items, totalling about 300 pages to digest – which is a nice size of book, in my opinion. It’s not daunting, and the items can definitely be bitten off one at a time.

Looking down the table of contents, the items are divided into six categories: "C# language idioms", ".NET resource management", "Expressing Designs in C#", "Working with the Framework", "Dynamic Programming in C#", and "Miscellaneous". Broadly speaking these contain the sorts of thing you’d expect – although it’s worth pointing out that a significant chunk of "Working with the Framework" is given over to Parallel Extensions, which may not be obvious from the title. (It’s a really good source of information on PFX, by the way.)

This is not a tutorial on C#. If you don’t know C# reasonably well already (including generics, lambda expressions and so on) you should read another book first, and then come back to Effective C# in order to get the most out of it.

Comment from Bill: generics and lambda expressions (and LINQ) are covered in some detail in More Effective C#. It’s a bit strange that as of the 2nd edition, Effective C# covers a newer version of the language than More Effective C#. I tried hard to make sure neither book expects a reader to have read the other, but the organization of both books as a whole does show the hazards of hitting a moving target.

That’s not to say that there’s no explanation of C# – for example, Bill goes into a few details about the "dynamic" type from C# 4, as well as overloading and how optional parameters work. But these are meant to just cover some poorly-understood (or very new) aspects of the language, rather than teaching you from the beginning. The balance here feels just right to me – I believe most professional C# developers will learn details of C# they weren’t aware of before, but won’t be confused by the basics that Bill left out.

Accuracy, opinion and explanation

My copy of Effective C# has plenty of ink annotations now. They broadly fall into five categories:

  • "Ooh, I’d never thought of that" – aspects of C# which were genuinely new to me
  • "Hell, yes!" – things I agree with 100%, and which will help developers a lot
  • "Um, I disagree" – points where Bill and I would go probably different routes, presumably due to different experiences and having worked in different contexts. (It’s possible that when put in the same context, we’d do the same thing, of course.)
  • "No, that’s technically incorrect" – a few areas which are outright wrong, or were correct for previous versions of the framework/CLR, but aren’t correct now
  • "That’s not what that term means" (or "that’s not the right term for the concept you’re trying to get across") – it should come as no surprise to regular readers that I’m a bit of a pedant when it comes to terminology

The majority of my annotations are of the third category – disagreements. That’s not because I disagree with most of the book; it’s just that the second category is reserved for vehement agreement. I haven’t bothered to note every sentence that I’m just fine with.

The good news is that in areas where we disagree, Bill does an admirable job of stating his case. I disagree with some of his arguments – or I can give counter-examples, or merely place different value on some of the pros and cons – but the important thing is that the reasoning is there. If it doesn’t apply to your context, evaluate the advice accordingly.

It’s entirely reasonable for there to be quite a bit of disagreement, as much of the book is opinion. It’s obviously founded in a great deal of experience (and I should note that Bill has spent a lot more time as a professional C# developer than I have), but it’s still opinion. I rather wish that the book was a wiki, so that these items could be debated, amended etc, as per my dream book – I think that would make it even more valuable.

There are relatively few absolutely incorrect statements, and even on the terminology front it’s usually two things which have bugged me repeatedly. Bill uses "implicit properties" for "automatically implemented properties"; I’ve usually heard developers use the abbreviated form "automatic properties" but "implicit" is new to me. Likewise the book talks about "overriding ==" instead of "overloading ==" reasonably frequently. It’s a shame I was too busy with my own book to participate in the technical review for Effective C#, as I suspect that on at least some of these points, Bill would have been happy to amend the text accordingly. I shall, of course, transcribe my comments and send them to him.

Comment from Bill: I’ll make those corrections for the subsequent printings.

What’s missing?

There are some areas which I wish Bill had touched on or emphasized more. Topics such as numbers, text and chronological values could have been given some space as they frequently confuse folks (and are full of pitfalls; see Humanity: Epic Fail for more of my thoughts on this). I would personally have placed more importance on the mantra of "value types should be immutable" – it’s certainly talked about, but in the context of "preferring" atomic, immutable value types – and preferring value types over reference types in rather more situations than I’d personally use. In terms of importance in avoiding shooting yourself in the foot, making sure all structs are immutable comes near the top of the list in my view.

"More Effective C#" doesn’t cover those areas as far as I can tell from the table of contents, but it does go into details about generics and various aspects of C# 3 and LINQ, which are clearly part of any modern C# developer’s toolkit. I certainly intend to get hold of the book to see what else I have to learn from Bill.

I think it might have been nice to have a few sections at an even higher level than the specific items in Effective C#. Topics such as:

  • Don’t trust me: I don’t know your context. Even the smartest folks can only give advice in fairly general terms in books. Don’t apply this advice blindly; weigh up the arguments presented and work out how they apply to your actual code.
  • The importance of testing. It’s possible that this was mentioned, but I don’t recall it. Perhaps it’s a little on the opinionated side (see previous point…) but for significant code bases, testing should be deeply ingrained in whatever process you’re using. Note that although it’s worth trying to keep high standards in test code, it often has a very different look and feel to production code, and different "best practices" may apply.
  • Encouraging "working with the language" – if you find yourself fighting the language, you may discover you can keep winning battles but losing the war. Typically changing your design to represent more idiomatic C# will make life more pleasant for everyone.
  • Performance: how you might decide when and how much to care.

Very few of these would be C#-specific, of course – which may be why Bill left them out. You could easily fill a whole book like that, and it would probably be horrible to read – full of platitudes rather than concrete advice. I personally think there’s room for some discussion of this kind though.

Comment from Bill: The ultimate goal was to have the book be that ‘nice size’ you mention above. I agree that all of those concepts are important. I felt that many of these features were not C# specific (or even .NET specific) that I felt better covered elsewhere.  However, that ‘working with the language’ was one area where I feel that I do cover.  There are only a small number of negative titles (e.g "avoid" something or "do not" do something). In those cases, I tried to recommend alternatives where you would find yourself "working with the language".

Conclusion

I like Effective C# a lot. I like the fact that I’ve disagreed with a number of the points raised, and in disagreeing I’ve found myself thinking about why I disagree and in what situations each point of view may be appropriate. I worry a little about inexperienced readers who may be tempted to treat this (or any other book) as an ultimate truth to be quoted out of context and used to beat other developers into inappropriate solutions… but hopefully most readers won’t be like that.

Comment from Bill: I also hope that most readers avoid that. Thank you for pointing out that I’ve tried very hard to explain when my advice applies, and when it doesn’t.  That is critical.

It’s definitely encouraged me to try to write a somewhat similar book at some point… possibly not with the same organization, and probably dealing with some very different topics – but it’s good to see that it can work. Whether I can pull it off as well as Bill remains to be seen, of course.

I’ll look forward to reading More Effective C# – although my pile of books to review is groaning somewhat, and I should probably go through another of those first :)

Non-iterable collection initializers

Yesterday on Stack Overflow, I mentioned that sometimes I make a type implement IEnumerable just so that I can use collection initializers with it. In such a situation, I use explicit interface implementation (despite not really needing to – I’m not implementing IEnumerable<T>) and leave it throwing a NotImplementedException. (EDIT: As noted in the comments, throwing NotSupportedException would probably be more appropriate. In many cases it would actually be pretty easy to implement this in some sort of meaningful fashion… although I quite like throwing an exception to indicate that it’s not really intended to be treated as a sequence.)

Why would I do such a crazy thing? Because sometimes it’s helpful to be able to construct a "collection" of items easily, even if you only want the class itself to really treat it as a collection. As an example, in a benchmarking system you might want to be able to add a load of tests individually, but you never want to ask the "test collection" what tests are in it… you just want to run the tests. The only iteration is done internally.

Now, there’s an alternative to collection initializers here: parameter arrays. You can add a "params string[]" or whatever as the final constructor parameter, and simply use the constructor. That works fine in many cases, but it falls down in others:

  • If you want to be able to add different types of values, without just using "params object[]". For example, suppose we wanted to restrict our values to int, string, DateTime and Guid… you can’t do that in a compile-time-safe way using a parameter array.
  • If you want to be able to constructor composite values from two or more parts, without having to explicitly construct that composite value each time. Think about the difference in readability between using a collection initializer for a dictionary and explicitly constructing a KeyValuePair<TKey, TValue> for each entry.
  • If you want to be able to use generics to force aspects of type safety. The Add method can be generic, so you could, for example, force two parameters for a single entry to both be of T, but different entries could have different types. This is pretty unusual, but I needed it just the other day :)

Now, it’s a bit of a hack to have to "not quite implement" IEnumerable. I’ve come up with two alternative options. These have the additional benefit of not requiring the method to always be called Add any more. I suspect it still would be in most cases, but flexibility is a bonus.

Option 1: Class level attribute

Instead of just relying on IEnumerable, the compiler could detect an attribute applied to the class, specifying the single method name for all collection initializer methods:

[CollectionInitializerMethod("AddValue")]
public class RestrictedValues
{
    public void AddValue(int x) { … }

    public void AddValue(string y) { … }
}

var values = new RestrictedValues
{
    3, "value", 10
};

Option 2: Method level attributes

In this case, each method eligible for use in a collection initializer would be decorated with the attribute:

public class RestrictedValues
{
    [CollectionInitializerMethod]
    public void AddInt32(int x) { … }

    [CollectionInitializerMethod]
    public void AddString(string y) { … }
}

var values = new RestrictedValues
{
    3, "value", 10
};

This has the disadvantage that the compiler would need to look at every method in the target class when it found a collection initializer.

Obviously both of these can be made backwardly compatible very easily: the presence of an implementation of IEnumerable with no attributes present would just fall back to using Add.

Option 3: Compiler and language simplicity

(I’ve added this in response to Peter’s comment.)

Okay, let’s stick with the Add method. All we need is another way of indicating that you should be able to use collection initializers with a type:

[AllowCollectionInitializers] 
public class RestrictedValues 

    public void Add(int x) { … } 

    public void Add(string y) { … } 
}

At this point, the changes required to the compiler (and language spec) are really minimal. In the bit of code which detects whether or not you can use a collection initializer, you just need to change from "does this type implement IEnumerable" to "does this type implement IEnumerable or have the relevant attribute defined". I can’t think of many possible language changes which would be more localized than that.

And another thing…

One final point. I’d still like the ability for collection initializers to return a value, and for that value to be used for subsequent elements of the initialization – with the final return value being the last-returned value. Any methods with a void return value would be treated as if they returned "this". This would allow you to build immutable collections easily.

Likewise you could decorate methods with a [PropertyInitializerMethod] to allow initialization of immutable types with "WithXyz" methods. Admittedly there’s less need for this now that we have optional parameters and named arguments in C# 4 – a lot of the benefits of object initializers are available just with constructors.

Anyway, just a few slightly odd thoughts around initialization for you to ponder over the weekend…

You are all individuals! (I’m not…)

I’ve been meaning to post this for a while, but recently a couple of events have coincided, reminding me about the issue.

First, Joe Duffy blogged in defence of premature optimization. Second, I started reading Bill Wagner’s Effective C#, 2nd edition, which contains advice such as "make almost all your types serializable". Now, let’s be clear: I have a great deal of respect for both of these gentlemen… but in both cases I think there’s a problem: to some extent they’re assuming a certain type of development.

In some cases, you really, really want to understand the nuts and bolts of every bit of performance. If, for example, you’re writing a parallelization library to be the part of the .NET framework. For Noda Time I’m pretty obsessed with performance, too – I really want it to be very fast indeed. And to be clear, Joe does give a certain amount of balance in the article… but I think it’s probably still biased due to his background on working on libraries where it really, really matters. For many developers, it’s vastly preferable to have the in-house HR web app used by 100 people take a little bit more time to process each request than to take an extra few days of developer work (cumulative) making sure that every little bit of it is as fast as possible. And many of the questions I’ve seen on Stack Overflow are asking for micro-optimizations which are really, really unlikely to matter. (EDIT: Just to be clear, there’s a lot of stuff I agree with in Joe’s post, but I think enough of us find correctness hard enough to start with, without having to consider every possible performance hit of every statement. At least some of the time.)

Likewise for a certain class of development, it probably does make sense to make most types serializable. If most of your objects are modelling data, serialization really will be a major factor. For other people, it won’t be. Most of my working life has been spent writing code which really doesn’t need to serialize anything… or which uses Protocol Buffers for serialization, in order to preserve portability, compactness and flexible versioning. Very few of my types should really be serializable using the platform-default binary serialization (whether in Java or .NET). Relatively few of them need to be serializable at all.

Finally, I’ll mention another example I’ve probably been guilty of: the assumption that a "public API" really can’t be changed without major hassle. An example of this is making a "public const" in C#, and later wanting to change the value of it. "No," I hear you cry… "Make it a public static readonly field instead, to avoid callers baking the value into their compiled code." Absolutely. If you’re in a situation where you may well not know all of your callers, or can’t recompile them all on every deployment, that’s great advice. But I suspect a lot of developers work in environments where they can recompile everything – where the only code which calls their code is written within the same company, and deployed all in one go.

In short, we’re not all writing system libraries. We’re not all writing data-driven business apps. We’re not all writing the same kind of code at all. Good "one size fits all" advice is pretty rare, and "we" (the community preaching best practices etc) should take that into account more often. I absolutely include myself in that chastisement, too.

Reply to a reply… tweaking refactoring

This is a reply to Ben Alabaster’s blog post, which is itself a reply. You can follow the trail yourself. I’ll assume you’ve read the post – I’m not going to go over anything already written there, other than my comments.

I took issue with three aspects of Ben’s refactoring:

  • The use of float for currency
  • The fact that "BaseRate" effectively doesn’t have a well-defined unit; in some cases it’s "dollars per hour" and in others it’s "dollars per pay cheque, regardless of hours" smells
  • The use of properties for the pay functions

I’ll tackle the last one first, because I was stupid. I suggested using public static readonly fields instead of properties. This was dumb. All we need is a simple static class with public static methods – we can use them with exactly the same source code as before for the Employee construction, but without all the fancy lambda expressions:

public static class PayCalculations
{
    public static float BasicWithoutOvertime(float hours, float baseRate)
    {
        return hours * baseRate;
    }

    public static float BasicWithOvertime(float hours, float baseRate)
    {
        if (hours < 40) return hours * baseRate;
        return ((hours – 40f) * 1.5f + 40f) * baseRate;
    }

    public static float Salary(float hours, float baseRate)
    {
        return baseRate;
    }

    public static float Contractor(float hours, float baseRate)
    {
        /* Base rate */
        float subtotal = Math.Min(hours, 40) * baseRate;
        hours -= Math.Min(hours, 40);
        /* Time plus a half */
        if (hours > 0) subtotal += 1.5f * Math.Min(hours, 20) * baseRate;
        hours -= Math.Min(hours, 20);
        /* Double time */
        if (hours > 0) subtotal += 2.0f * Math.Min(hours, 20) * baseRate;
        hours -= Math.Min(hours, 20);
        /* Double time plus a half */
        if (hours > 0) subtotal += 2.5f * hours * baseRate;

        return subtotal;
    }
}

Less code, less nesting, less use of fancy C# 3 features… generally nicer. The construction code remains the same, because it uses method group conversions to build the delegates.

Fixing the "float should be decimal" problem is easy, of course. Let’s move on to the units and "wrong" values. The problem is that the BaseRate property means different things for different employees, and in some cases it’s not even needed at all. That’s a reasonably strong indicator that it’s in the wrong place. Let’s accept that all employees’ pay may depend on the number of hours they’ve worked that week, but that’s all. Everything else depends on the particular contract that the employee is using, and that can vary. So let’s put the variance into what creates the function – so we can build a "salaried employee on $2935 week" function, a "per hour worker on $40.25 basic without overtime" etc. This is effectively like creating an IPayContract interface and multiple implementations, then creating instances of those implementations which have specific values. Except we’re using delegates… so having ripped out the lambda expressions, I’m going to put them back in :) But this time we’re just going to use a Func<decimal, decimal> as we only to know how much to pay given a certain number of hours worked. (The first decimal here could potentially be float or double instead, but if anyone ever did claim to work 3.1 hours, they’d probably want pay that reflected it.)

Here are the pay calculations:

public static class PayCalculations
{
    public static Func<decimal, decimal> BasicWithoutOvertime(decimal dollarsPerHour)
    {
        return hours => dollarsPerHour * hours;
    }

    public static Func<decimal, decimal> BasicWithOvertime(decimal dollarsPerHour)
    {
        // Use an alternative approach just for LOLs
        return hours => {
            decimal basicHours = Math.Min(hours, 40);
            decimal overtimeHours = Math.Max(hours – 40, 0);
            return (basicHours * dollarsPerHour) + (overtimeHours * dollarsPerHour * 1.5m);
        };
    }

    public static Func<decimal, decimal> Salary(decimal dollarsPerWeek)
    {
        // This *looks* like the units are wrong… but it’s okay, see text.
        return hours => dollarsPerWeek;
    }

    public static Func<decimal, decimal> Contractor(decimal baseRate)
    {
        return hours => {
            // 0-40 hours
            decimal basicHours = Math.Min(hours, 40);
            // 40-60 hours
            decimal overtime = Math.Min(Math.Max(hours – 40, 0), 20);
            // 60-80 hours
            decimal doubleTime = Math.Min(Math.Max(hours – 60, 0), 20);
            // 80+ hours
            decimal chargingThroughTheNoseTime = Math.Max(hours – 80, 0);

            return (basicHours * baseRate)
                 + (overtime * baseRate * 1.5m)
                 + (doubleTime * baseRate * 2m)
                 + (chargingThroughTheNoseTime * baseRate * 2.5m);
        };
    }
}

And now, when we construct the employees, we don’t have to specify a base rate which was only meaningful in some cases – instead, we give that value to the pay calculator instead:

List<Employee> employees = new List<Employee>
{
    new Employee("John", "MacIntyre", PayCalculations.BasicWithoutOvertime(40.25m)),
    new Employee("Ben", "Alabaster", PayCalculations.BasicWithOvertime(40.25m)),
    new Employee("Cory", "Fowler", PayCalculations.Salary(2935m)),
    new Employee("John", "Doe", PayCalculations.Contractor(150m)),
    new Employee("Jane", "Doe", hours => 3500m),
    new Employee("Joe", "Bloggs", hours => 34.25m * Math.Max(hours, 15))
};

Now, look at the Salary method and the comment in it… I’m still concerned about the units. We’re meant to be returning a simple dollar value (and in another system I’d probably bake that information into the types used) but we’ve still got dollarsPerWeek. What’s wrong here? Well, it all boils down to an assumption: we’re running this once a week. We’ve got a constant of 1 week… so we could rewrite the method like this:

public static Func<decimal, decimal> Salary(decimal dollarsPerWeek)
{
    decimal weeksWorked = 1m; // We run payroll once per week
    return hours => dollarsPerWeek * weeksWorked;
}

Now the units work… although it looks a bit silly. Of course, it makes our assumption very explicit – and easy to challenge. Maybe we actually run payroll once per month… in which case the fact that we’re expressing the salary in dollars per week is wrong – but very obviously wrong, which is a good thing.

Conclusion

It doesn’t feel right to have a blog post with no conclusion. Lessons applied here:

  • Remember all the tools available, not just the shiny ones. Using method group conversions made the initial "constant function" approach simpler to understand.
  • Units are important! If the same field effectively represents different units for different instances, there’s something wrong
  • If a field is only relevant for some instances of a type, it’s probably in the wrong place
  • Don’t use floats for currency. See any number of Stack Overflow questions for reasons why :)

EDIT: As noted by Barry Dorrans, there’s a lot of scope for introducing constants in here, for further goodness. I’m not going to bother updating the code just for Barry though. That way madness lies.

Query expression syntax: continuations

In this Stack Overflow question, I used a query continuation from a select clause, and one commenter expressed surprise, being unaware of what "select … into" meant. He asked for any references beyond the MSDN "into" page, and I didn’t know of any specific ones. So, here’s a very quick guide.

When "into" is used after either a "group x by y" or "select x" clause, it’s called a query continuation. (Note that "join … into" clauses are not query continuations; they’re very different.) A query continuation effectively says, "I’ve finished one query, and I want to do another one with the results… but all in one expression." This query:

var query = from x in y
            // other query clauses here
            select x.SomeProperty into z
            // other query clauses here (involving z)
            select z.Result;

Has *exactly* the same behaviour as this (leaving aside the visible local variable):

var tmp = from x in y
          // other query clauses here
          select x.SomeProperty;

var query = from z in tmp
            // other query clauses here (involving z)
            select z.Result;

Indeed the specification is written in terms of a transformation a bit like that. Note that the query continuation starts a clean slate in terms of range variables – after the "into z" part, x is not in scope.

Personally I usually split a query up into two statements with a local variable (i.e. the second form) rather than using a "select … into" query continuation, but it’s useful to know about them. I find I use "group … into" much more often, and I suspect others do too – it’s relatively rare to see a "group by" clause ending a LINQ query on its own.

Mini-post: abstractions vs repetition; a driving analogy

Driving Tom back from a children’s party this afternoon, I was thinking about Noda Time.

I’ve been planning to rework the parsing/formatting API, so that each chronological type (ZonedDateTime, LocalDateTime, LocalDate, LocalTime) has its own formatter and parser pair. I suspect this will involve quite a bit of similar code between the various classes… but code which is easy to understand and easy to test in its simple form.

The question which is hard to answer before the first implementation is whether it will be worth trying to abstract out that similar code to avoid repetition. In my experience, quite often something that sounds like a good idea in terms of abstraction ends up becoming significantly more complex than the "just implement each class separately" approach… but we’ll see.

Around this time, I ended up stuck behind a car which was going at 20mph between speed bumps, decreasing to 10-15mph near the speed bumps, of which there were quite a few. I could have tried overtaking, but visibility wasn’t great, and I wasn’t far from home anyway. I regard overtaking as a somewhat extreme measure when I’m not on a dual carriageway. It was frustrating, but I knew I wouldn’t be losing very much. The risks involved in overtaking (combined with the possibility that I’d end up stuck behind someone else) weren’t worth the small benefit of getting past the car in front.

It struck me that the two situations were somewhat similar. I know from experience that trying to extract a general purpose abstraction to avoid repetition can be risky: corner cases where the abstraction just doesn’t work can hide themselves until quite late in the proceedings, or you may find that you end up with nearly as much repetition anyway due to something else. The repetitive code is a burden, but one which is more easily reckoned with.

I suspect I won’t try too hard to abstract out lots of the code for formatting and parsing in Noda Time. I’m sure there’ll be quite a lot which can be easily factored out, but anything which is non-obvious to start with can wait for another time. For the moment, I’ll drive slowly but steadily, without any flashy moves.

Very quick interlude: I know that CAPTCHAs aren’t working on this blog if you’re using Chrome

This is just a quick post to hopefully stem the tide of people commenting (in blog comments and Twitter messages) saying that this blog is broken when it comes to CAPTCHAs and Chrome. I know, but there’s nothing I can do about it – I don’t run the software on the blog. I haven’t looked into why it’s not working… if you open the image in a new tab, it displays that fine. Weird, but such is life… I’m hoping that an update to either Community Server or Chrome will fix it eventually.