Book review: C# 3.0 in a Nutshell

March 31, 2008 jonskeet 4 Comments

Resources:

Book’s web site (includes various tools such as LINQPad)
Amazon
Errata

Introduction

The original C# in a Nutshell was the book I cut my C# teeth on, so to speak. Basically I read it (well, the bits which weren’t just reproductions of MSDN – gone in this edition, thankfully), played around in Visual Studio, and then started to answer questions on the C# newsgroup. (That’s a great way of learning useful things, by the way – find another person’s problem which sounds like it’s one you might face in the future, then research the answer.)

Five and a half years later (Google Groups suggests I cut my teeth in the C# newsgroup in August 2002) I’ve just been reading C# 3.0 in a Nutshell, by Joe and Ben Albahari (who are brothers, in case anyone’s wondering). Unsurprisingly, there’s rather a lot more in this version :) I bought the book with a fairly open mind, and as you’ll see, I was quite impressed…

For the purposes of this review, I’ll use “Nutshell” to mean “C# 3.0 in a Nutshell”. It’ll just make life a bit easier.

Scope

Nutshell covers:

C# 1.0 to 3.0 (i.e. it’s “from scratch”)
Core framework (serialization, assemblies, IO, strings, regex, reflection, threading etc)
LINQ to XML
A bit of LINQ to SQL while discussing LINQ in general

It explicitly doesn’t try to be a “one stop shop for every .NET technology”. You won’t find much about WinForms, WPF, WCF, ASP.NET etc – to which my reaction is “hoorah!”. I’ve probably said it before on this blog, but if you’re going to use any of those technologies in any depth, you really need to study that topic in isolation. One chapter in a bigger book just isn’t going to cut it.

That’s the scope of the book. The scope of my review is slightly more limited. I’ve read the C# stuff reasonably closely, and dipped into some of the framework aspects – particularly those I’m fairly knowledgeable about. The idea was to judge the accuracy and depth of coverage, which would be hard to do for topics which I’m relatively inexperienced in.

Format and style

Nutshell is a reference book which goes to some lengths to be readable in a “cover to cover” style. It’s worth mentioning the contrast here with C# in Depth, which is a “cover to cover” book which attempts to be useful as a reference too. I’d expect the index of Nutshell to be used much more than the index of C# in Depth, for example – but both books can be used in either way.

As an example of the difference in style, each section of Nutshell stands on its own: there’s little in the way of segues from one section to the next. That’s not to say that there are no segues, or indeed that there’s no flow: the order in which topics are introduced is pretty logical and sometimes there’s an explicit “having looked at X we’ll now look at Y” – but it feels to me like there’s less attempt to keep the reader going. That’s absolutely right for a reference book, and it doesn’t prevent the book from being read from cover to cover – it just doesn’t particularly encourage it.

There are lots of little callout notes, both in terms of “look here for more depth” and “be careful – here be dragons”. These are very welcome, and call attention to a lot of important points.

The layout is perfectly pleasant, in a “normal book” kind of way – it’s neither the alternating text/code/text/code style of the King/Eckel book, nor the “pictures everywhere” Head First format. In that sense it’s reasonably close to C# in Depth, although it uses comments instead of arrows for annotations. The physical format is slightly shorter and narrower than most technical books. This gives a different feeling which is hard to characterize somehow, but definitely present.

Accuracy and Depth

The main problem I had with Head First C# was the inaccuracies (which, I have to stress, are hopefully going to be fixed to a large extent in a future printing). While there are inaccuracies in Nutshell, they are generally fewer and further between, and less important. In short, I’m certainly not worried about developers learning bad habits and incorrect truths from Nutshell. Again, I’ve sent my list of potential corrections to the authors, who have been very receptive. (It’s also worth being skeptical about some of the errata which have been submitted – I’ve taken issue with several of them.)

The level of depth is very well chosen, given the scope of the book. As examples, the threading section goes into the memory model and the string section talks a bit about surrogates. It would be nice to see a little bit more discussion about internationalisation (with reference to the Turkey test, for example) as well as more details of the differences between decimal and float/double – but these are all a matter of personal preference. By way of recommendation, I’d say that if every professional developer working in .NET knew and applied the contents of Nutshell, we’d be in a far better state as a development community and industry.
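As a small illustration of the kind of decimal versus float/double difference I have in mind, here is a minimal sketch. The class and method names are hypothetical and are not taken from Nutshell; the point is only that double is a binary floating-point type while decimal is a decimal one.

```csharp
using System;

static class DecimalVersusDouble
{
    // 0.1 and 0.2 have no exact binary representation, so the double
    // sum is not exactly equal to the double closest to 0.3.
    public static bool DoubleSumIsExact()
    {
        double sum = 0.1 + 0.2;
        return sum == 0.3;
    }

    // decimal stores base-10 digits, so the same sum is exact.
    public static bool DecimalSumIsExact()
    {
        decimal sum = 0.1m + 0.2m;
        return sum == 0.3m;
    }

    static void Main()
    {
        Console.WriteLine(DoubleSumIsExact());   // False
        Console.WriteLine(DecimalSumIsExact());  // True
    }
}
```

Neither type is "more accurate" in general; decimal simply represents the numbers people tend to write down exactly, at the cost of range and speed.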

The coverage of C# is very good in terms of what it does – again, appropriate for a reference work. I’d like to think that C# in Depth goes into more detail of how and why the language is designed that way, because that’s a large part of the book’s raison d’être. It would be a pretty sad indictment of C# in Depth if Nutshell were a complete superset of its material, after all.

Competitive Analysis

So, why would you buy one book and not the other? Or should you buy both? Well…

Nutshell covers C# 1 as well as 2 and 3. The 3.0 features are clearly labelled, and there’s an overview page of what’s new for C# 3.0 – but if you know C# 1 and just want to learn what’s in 2 and 3, C# in Depth will take you on that path more smoothly. On the other hand, if you want to look up aspects of C# 1 for reference, Nutshell is exactly what you need. I wouldn’t really recommend either of them to learn C# from scratch – if you know Java to start with, then Nutshell might be okay, but frankly getting a “basics” tutorial style book is a better starting point.
Nutshell covers the .NET framework as well as the language. C# in Depth looks at LINQ to Objects, rushes through LINQ to SQL/XML/DataSet, and has a bit of a look at generic collections – it’s not in the same league on this front, basically.
Nutshell aims to be a reference book, C# in Depth aims to be a teaching book. Both work to a pretty reasonable extent at doing the reverse.

To restate this in terms of people:

If you’re an existing C# 1 developer (or C# 2 wanting more depth) who wants to learn C# 2 and 3 in great detail without wading through a lot of stuff you already know, get C# in Depth.
If you want a C# and .NET reference book, get Nutshell.
If you want to learn C# from scratch, buy a “tutorial” book about C# before getting either Nutshell or C# in Depth.

Clearly Nutshell and C# in Depth are in competition: there will be plenty of people who only want to buy one of them, and which one will be more appropriate for them will depend on the individual’s needs. However, I believe there are actually more developers who would benefit greatly from having both books. I’m certainly pleased to have Nutshell on my desk (and indeed it answered a colleague’s question just this morning) – and I hope the Albahari brothers will likewise gain something from reading C# in Depth.

Conclusion

C# 3.0 in a Nutshell is really good, and will benefit many developers. It doesn’t make me feel I’ve in any way wasted my time in writing C# in Depth, and the two make good companion books, even though the material is clearly overlapping. Obviously I’d like all my readers to buy C# in Depth in preference if you can only buy one – but it really does make sense to have both.

Range variables – is it just me that thinks they’re different?

March 27, 2008 jonskeet 7 Comments

Update: Ian Griffiths has a more detailed blog post about this matter.

Range variables are the variables used in query expressions. For instance, consider the expression:

from name in names
let length = name.Length
select new { name, length }

Here, name and length are range variables. In C# in Depth, I claim that range variables aren’t like any other type of variable. Eric commented that they’re not unlike iteration variables in foreach – and since then, I’ve seen both Jamie King/Bruce Eckel’s book, and now C# 3.0 in a Nutshell (which is very good, by the way), using “iteration variable” as the name for range variables. (Update as to the reason for this: they were called iteration variables until a reasonably recent version of the spec, apparently.)

Eric made the point that there are interesting restrictions on the use of iteration variables, just like there are restrictions on the use of range variables. However, that’s not enough of a similarity from my point of view. I think of them very, very differently.

When I declare an iteration variable in a foreach loop, the variable really “exists” in that method. It’s present in the IL for it as a local variable (or maybe a captured one). When I declare a range variable, it’s never a variable in that method – it’s just a parameter name in a lambda expression. It’s the shadow/promise/ghost of a variable – a variable yet to live, one which will only have a value when I start executing the query.

I suppose in that respect it’s closest to the parameter name given in a lambda expression or anonymous method – except that it’s valid for multiple parts of a query expression, carrying the value from place to place, via different “normal” parameters.
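To make the "parameter name in a lambda expression" point concrete, here is a rough sketch of the query above next to approximately what the compiler turns it into. The class and method names are hypothetical, and the parameter name t stands in for the compiler's unnamed transparent identifier, so treat the second method as an approximation rather than the exact generated code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RangeVariableDemo
{
    // Query expression form: name and length are range variables.
    public static List<string> WithQueryExpression(IEnumerable<string> names)
    {
        var query = from name in names
                    let length = name.Length
                    select new { name, length };
        return query.Select(x => x.name + ":" + x.length).ToList();
    }

    // Approximate translation: name is now just a lambda parameter, and
    // length is carried along as a property of an anonymous type.
    public static List<string> WithMethodCalls(IEnumerable<string> names)
    {
        var query = names
            .Select(name => new { name, length = name.Length })
            .Select(t => new { t.name, t.length });
        return query.Select(x => x.name + ":" + x.length).ToList();
    }

    static void Main()
    {
        string[] names = { "Jon", "Holly" };
        Console.WriteLine(string.Join(", ", WithQueryExpression(names).ToArray())); // Jon:3, Holly:5
        Console.WriteLine(string.Join(", ", WithMethodCalls(names).ToArray()));     // Jon:3, Holly:5
    }
}
```

Neither method declares a local variable called name or length; each one exists only as a lambda parameter or an anonymous type property.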

So, am I crazy for looking at range variables as a different type of beast, or is the rest of the world “wrong” on this one? :)

PostSharp and iterator blocks – a beautiful combination

March 27, 2008 jonskeet 7 Comments

I’ve blogged before about the issue of validating parameters in methods implemented with iterator blocks. Tonight, I’ve found a nice way of doing it (in common situations, anyway).

This morning I looked into PostSharp for the first time. PostSharp is an assembly post-processor, giving build-time AOP. I haven’t had any experience with AOP on .NET, but execution time AOP has always felt slightly risky to me. I wouldn’t like to say exactly why, and it may well be mere superstition – but I’m much happier with a tool modifying my code statically after a build than something messing around with it later.

So far, I’m very impressed with PostSharp. It’s a doddle to use and understand, and it has some very neat features. I thought I’d give it an interesting test, seeing what it would make of iterator blocks. I’d already seen its OnMethodBoundaryAspect, which is invoked when a method is called and when it exits. PostSharp works on the compiled code, so the method it sees with iterator blocks is the actual GetEnumerable (or whatever) method, rather than the state machine stuff the C# compiler generates to perform the actual iteration. In other words, the point at which PostSharp gets involved is precisely the point at which we want to validate parameters.

So, initially I had to write an aspect to validate parameter values. In order to make life easier for the developer, I’ve allowed the required parameter to be specified by name. I then find out the parameter position at weave time (post main build, pre execution) – the position as an integer is deserialized a single time when the real code is executed, so I don’t need to look it up from the list of parameters. This also means I can validate that the given parameter name is actually correct, and spot the error at weave time.

Here’s the code for the aspect:

using System;
using System.Reflection;
using PostSharp.Laos;

namespace IteratorBlocks
{
    [Serializable]
    class NullArgumentAspect : OnMethodBoundaryAspect
    {
        string name;
        int position;

        public NullArgumentAspect(string name)
        {
            this.name = name;
        }

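        // Runs at weave time (post-build): resolve the parameter name to its
        // position once, so no lookup by name is needed at execution time.
        public override void CompileTimeInitialize(MethodBase method)
        {
            base.CompileTimeInitialize(method);
            ParameterInfo[] parameters = method.GetParameters();
            for (int index = 0; index < parameters.Length; index++)
            {
                if (parameters[index].Name == name)
                {
                    position = index;
                    return;
                }
            }
            // Fail the weave if the attribute names a parameter which doesn't exist.
            throw new ArgumentException("No parameter with name " + name);
        }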

        public override void OnEntry(MethodExecutionEventArgs eventArgs)
        {
            if (eventArgs.GetArguments()[position] == null)
            {
                throw new ArgumentNullException(name);
            }
        }
    }
}

Here’s the “production” code I’ve used to test it:

using System;
using System.Collections.Generic;

namespace IteratorBlocks
{
    class Iterators
    {
        public static IEnumerable<char> GetNormalEnumerable(string text)
        {
            if (text == null)
            {
                throw new ArgumentNullException("text");
            }
            foreach (char c in text)
            {
                yield return c;
            }
        }

        [NullArgumentAspect("text")]
        public static IEnumerable<char> GetAspectEnhancedEnumerable(string text)
        {
            foreach (char c in text)
            {
                yield return c;
            }
        }
    }
}

And here are the actual test cases:

using System;
using NUnit.Framework;

namespace IteratorBlocks
{
    [TestFixture]
    public class Tests
    {
        [Test]
        [ExpectedException(typeof(ArgumentNullException))]
        public void NormalEnumerableWithNullText()
        {
            Iterators.GetNormalEnumerable(null);
        }

        [Test]
        [ExpectedException(typeof(ArgumentNullException))]
        public void AspectEnhancedEnumerableWithNullText()
        {
            Iterators.GetAspectEnhancedEnumerable(null);
        }
    }
}

The NormalEnumerableWithNullText test fails, as I knew it would – but the AspectEnhancedEnumerableWithNullText test passes, showing that the argument validation is eager, despite the actual iteration being lazy. Hooray! Not only that, but I like the declarative nature of the validation check – it leaves the actual implementation of the iterator nice and clean.
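For comparison, the usual hand-written way of getting eager validation without any AOP is to split the method in two: a normal method which validates, and a private iterator block which does the work. This is a sketch of that alternative technique rather than anything PostSharp does; all the names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class EagerValidationDemo
{
    // Iterator block: none of this code runs until MoveNext() is first
    // called, so the null check is deferred along with the iteration.
    public static IEnumerable<char> GetLazilyValidated(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        foreach (char c in text)
        {
            yield return c;
        }
    }

    // Normal method: validates immediately, then hands over to the iterator.
    public static IEnumerable<char> GetEagerlyValidated(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        return GetEagerlyValidatedImpl(text);
    }

    private static IEnumerable<char> GetEagerlyValidatedImpl(string text)
    {
        foreach (char c in text)
        {
            yield return c;
        }
    }

    // Helper: does simply calling the method with null throw?
    public static bool ThrowsOnCall(Func<string, IEnumerable<char>> method)
    {
        try
        {
            method(null);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(ThrowsOnCall(GetLazilyValidated));  // False
        Console.WriteLine(ThrowsOnCall(GetEagerlyValidated)); // True
    }
}
```

It works, but it doubles the number of methods, which is exactly the clutter the declarative aspect avoids.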

The thing which probably impresses me most about PostSharp is how simple it is to use. This code literally worked first time – despite me being a complete novice with PostSharp, having watched the 5 minute video and that being about the limit of my investigation so far. It certainly won’t be the end of my investigation though.

So, why did none of you smart people who have no doubt known about PostSharp for ages actually tell me about it, hmm? I’m always the last to find out…

Book review: Head First C#

March 21, 2008 jonskeet 10 Comments

Important "versioning" note

This review tackles the first printing, from November 2007. Since then, the book has undergone more printings, with errata being fixed in each printing. I believe most or possibly all of the errors listed below are now fixed – although I don’t yet know whether there are more lurking. I have a recent (late 2009) printing which I intend to review when I have the time – which means it’s likely to be in 2010Q2 at the earliest, and more likely later. I don’t know how much editing has been done in terms of best/bad practice. I have left the review as I originally wrote it, as it is a fair (to my mind) representation of that first printing. (This is something to bear in mind with all reviews, mind you – check when they are written, and ideally which version they’re written about.) I’m reluctant to go as far as recommending the latest printing without actually reading it, but I’m fairly confident it’s a lot better than the first printing.

You should also bear in mind that many C# books – including Head First C#, the Essential C# book referenced in the review and my own book – are currently being revised for C# 4. If the HFC# 4 book comes out before I’ve had time to review the latest printing of the previous edition, I will probably go straight for that instead.

Introduction

This is a tough review to write. We already know I’m biased due to being in some way in competition with Andrew Stellman and Jennifer Greene (the authors), but I’m also not a huge fan of the Head First series in general. It doesn’t coincide with how I like to learn. I feel patronised by the pictures and crosswords rather than drawn into them, etc. However, I’m very aware that it’s really popular – lots of people swear by it, and I know that my own tastes are far from those of the majority. There’s also the fact that Head First C# really isn’t aimed at me at all – it’s aimed at people who don’t know C# to start with.

Funnily enough, that’s why I really, really wanted to like the book. It’s not actually competing with C# in Depth except for readers who have no idea what either book is like. I can’t imagine many people being in the situation where both books would be appropriate for them. I do want to find a good book for someone to learn C# from though, from scratch. My own book is completely unsuitable for that purpose. Essential C# by Mark Michaelis is my current favourite, I think – although I confess to not having read it all. It also doesn’t cover C# 3 – although I’m not sure whether I’d actually want to cover C# 3 for a reader who doesn’t know 1 or 2.

Anyway, I had hoped that Head First C# (HFC# from now on) would become a good recommendation – and part of this is because Andrew and Jennifer have certainly been very friendly on the blog and email. I don’t like being nasty, and I know how I’d feel reading a review like this of my book. Unfortunately, it falls a long way short of being worthy of recommendation, and I can’t really find a way of hiding that.

I "read" the whole book today. Now, it’s over 700 pages and I’ve been at work, so clearly I haven’t read every word of every page. I have mostly skimmed the code examples – vital for learning, I’ll gladly admit, but not hugely necessary for getting the gist of the book. I skimmed over the graphics section particularly quickly, being more interested in the language and core library elements than UI topics. With that in mind, let’s look at what I did and didn’t like:

The Good

This is going to be a broadly negative review, but it’s not like the book is without merit. Here’s what I liked:

Many of the examples were very nicely chosen. I liked the party organizer (chapter 5) and the room topology (chapter 7) ones in particular.
You really do get to create some nifty WinForms applications. I wouldn’t like to claim you really understand everything about them by the end of it, but I can see how they’re engaging.
Annotations in listings are a good thing in general, and I do like the handwritten feel of them.
Q&A is a good idea for a book like this. Predicting natural questions is an important part of
After a slightly scary chapter 1 (building an address book using a database without actually knowing any C#) it does start from the real beginning, to some extent.
The pictures in this book were actually not annoying, to my surprise. Some were even endearing. Even to me.
Apparently other people really like this book. It’s got loads of 5-star reviews on Amazon, and apparently it’s been selling fantastically well. From the posts in the book’s forum, people are really getting engaged, which can only be a good thing.

The Bad…

Formatting

I know the Head First series is all about its wacky style, etc – but I still find it can take a while to work out how you’re meant to navigate through the page. Often bits of the page do require a certain order, but that order isn’t always obvious.

My main problem with the formatting wasn’t nearly as general as that though, and I have to admit it’s a bit obsessive-compulsive. It’s the quotes in the code. They’re not straight. I’ve never seen an IDE which tries to work out "curly quotes" in code, and I hope I never do. When you’re used to seeing code in a real IDE, seeing curly quotes just looks wrong. It’s jarring and distracts from the business of learning.

Oh, and K&R bracing sucks. I know it makes the book shorter, but it makes it harder to read too. Just a personal opinion, of course :) (Having said which, the bracing style is inconsistent through the book anyway.)

"Top-down" learning

I’m unashamedly a "bottom-up" person when it comes to learning. I like to get hold of several small building blocks, understand them to a significant degree, and then see what happens when I put them together. This book prefers the "show it all and explain some bits as we go along" approach, reasonably frequently mentioning how awful it is that most technical books try to start with console applications which don’t need lots of libraries before you even know what a library is.

I suspect this top-down way does indeed get people going much more quickly than my preferred style. (Remember, this is my preferred reading style, not just writing style. It’s no coincidence that I happen to write like I read though.) However, I believe there’s a great danger of people ending up as cargo-cult programmers. (I know I refer to that blog entry a lot. It happens to rock, and be relevant to much of why I write how/what I write.)

It’s true that top-down learning gets you doing flashy stuff more quickly than bottom-up. Whether that’s more fun or not depends on what appeals to you – I get a great sense of joy from thinking about a difficult topic and gradually understanding it, even if I have nothing to show for it beyond console apps which do little but sort numbers etc.

It feels like there ought to be a middle way, but I’ve no idea what it is yet. The "show something cool in chapter 1 and then go back over the details" approach favoured by the vast majority of technical books (including mine, to some extent) isn’t what I’m thinking of. Definitely something to think about there.

Incompleteness

I hadn’t expected this book to be in great depth, despite the quote on the back cover: "If you want to learn C# in depth and have fun doing it, this is THE book for you." I’d expected a bit more than this, however.

When I said earlier on that HFC# wasn’t competing with C# in Depth, I mentioned audiences. There’s something to be said about material as well though. Let’s look at the bulk of my book – chapters 3-11, which deal with the new language features from C# 2 and 3. Here’s how much is covered in HFC#:

Generics: generic collections get mentioned, but there’s no real explanation of how you’d write your own generic types. Generic methods aren’t mentioned. Type constraints, default(…), variance (or lack thereof), type inference – all absent.
Nullable types: not mentioned as far as I can see, including an absence of the null-coalescing operator.
Delegates: there’s a chapter on delegates which deals entirely with events and callbacks. No mention of their use in LINQ. No use of method group conversions, anonymous methods or support for variance. (In fact, there are some false claims around that, saying that the signature of an event handler has to match the delegate type exactly.)
Iterator blocks: IEnumerable<T> is mentioned, but IEnumerator<T> is never shown as far as I remember, and iterator blocks certainly aren’t shown. It mentions that you could implement IEnumerable<T> yourself, but doesn’t give any hints as to how, or what would be required.
Partial types: covered, albeit only mentioning classes. No sign of partial methods.
Static classes: covered to some extent, but without explaining that they prevent you from attempting to use the class inappropriately, or other details like the absence of constructors (not even a default one).
Separate getter/setter access: covered
Namespace aliases: not covered
Pragma directives: not covered
Fixed size buffers: not covered
InternalsVisibleTo: not covered (not really a language feature though)
Automatically implemented properties: covered, but without mentioning (AFAICR) that a hidden backing field is generated for you
Implicit typing: one call-out and a somewhat inaccurate Q&A
Object/collection initializers: covered, but without noting that you can remove the brackets from parameterless constructor calls
Implicitly typed arrays: not covered
Anonymous types: mentioned in a single annotation, but inaccurately. Used in a few query expressions.
Lambda expressions: not mentioned (in a book which "covers" C# 3.0. Wow.)
Expression trees: not mentioned
Changes to type inference/overloading: hard to explain a change to something which isn’t covered to start with
Extension methods: covered
Query expressions: covered to a very limited extent. No explanation of query translation. No explanation of query continuations (although grouping is always shown with continuations). No sign of multiple "from" clauses or "join into".

See why I don’t think our books are really competing? Now, a lot of this really is absolutely fine. The details of generics don’t really belong in an introductory text – although some more information would have been welcome. Pragmas and fixed size buffers would have been completely out of place. Would it be too much to ask for some discussion of anonymous methods and lambda expressions though, particularly as lambda expressions really do make LINQ possible?
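To give a flavour of how little code is needed to show a few of the features listed above, here is a minimal sketch covering a constrained generic method, the null-coalescing operator, an iterator block and lambda expressions. The class and method names are hypothetical and are not from either book.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LanguageFeatureSampler
{
    // Generic method with a type constraint (C# 2).
    public static T Max<T>(T first, T second) where T : IComparable<T>
    {
        return first.CompareTo(second) >= 0 ? first : second;
    }

    // Nullable value type with the null-coalescing operator (C# 2).
    public static int ValueOrDefault(int? value, int fallback)
    {
        return value ?? fallback;
    }

    // Iterator block (C# 2): yields 0, 1, ... count - 1 lazily.
    public static IEnumerable<int> Range(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Lambda expressions (C# 3) passed to LINQ to Objects methods.
    public static List<int> EvenSquares(IEnumerable<int> source)
    {
        return source.Where(x => x % 2 == 0)
                     .Select(x => x * x)
                     .ToList();
    }

    static void Main()
    {
        Console.WriteLine(Max(3, 7));              // 7
        Console.WriteLine(ValueOrDefault(null, 5)); // 5
        foreach (int square in EvenSquares(Range(5)))
        {
            Console.WriteLine(square);             // 0, then 4, then 16
        }
    }
}
```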

Maybe I’m being too harsh, looking at just C# 2 and 3 features (which is pretty much all my book does). Here are some things which would have counted as being missing had the book been published in 2002:

typeof(…)
Casting ("as" doesn’t count as a cast in my view – it’s an operator)
lock
volatile
Explicit interface implementation
The conditional operator (x ? y : z)
Hiding members with "new" instead of "override"
readonly
The sealed modifier on methods (not just classes)
ref/out/params modifiers for parameters
continue, goto, and break (break is mentioned for switch/case, but only there)

Maybe going into the memory model would have been a bit much (without which volatile would be pretty pointless) but not to even mention threading (and lock) feels a little worrying – especially as Application.DoEvents is abused instead. Explicit interface implementation is relatively obscure – but readonly fields? typeof(…)?
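For readers who haven't met some of these C# 1 features, here is a short sketch touching several of them: readonly, explicit interface implementation, the conditional operator, out and params parameters, continue and typeof. Every name in it is hypothetical and chosen purely for illustration.

```csharp
using System;

interface IGreeter
{
    string Greet();
}

sealed class Greeter : IGreeter
{
    // readonly: can only be assigned in an initializer or a constructor.
    private readonly string name;

    public Greeter(string name)
    {
        this.name = name;
    }

    // Explicit interface implementation: only callable via an IGreeter reference.
    string IGreeter.Greet()
    {
        // Conditional operator.
        return name.Length == 0 ? "Hello" : "Hello, " + name;
    }
}

static class CSharp1Sampler
{
    // out parameter: must be assigned before the method returns.
    public static bool TryHalve(int value, out int half)
    {
        half = value / 2;
        return value % 2 == 0;
    }

    // params: callers can pass any number of arguments.
    public static int SumOfNonNegatives(params int[] values)
    {
        int total = 0;
        foreach (int value in values)
        {
            if (value < 0)
            {
                continue; // Skip to the next iteration.
            }
            total += value;
        }
        return total;
    }

    static void Main()
    {
        IGreeter greeter = new Greeter("Jon");
        Console.WriteLine(greeter.Greet());                 // Hello, Jon
        Console.WriteLine(SumOfNonNegatives(1, -2, 3));     // 4
        int half;
        Console.WriteLine(TryHalve(10, out half) + " " + half); // True 5
        Console.WriteLine(typeof(Greeter).Name);            // Greeter
    }
}
```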

Fine, it’s an introductory text – but that means it’s a bad idea to talk about "mastering" LINQ and claiming that "by the time you’re through you’ll be a proficient C# programmer, designing and coding large-scale applications". Those quotes probably aren’t the authors’ fault, to be honest – and marketing has a way of exaggerating things, as we all know. But no-one should be in any doubt that this is far from a complete guide to C#.

Errors (please see update at the end of the post, and note at top)

This is by far my biggest gripe with the book. It’s probably coloured the whole review – things which I could have forgiven otherwise have been judged more harshly than they would have been if the entire book had been accurate. Accuracy is what I demand from a book above all else, partly because it’s not obvious to the reader when it’s absent. No-one could reasonably read the book without realising that they’re getting a top-down approach, or what the formatting is like, or that there are going to be crosswords etc. However, a reader has little to benchmark accuracy against unless the book is internally consistent.

Now, I’m unfortunate enough to have the first printing (November 2007). The book is currently undergoing its third printing, i.e. it’s had two rounds of corrections. These are listed in the errata and I’ve tried to take them into account – although there’s no way that a carefully reviewed and edited book should need that many corrections in such a short space of time. I will concede that a Head First book is likely to be much harder to get right than a "normal" book though, due to all the funky formatting.

Typos don’t worry me too much. It’s core technical errors which really bother me. It’s almost as if someone had taken my list of classic "myths" and decided to taunt me. I half expected "objects are passed by reference" to be in there – but as ref/out parameters aren’t covered at all, it’s not. Here are some of the worst/most amusing culprits though:

Claiming that string is a value type (in several places). Oh, and object, once.
Claiming that the range of sbyte is -127 to 128 (instead of -128 to 127). Same kind of mistake with short.
Constantly using field and property as if they were interchangeable. They’re not, they’re really, really not. Just because they’re used in a similar way doesn’t mean you can be this loose with the terminology.
Claiming that C# "marks objects for garbage collection". In fact, for the first 6/7ths of the book there’s a strong implication that garbage collection is done in a deterministic way; that objects are immediately collected when the last reference is lost. We do eventually find out that it’s non-deterministic (although that explanation is also flawed) but by then it may well be too late for the reader. More on this in a minute.
Claiming that methods and statements always have to live in classes. Funny how structs can have behaviour too…
Claiming that "objects are variables" (in a heading, no less). I know from experience that trying to accurately describe the interplay between objects, variables, and their values is tricky – but even so…
Writing a hex dump utility using StreamReader – broken by definition, given that hex dump tools are used to show the binary contents of files, and StreamReader is meant to read text, decoding it as it goes.
Claiming that structs always live on the stack.
Expanding WPF as "Windows Presentation Framework"

These are all errors which have made it through not just technical review, but two rounds of post-publication editing. It’s possible that some of the errors on my list of about 60 (ignoring typos for the most part) are in the errata and I missed them (I did try to check them all) – but really, I shouldn’t have been able to find that many in the first place, even if they have been corrected. I’m worried if a C# book author believes that a char has 256 possible values, for instance. I know that the author is wrong, and I know to check the errata (this one has indeed been fixed), but I suspect many readers of the first printing will never look at the errata.
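Several of these claims are easy to check with a few lines of code. The sketch below (hypothetical class and method names) asks the framework directly about string, object, sbyte, short and char.

```csharp
using System;

static class TypeFacts
{
    // string and object are reference types, not value types.
    public static bool StringIsValueType()
    {
        return typeof(string).IsValueType;
    }

    public static bool ObjectIsValueType()
    {
        return typeof(object).IsValueType;
    }

    // char is a 16-bit UTF-16 code unit, so it has 65536 possible values.
    public static int CharValueCount()
    {
        return char.MaxValue - char.MinValue + 1;
    }

    static void Main()
    {
        Console.WriteLine(StringIsValueType());                   // False
        Console.WriteLine(ObjectIsValueType());                   // False
        Console.WriteLine(sbyte.MinValue + " to " + sbyte.MaxValue); // -128 to 127
        Console.WriteLine(short.MinValue + " to " + short.MaxValue); // -32768 to 32767
        Console.WriteLine(CharValueCount());                      // 65536
    }
}
```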

Now, there are errors and there are bad practices…

Bad practice through example

I know we don’t end up writing production code as book examples. Indeed, Eric mentioned this in chapter 1 of C# in Depth, where I’d left a few things out which I would normally consider as best practice: making a type sealed, making fields readonly etc. I left extra modifiers out for simplicity. I can understand that. I can also understand using public fields until properties have been explained but:

There’s no reason to use poorly named variables/parameters, including Pascal-cased local variables
Writing loops like for (int i=1; i <= 10; i++) instead of the more idiomatic for (int i=0; i < 10; i++). C# is 0-based in many ways, but often the authors seemed to really wish it were 1-based.
Continuing to use public fields even after explaining how they’re not really a good idea. (Well, sort of explaining that. There’s a frequent implication that they’re not so bad if other classes really need to be able to access your data. It’s a very long way from my preferred policy of no non-private variables whatsoever.)
String concatenation in loops with nary a mention of StringBuilder
Bizarre combination of "is" and "as", using "as" to perform the conversion instead of casting. If you’re going to use "as", do it up front and compare with null to start with…
Advising leaving out braces for single line for/if statements. The code which is left in these examples is unreadable, IMO. You have to really concentrate to see what’s in and what’s out.
Advising to stick with the absolutely abhorrent "convention" (aka laziness, and leaving things as VS creates them) of naming event handlers with things like button1_Click. No. Name methods with what they do, then the event hookup code will make it obvious – and it makes it clearer where you can reuse a single method for multiple events.
Repeatedly declaring enums within classes as if you couldn’t write them as top-level types. Oh, and messing up the naming conventions there, too.
Showing a mutable struct without explaining that this should always be avoided is a bad idea.
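To make a few of the points above concrete, here is a minimal sketch of the alternatives I have in mind: a zero-based loop, a StringBuilder instead of repeated concatenation, and a single "as" followed by a null check. The class and method names (Idioms, JoinNumbers, LengthIfString) are invented for illustration and are not taken from the book.

```csharp
using System;
using System.Text;

public static class Idioms
{
    // Zero-based loop that appends to a StringBuilder instead of
    // concatenating strings on every iteration.
    public static string JoinNumbers(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                builder.Append(", ");
            }
            builder.Append(i);
        }
        return builder.ToString();
    }

    // "as" performed once, up front, followed by a null check,
    // rather than "is" followed by a second conversion.
    public static int LengthIfString(object value)
    {
        string text = value as string;
        if (text != null)
        {
            return text.Length;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(JoinNumbers(3));          // 0, 1, 2
        Console.WriteLine(LengthIfString("hello")); // 5
        Console.WriteLine(LengthIfString(42));      // -1
    }
}
```

None of this is longer or harder to explain than the versions it replaces, which is really the point.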

I could go on. I’ve got pages of notes about this kind of thing (and this is only after owning the book for less than a day, don’t forget) but I think you get the message. Note: some of these things are definitely a matter of opinion, such as bracing style. Some other things are so widely regarded as a bad idea that I can’t see much defence for them.

Examples matter. I don’t expect to see production code, and I understand that sometimes for teaching purposes best practices will take second place – but where it wouldn’t hurt to use best practice, please do!

Likewise telling the truth from the start matters. It’s very hard to correct bad habits and incorrect impressions. If you state that "When we set lucky (a variable) to null, it’s no longer pointing at its object, so it gets garbage collected" then people will not only be potentially confused about whether it’s the variable or the object which gets garbage collected, but they’ll get the impression that it’s garbage collected immediately. Waiting 476 pages to correct that impression is a bad idea.

Public fields are another example. I’ve mentioned that I can see their usefulness before we’ve encountered properties – but even so, surely it would have been worth explaining immediately that we’re going to hide them as soon as possible.

Conclusion

You may have gathered by now that I’m not a fan of the book ;) I suspect a lot of this review has come off as a rant, which is a pity. That tends to happen when I get on my technical high horse, which I guess I’ve done here. The fact is, I’ve spent a lot of today feeling deeply saddened. I wouldn’t be surprised to find that this is the best-selling C# book of 2008 – which means we’ll get a lot more people on the newsgroup with some very odd ideas about how C# works. That’s the trouble – I’ve seen what happens when people are fed the "structs live on the stack" myth. I’ve seen how easily people can believe that strings are value types, and that it doesn’t really matter if you use a StreamReader for binary data and then cast chars to bytes. It causes trouble.
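For anyone wondering why the StreamReader habit matters, here is a small sketch under a couple of assumptions: the reader is created with its default encoding (UTF-8), and the names BinaryViaText, ReadThroughStreamReader and RoundTrips are made up for the demonstration. Plain ASCII bytes survive the trip through text, while bytes that are not valid UTF-8 do not come back as they went in.

```csharp
using System;
using System.IO;
using System.Linq;

public static class BinaryViaText
{
    // Reads binary data as if it were text, then casts each char to a byte.
    // This is the pattern to avoid; it is shown only to expose the damage.
    public static byte[] ReadThroughStreamReader(byte[] data)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(data)))
        {
            string text = reader.ReadToEnd();
            byte[] result = new byte[text.Length];
            for (int i = 0; i < text.Length; i++)
            {
                result[i] = unchecked((byte) text[i]);
            }
            return result;
        }
    }

    // True only if the bytes come back exactly as they went in.
    public static bool RoundTrips(byte[] data)
    {
        return data.SequenceEqual(ReadThroughStreamReader(data));
    }

    public static void Main()
    {
        byte[] ascii = { 0x48, 0x69 };
        byte[] binary = { 0x48, 0x69, 0x80, 0xFF };
        Console.WriteLine(RoundTrips(ascii));  // True
        Console.WriteLine(RoundTrips(binary)); // False
    }
}
```

Reading the stream directly with Stream.Read (or a BinaryReader) avoids the decoding step altogether.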

I’m reasonably sure the world could do with a good introductory book on C#. HFC# has convinced me that such a book could be fun and have pretty pictures. But HFC# isn’t quite it. (And no, I don’t plan on writing it either.)

Feedback

I gave an advance copy of this review to Andrew, who has replied remarkably politely and pleasantly (and quickly). He’s a true gent, giving a really thorough reply when I suspect most authors (perhaps including myself) would either have given short shrift to a review like this, or possibly ignored it completely and hoped it would go away.

He believes I was looking too much for a complete reference rather than an introductory text. I would say I wasn’t looking for completeness, but a better judge of what should be in a C# book (I’d have preferred ref/out to be covered, but would be happy to lose the section on GDI+ double buffering, for instance). I specifically don’t want an actual reference book if I’m going to recommend it to people to learn from – but I may be biased towards a reference style as that’s what I personally tend to learn from.

It was really the errors that affected this review more than anything though, and while a very few of them could be debated (whether an implicit reference conversion to a base class counts as upcasting, for instance – I’d only include explicit upcasting) many others are undeniable and really shouldn’t have made it through the review process. It’s possible that I’ll come back to HFC# in a year’s time and be more impressed by it, but I suspect the aversion to errors will overcome any mellowing towards the style :(

Update (22nd March 2008)

Andrew continues to amaze me in terms of taking this review in his stride. He’s now looking through the errors I found, and many should be fixed in the next printing. Bear in mind that without a lot of the errors, I would have had a more positive view from the start.

In short, I’m still not quite convinced I’d recommend the book, but my opinion has certainly mellowed (anticipating the error fixing, of course).

Book reviews, Books, C#

Book review/preview: “C# Query Expressions And Supporting Features in C# 3.0” (Eckel/King)

March 16, 2008 jonskeet 4 Comments

Introduction

Let me make one thing very clear before anything else: this is a preview. Bruce Eckel has made the preview of what appears to be part of a bigger book available free from his website. The book is by Bruce Eckel and Jamie King, and the preview available (1.0 at the time of writing) covers the following topics:

Extension methods
Implicitly typed local variables
Automatic properties
Implicitly-typed arrays
Object initializers
Collection initializers
Anonymous types
Lambda expressions
Query expression translation

For obvious reasons, this had me slightly worried when I first looked at it – it’s clearly a reasonably direct competitor to C# in Depth. There’s not a lot of C# 3 which isn’t covered here. I can only think of these things off-hand:

Expression trees
Object initializers setting properties of embedded objects
New type inference rules
New overload resolution rules

I was surprised to see expression trees not get even a mention. I’m sure they’ll be covered elsewhere in the full book, but I personally think it’s worth introducing them at around the same time as lambda expressions. (It would be odd for me to have any other view, given the location in C# in Depth!) I don’t know whether the new rules for type inference and overload resolution will be covered elsewhere. If they’re going to be covered but the authors haven’t done the writing yet, we should all feel sympathy for them. That section (9.4) was the hardest one in the whole book for me. It may be possible to describe all of the rules in a way which doesn’t make both reader and writer want to tear their hair out, but I have yet to see it.

I don’t know exactly what the bigger book will cover, or how it will be published, or whether it will be available in preview form, etc. From here on in, when I say “the book” I mean “the preview bit”.

Format

After the TOC and introductory material, the book basically consists of 4 things:

Headings (occasional)
Explanatory text
Code
Exercises

There are no diagrams or tables as far as I can see. The main body of the book (P7-137) sets exercises quite frequently (there are 54 of them), and the answers form P138-233, including brief explanations.

The code is always complete, including using directives, a Main method (for non-library classes) and output. A lot of the time the code essentially forms unit tests to demonstrate the features (a technique we used in Groovy in Action, by the way). The authors have their own build system which not only runs the tests, but also allows comments to express pieces of code which shouldn’t compile. The output is also checked, i.e. that running the code produces the output in the book.

There are pros and cons to this approach. For those who haven’t read any of my book (what are you waiting for? The first chapter is free!) I personally use a tool I wrote called “Snippy” which allows me to include short but complete examples without using directives and Main declarations appearing everywhere. My comments on this book’s approach:

I’d be surprised to see any misleading/broken examples. (There’s one piece of code which doesn’t go quite as far as I believe it should, but that’s a different matter.)
It encourages unit testing.
It leads to longer code with repetition (Main etc).
The build system leads to non-standard comments, like //c! to indicate a non-compiling line and //e! to indicate where an exception should be thrown.
There’s a little too much text dealing with the build system – it’s distracting

The “long code” issue has been dealt with by squeezing a lot of code into short spaces – K&R bracing and very little whitespace. Personally I find this really quite tricky to read, to the extent that I ended up skipping a lot of the code examples. I’ve tried to keep all my code examples pretty short, and none of them are over a page. (That was an unstated goal at the start of the project, in fact.)

Now, how much code do you like to see in a book? That’s very much a personal decision. I happen to like quite a lot of explanatory text – so that’s how I write, too. In this book I reckon (and it’s only a complete guess) about 50% of the book is code, 35% is prose and 15% is exercise. This would quite possibly pose a challenge for me as a reader if I didn’t already know the topic. However, for other readers it’s probably spot on.

What this book doesn’t have (fortunately) is lots of examples which go on for pages and pages, producing a complete application with little explanation. I’ve seen that too often – and a lot of the code simply doesn’t teach me anything. I’ve never particularly liked the “build a complete application” approach to books, partly because it doesn’t actually mean that all the bases are covered (you don’t see every issue in every app) and it does mean there’s a lot of turn-the-handle code which isn’t relevant to the topic being taught. It can be a useful technique in some situations, but I like it when irrelevant code is omitted (and is just available for download).

The other personal question is whether or not you like exercises. I certainly believe in trying out new things as you read about them, but exercises don’t really fill that need in an ideal way for me. I like to try to apply a new technique to an existing bit of code, or an existing database for instance – and obviously the author has no way of knowing that. Now, that only says something about me, not about the value of exercises. This book has been used for teaching in a university, and I suspect the exercises have been appropriate in that setting. Note for future consideration (and reader feedback): should I include exercises in any future books I might write? Should I create some for the C# in Depth web site?

Style

(Some of this might reasonably count as format as well – it’s a blurry line.)

I have a consciously “light” style. I write in the first person and try to include opinion and the occasional joke or at least lighthearted comment. (Footnote 1 in chapter 3 is my favourite, for reference.)

Eckel and King’s book is more like a textbook. The authors haven’t allowed their personalities to come through in the text at all – and it’s clearly a deliberate decision. Good or bad? Hard to say – it depends on the context. The word “textbook” is the key here, for me – I can’t remember textbooks having any personality when I was a student, so if they’re going for that market it’s spot on. In the “professional developer” market it may have a harder time. Again, personally I’m a fan of a bit of personality peeping through the text – although it has to be firmly controlled, and it’s better to err on the side of caution. I’ve read some books which seem to be all about the author’s personality, without letting the subject matter have a look-in.

I do think more headings (of varying sizes, if you see what I mean) and the occasional diagram would be helpful, though. It’s a pretty unrelenting code-text-code-text-exercise-code-text-code-text mix. The code is all just “there” with no headings, nothing to visually break things up. (It’s not actually run-on with the text – it’s clear where text stops and code starts, and there’s even a helpful vertical line down the side of the code – it’s just that there’s nothing to make you take a mental breath.) This could be due to it being a preview – it’s possible that more formatting will occur later on. If that wasn’t the plan, I’d encourage the authors to at least consider it. (Wow, see how easy it is to slip into arrogance? Must make a memo to give Joel Spolsky some notes on writing later ;)

Content

The content is pretty full-on, and very language-focused. As an example, I suspect few books on C# 3 will go into any detail about transparent identifiers in query expressions. In my book I explain them for one particular clause (“let”) and then just mention when the compiler will introduce one for other clauses. Eckel and King’s book gives full examples of translation for all the clauses available, as far as I can tell.

That’s just an example – and possibly an extreme one – but this book does go into a reasonable amount of depth when it comes to the facts. (There were also two items I wasn’t aware of: the option of explicitly stating that an ordering is ascending, and the ability to create “extension delegates”. They’re not huge omissions in my text (and at least I’ve now got notes for them), but the fact that I missed them and these guys didn’t is (to me) an indication of their thoroughness.)
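As a rough illustration of what a transparent identifier amounts to, here is a sketch of a query using “let” and an explicit “ascending”, next to a hand-written approximation of the method calls it is translated into. The names (LetTranslation, WithQueryExpression, WithMethodCalls, pair) are my own for the example; in the real translation the compiler’s transparent identifier has no name you can write, and the anonymous type stands in for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetTranslation
{
    // Query expression form: "let" introduces a second range variable.
    public static List<string> WithQueryExpression(IEnumerable<string> words)
    {
        var query = from word in words
                    let length = word.Length
                    where length > 3
                    orderby length ascending
                    select word + ":" + length;
        return query.ToList();
    }

    // Roughly what the compiler produces: both range variables are
    // carried along together in an anonymous type.
    public static List<string> WithMethodCalls(IEnumerable<string> words)
    {
        var query = words
            .Select(word => new { word, length = word.Length })
            .Where(pair => pair.length > 3)
            .OrderBy(pair => pair.length)
            .Select(pair => pair.word + ":" + pair.length);
        return query.ToList();
    }

    public static void Main()
    {
        string[] words = { "tuple", "as", "query", "let", "lambda" };
        Console.WriteLine(string.Join(", ", WithQueryExpression(words).ToArray()));
        Console.WriteLine(string.Join(", ", WithMethodCalls(words).ToArray()));
        // Both lines print: tuple:5, query:5, lambda:6
    }
}
```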

Now, having dealt with the plain facts, there’s not a lot of opinion in the book – pieces of text which encourage the reader to think about why C# has changed the way it has, or the best way to take advantage of those changes. Again, this is a valid approach for a textbook – especially one used in conjunction with a course where the lecturer can talk about these things – but I suspect the non-academic market likes guidance.

The accuracy level seemed pretty high to me. Not perfect, but then I don’t expect mine is either, even with Eric’s thorough eye. In everyone’s interests, I’ve mailed the authors my specific comments and nitpicks – as the book is still at a preview stage, corrections can be made relatively easily, I expect.

Conclusion

Obviously I can only comment on the book as I’ve seen it so far – I’ve no idea whether the other chapters will be more framework-focused. However, it’s good to see another book that tries to “go deep” like mine does. While this clearly makes it competition in many ways, I think we’re aiming at different audiences. If I’m right in my assumption that this is trying to be a textbook, there may be little overlap in potential market. (I suspect the same will be true of Head First C#, which is likely to be my next review – but for the opposite reason. I suspect I’ll find that HFC# is more aimed at beginners – something that certainly couldn’t be said of this book or mine.)

Overall this is a very solid text, in many senses. It’s not the easiest book to follow due to its style, but it’s detailed and accurate. Given a choice between the latter and the former, I’d always choose the latter for anything I’d want to refer back to – and this book certainly counts as a good reference for query expressions. Obviously I’m hoping people find my style appealing and that I’m detailed and accurate, but I can’t give that judgement.

As a final word – if you haven’t downloaded it yet, why not? It’s a totally free download of only just over a meg. I don’t think I even had to register anywhere to get it. Reading other work is useful for me as a writer, but there’s no need for you to trust my judgement, nor indeed would it be wise to do so. If you missed it before, I’ll even save you scrolling up for the download link.

I’d be interested to hear whether your opinions coincide with mine. If you’ve read my book and can compare and contrast, so much the better. I’ve let the authors know that this review is coming, so I suspect they’ll be checking here for feedback. (They’d be foolish not to, and I have no reason to believe they’re fools.)

C#, C# 4

C# 4: Immutable type initialization

March 15, 2008 jonskeet 17 Comments

(I’m giving up with the numbering now, unless anyone particularly wants me to keep it up. What was originally going to be a limited series appears to be growing without end…)

As Chris Nahr pointed out in my previous post, my earlier idea about staged initialization was very half-baked. As he’s prompted me to think further about it, I’ve come up with another idea. It’s slightly more baked, although there are lots of different possibilities and open questions.

Let’s take a step back, and look at my motivation: I like immutable types. They’re handy when it comes to thread safety, and they make it a lot easier to reason about the world when you know that nothing can change a certain value after it’s been created. Now, the issues are:

We really want to be able to fully construct the object in the constructor. That means we can mark all fields as initonly in the generated IL, potentially giving the CLR more scope for optimisation.
When setting more than two or three values (while allowing some to be optional) constructor overloading ends up being a pain.
Object initializers in C# 3 only apply to properties and fields, not method/constructor arguments – so we can’t get the clarity of naming.
Ideally we want to support validation (or possibly other code) and automatic properties.
The CLR won’t allow initonly fields being set anywhere other than in the constructor – so even if we made sure we didn’t call any setters other than in the constructor, we still couldn’t use them to set the fields.
We want to allow simple construction of immutable types from code other than C#. In particular, I care about being able to use projects like Spring.NET and Castle/Windsor (potentially after changes to those projects) to easily create instances of immutable types without resorting to looking up the order of constructor parameters.

The core of the proposal is to be able to mark properties as initonly, and get the compiler to create an extra type which is thoroughly mutable, and contains those properties – as well as a constructor which accepts an instance of the extra type and uses it to populate the immutable instance of the main type before returning.

Extra syntax could then be used to call this constructor – or indeed, given that the properties are actually readonly, thus avoiding any ambiguity, normal object initializers could be used to create instances.

Just as an example, imagine this code:

public class Address
{
    public string Line1 { get; initonly set; }
    public string Line2 { get; initonly set; }
    public string Line3 { get; initonly set; }
    public string County { get; initonly set; }
    public string State { get; initonly set; }
    public string Country { get; initonly set; }
    public string ZipCode { get; initonly set; }

    // Business methods as normal
}

// In another class
Address addr = new Address
{
    Line1 = "10 Fairview Avenue",
    Line3 = "Makebelieve Town",
    County = "Mono County",
    State = "California",
    Country = "US"
};

That could be transformed into code a bit like this:

// Immutable class

// Let tools (e.g. the compiler!) know how we
// expect to be initialized. Could be specified
// manually to avoid using the default class name
[InitializedWith(typeof(Address.Init))]
public class Address
{
    // Nested mutable class used for initialization
    [CompilerGenerated]
    public class Init
    {
        public string Line1 { get; set; }
        public string Line2 { get; set; }
        public string Line3 { get; set; }
        public string County { get; set; }
        public string State { get; set; }
        public string Country { get; set; }
        public string ZipCode { get; set; }
    }

    // Read-only “real” properties, automatically
    // implemented and backed with initonly fields
    public string Line1 { get; }
    public string Line2 { get; }
    public string Line3 { get; }
    public string County { get; }
    public string State { get; }
    public string Country { get; }
    public string ZipCode { get; }

    // Automatically generated constructor, using
    // backing fields directly
    public Address(Address.Init init)
    {
        <>_line1 = init.Line1;
        <>_line2 = init.Line2;
        <>_line3 = init.Line3;
        <>_county = init.County;
        <>_state = init.State;
        <>_country = init.Country;
        <>_zipCode = init.ZipCode;
    }

    // Business methods as normal
}

// In another class
Address addr = new Address(new Address.Init
{
    Line1 = "10 Fairview Avenue",
    Line3 = "Makebelieve Town",
    County = "Mono County",
    State = "California",
    Country = "US"
});

That’s the simple case, of course. Issues:

Unlike other compiler-generated types (anonymous types, types for iterator blocks, types for anonymous functions) we do want this to be public, and have a name which can be used elsewhere. We need to find some way of making sure it doesn’t clash with other names. In the example above, I’ve used an attribute to indicate which type is used for initialization – I could imagine some way of doing this in the “pre-transform” code to say what the auto-generated type should be called.
What happens if you put code in the setter, instead of making it automatically implemented? I suspect that code should be moved into the setter of the initialization class – but at that point it won’t have access to the rest of the state of the class (beyond the other properties in the initialization class). It’s somewhat messy.
What if you want to add code to the generated constructor? (Possible solution: allow constructors to be marked somehow in a way that means “add on the initialization class as a parameter at the end, and copy all the values as a first step”.)
How can you indicate that some parameters are mandatory, and some are optional? (The mandatory parameters could just be marked as readonly properties rather than initonly, and then the initialization class specified as an extra parameter for a constructor which takes all the mandatory ones. Doesn’t feel elegant though, and leaves you with two different types of initialization code being mixed in the client – some named, some positional.)
How do you specify default values? (They probably end up being the default values of the automatically generated properties of the initialization class, but there needs to be some syntax to specify them.)

I suspect there are more issues too – but I think the benefits would be great. I know the C# team has been thinking about immutability, but I’ve no idea what kind of support they’re currently envisioning. Unlike my previous ideas, which were indeed unpalatable for various reasons, I think this one has real potential. Mind you, given that I’ve come up with it after only mulling this over in “spare” time, I highly doubt that it will be a new concept to the team…
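In the meantime, the generated shape can be written by hand in C# 3. The following is only a sketch under that assumption, trimmed to three properties: it uses explicit readonly fields because automatically implemented properties can’t be read-only in C# 3, and it leaves out the InitializedWith attribute, which is part of the proposal and doesn’t exist.

```csharp
using System;

public sealed class Address
{
    // Mutable helper, written by hand here; the proposal would have
    // the compiler generate it.
    public sealed class Init
    {
        public string Line1 { get; set; }
        public string County { get; set; }
        public string Country { get; set; }
    }

    // readonly fields are emitted as initonly in the IL.
    private readonly string line1;
    private readonly string county;
    private readonly string country;

    public string Line1 { get { return line1; } }
    public string County { get { return county; } }
    public string Country { get { return country; } }

    // Copies every value out of the mutable helper exactly once.
    public Address(Init init)
    {
        if (init == null)
        {
            throw new ArgumentNullException("init");
        }
        line1 = init.Line1;
        county = init.County;
        country = init.Country;
    }
}

public static class Program
{
    public static void Main()
    {
        Address addr = new Address(new Address.Init
        {
            Line1 = "10 Fairview Avenue",
            County = "Mono County",
            Country = "US"
        });
        Console.WriteLine(addr.County); // Mono County
    }
}
```

The calling code already reads reasonably well; the cost is in writing every property three times, which is exactly the part the compiler could take over.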

Books

Reviewing other C# books?

March 14, 2008 jonskeet 9 Comments

Just a quick question, really – I’d really like feedback to this one.

This morning I was reading Charlie Calvert’s blog, and saw a link to the preview of a C# 3 book by Bruce Eckel and Jamie King. I’ve downloaded it, and had a look – naturally interested in the competition (and with plenty of evidence that I’ve already finished my book and won’t be plagiarising!). At the same time, I’m also interested in some other C# books which are coming out or are already out – particularly Head First C#.

My question is – would my views on other C# books be interesting to you, dear readers? Obviously I’d have a somewhat different perspective on the matter to other people – but at the same time I think it’s clear that I’ll be somewhat biased. Any such reviews are bound to contain comparisons to my own way of approaching writing about C#. That could either be interesting, or it could be really annoying.

Thoughts welcome. Oh, and if by any chance any of the authors of other C# books are reading this post (unlikely, but hey…) – I’d love to hear your views on my book, whatever they happen to be.

Reading Geek Nights

Reading Geek Night 2 – March 28th

March 13, 2008 jonskeet 1 Comment

I’ve mentioned “Reading Geek Night” before – it’s basically a loosely connected bunch of folks talking about fun stuff. There’s a mix of agile/not, .NET/not, expert/not, egomaniac/not, sane/not etc.

The next meeting will be on March 28th, at my house (unless this post draws vast numbers of new folks!). We’ll start off by talking about C#, with me basically doing a demo of chapter 1 of my book – evolving code from C# 1 to C# 3 via C# 2. From there I expect it’s likely we’ll look at LINQ in more depth, maybe iterators, maybe think up funky new stuff. Basically a good time should be had by all.

If you’re around the Reading (UK) area and you’re interested in coming, please let me know. Anyone who has already mailed me about this but has managed to avoid being on the mailing list somehow, let me know that, too!

Uncategorized

C# 4, part 5: Other bits and bobs which probably don’t merit inclusion

March 11, 2008 jonskeet 22 Comments

Okay, I know I said that part 4 would be the last part in this series… but since then I’ve not only thought about iterator block parameter checking, but a few other things. Some of these I simply forgot about before, and some I hadn’t thought of yet. I’m not sure any of these are actually worthy of inclusion, but they may provoke further thought.

Tuple returns

I’ve been reading Programming Erlang and I suspect that being able to return tuples (i.e. multiple values, strongly typed but without an overall predefined type) would be a good thing. For instance, in a tuple-returning world, int.TryParse could be redesigned to return both the true/false and the parsed value. It could have a signature like this:

public static (int, bool) TryParse(string text)

… and then be called like this:

int value;
bool parsed;

(value, parsed) = int.TryParse("Foo");

Now, a few things to work out:

How do we ignore values we’re not interested in?

Part of the problem with out parameters is that sometimes you don’t actually care about the value – but you still have to declare and pass in a parameter. Suppose we could use ? as a placeholder for “I don’t care”. (This is _ in Erlang pattern matching, IIRC. Same kind of business.)

What could you do with a tuple?

We could potentially make tuples first class citizens, so that you could declare variables of that type, a bit like anonymous types, but with anonymous property names as well, used just for matching later. Or we could force matching at the point of method call, which would restrict the use a bit further but leave fewer other rules to be worked out.

Either way, I’d hope to be able to set either fields or properties by parameter matching.

What’s the value of the overall expression?

This really depends on the answer to the previous question. If tuples are first class types, then the result of the expression would normally be the tuple itself. However, I wonder whether there’s more that can be done. For instance, thinking about our TryParse example, it’s useful to be able to write (currently):

if (int.TryParse("Foo", out value))
{
…
}

Suppose we were able to designate one of the matched elements of the tuple to be the expression result, e.g. using _ to be slightly Perl-like:

if ((value, _) = int.TryParse("Foo"))
{
…
}

Would that be worth doing?

More information required…

I suspect that people who know more about the use of tuples in other languages would be able to say more about this. Some overlap with anonymous types is clearly relevant too, and would need to be carefully considered. I’m not wedded to any of the syntax shown above, of course – I’m just interested in how/where it could be useful.
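For comparison, the closest thing that compiles without any new language support is a small hand-written result type. This is only a sketch: ParseResult and IntParser are hypothetical names, and the struct is exactly the kind of “overall predefined type” that tuple returns would make unnecessary.

```csharp
using System;

// Immutable pair of values standing in for the proposed (int, bool) tuple.
public struct ParseResult
{
    private readonly int number;
    private readonly bool success;

    public ParseResult(int number, bool success)
    {
        this.number = number;
        this.success = success;
    }

    public int Number { get { return number; } }
    public bool Success { get { return success; } }
}

public static class IntParser
{
    // Wraps the existing out-parameter API in a single return value.
    public static ParseResult TryParse(string text)
    {
        int number;
        bool success = int.TryParse(text, out number);
        return new ParseResult(number, success);
    }

    public static void Main()
    {
        ParseResult bad = TryParse("Foo");
        ParseResult good = TryParse("42");
        Console.WriteLine(bad.Success);  // False
        Console.WriteLine(good.Success); // True
        Console.WriteLine(good.Number);  // 42
    }
}
```

The caller still has to pull the values out by property name, so it doesn’t give the matching or “don’t care” behaviour discussed above.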

Named method/constructor arguments

One of the features I like about F# is that you can specify the names of arguments, without worrying about the order. This means that it becomes even more important to name methods appropriately, but it would make method calls with many parameters simpler to read. Currently it’s common practice to use one parameter per line and a comment to indicate the use, e.g.

foo.Complicated(10,        // Number of elements to return
                "bar",     // Name of collection
                x => x+1,  // Step for element
                3.5        // Load factor
               );

In fact, this example is relatively simple because all the parameter types are different – look at the more complicated overloads of Enumerable.GroupBy for rather more hellish examples. It’s incredibly ugly, and the compiler isn’t able to check anything. Now suppose we could instead write:

foo.Complicated(maxElements = 10,
                collectionName = "bar",
                step = x => x+1,
                load = 3.5);

Personally I think that’s clearer and less error-prone. The arguments could be reordered with few issues, and the compiler could check that we really were using the right parameter names. One potential issue is in terms of side-effects, where evaluating one argument had a side-effect which affected the evaluation of another argument. At that point reordering is a breaking change. I suspect the compiler would need to stick to the specified textual order, and then rework things on the stack as required to get the appropriate order for the method call. A bit nasty.
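The side-effect concern is easier to see with the positional calls we have today, where arguments are evaluated left to right in the order they’re written. The sketch below just records that order; EvaluationOrder, Next and Subtract are invented names, and it says nothing about how a named-argument feature would actually be specified.

```csharp
using System;
using System.Collections.Generic;

public static class EvaluationOrder
{
    private static readonly List<string> log = new List<string>();

    // Has a visible side effect (logging) each time an argument is evaluated.
    private static int Next(string label, int value)
    {
        log.Add(label);
        return value;
    }

    private static int Subtract(int left, int right)
    {
        return left - right;
    }

    // Returns the order in which the two arguments were evaluated.
    public static string Run()
    {
        log.Clear();
        int result = Subtract(Next("left", 10), Next("right", 3));
        return string.Join(",", log.ToArray()) + " => " + result;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // left,right => 7
    }
}
```

If named arguments let the caller write the “right” argument first, either the log order or the rule that evaluation follows the text has to give.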

Event handler subscription in object initializers

I only thought of this one today, when coming up with an example for a screencast on object initializers. I suspect most uses of object initializers will be to do with custom classes (although I recently used them for XmlWriterSettings to great effect) which would make the screencast harder to understand. I was wondering what common framework classes had lots of writable properties, and I hit on the idea of building a UI. It shouldn’t surprise me that this works quite nicely, but you can build up a hierarchical UI quite pleasantly. For example:

Form form = new Form
{
    Size = new Size(300, 300),
    Controls =
    {
        new Button
        {
            Location = new Point(10, 10),
            Text = "Hello",
        },
        new ListBox
        {
            Location = new Point(10, 50),
            Items =
            {
                "First",
                "Second",
                "Third"
            }
        }
    }
};
Application.Run(form);

This is somewhat reminiscent of Groovy builders (and no doubt many other things, of course). However, one thing you can’t currently do is attach an event handler in an object initializer. The obvious syntax would be something like:

new Button
{
    Location = new Point(10, 10),
    Text = "Hello",
    Click += (sender, args) => Save()
}

where I happen to have used a lambda expression, but didn’t need to – a normal method group conversion or any other way of constructing a delegate would have done just as well.

I mailed the C# team about this, and although it’s been considered before it’s really not useful in many situations. However, the syntax has been left open – there’s no other use of += within object initializers, so it could always be revisited if someone comes up with a killer pattern.

Immutable object initialization

I’ve been thinking about this partly as a result of object initialization in general, and the previous point about named arguments. As has been noted before, C# doesn’t really help you to build immutable objects – either from the point of view of building the type, or then instantiating it. Basically you’ve got the constructor call, and that’s it. A static method could set private properties and then return the object for popsicle immutability, but it still feels slightly grim.

Someone (possibly Marc Gravell – not sure) suggested to me that there ought to be some way of indicating when an object initializer had finished. At the time I think I rejected the idea, but now I like it. There’s already the ISupportInitialize interface, but that feels slightly too heavy to me – in particular, it has two methods rather than just one. What I think could be nice would be:

A new interface with a single CompleteInitialization method.
Readonly automatic properties which would either make the property only writable during a constructor call if the new interface weren’t implemented or would insert an execution-time check that CompleteInitialization hadn’t been called already.
I’d anticipate the C# compiler implementing the new interface itself automatically in some way which supported inheritance reasonably, unless specifically implemented by the developer.
Members other than constructors couldn’t set readonly automatic properties on this, to avoid accidents.
The CLR should have some interaction so it knew which fields it could treat as being readonly after initialization had been completed.
Object initializers would call CompleteInitialization automatically at the end of the block.
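To make the list above a little more concrete, here’s a rough hand-written sketch of what the compiler-generated code might amount to. Everything in it is hypothetical: the ICompleteInitialization interface doesn’t exist anywhere, the Person type is just an example, and the call at the end of the initializer is written out manually because no compiler inserts it.

```csharp
using System;

// Hypothetical interface from the proposal; nothing like it exists in the framework.
public interface ICompleteInitialization
{
    void CompleteInitialization();
}

// Example type, written by hand the way the compiler might expand
// "readonly automatic properties" under the proposal.
public sealed class Person : ICompleteInitialization
{
    private bool frozen;
    private string name;
    private int age;

    public string Name
    {
        get { return name; }
        set { CheckNotFrozen(); name = value; }
    }

    public int Age
    {
        get { return age; }
        set { CheckNotFrozen(); age = value; }
    }

    public void CompleteInitialization()
    {
        frozen = true;
    }

    // The execution-time check that initialization hasn't already finished.
    private void CheckNotFrozen()
    {
        if (frozen)
        {
            throw new InvalidOperationException("Initialization has already been completed.");
        }
    }
}

public static class FreezeDemo
{
    public static Person Create()
    {
        Person person = new Person { Name = "Example", Age = 30 };
        // Under the proposal, this call would be emitted automatically
        // at the end of the object initializer block.
        person.CompleteInitialization();
        return person;
    }

    public static void Main()
    {
        Person person = Create();
        try
        {
            person.Age = 31; // too late: the object is now frozen
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

This only shows the execution-time check; it says nothing about the inheritance or CLR parts of the list, which are where most of the awkward questions live.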

It’s a bit messy, and I’m sure I haven’t thought of everything – but I suspect something along these lines would be a good idea at some point. It’s reminiscent of an earlier wacky idea I had which went further, but this would be specifically to support immutability. Without it, complex immutable types end up with nightmarish constructor calls.

Conclusion

So there we have it – some relatively half-baked ideas which will hopefully provoke a bit more thought – both from readers and myself. It’s interesting to note that aside from event subscription, they all have a fair number of questions and complexity around them, which is off-putting to start with. I would feel more comfortable about event subscription being added than any of the others, because it’s relatively simple and independent. The others feel like more dangerous features – even if they’re more useful too.

The value of a language specification

March 4, 2008 jonskeet 1 Comment

Last Friday evening was the inaugural Reading Geek Night. More about that another time, but if you’re in or around Reading in the UK, and fancy meeting up with some smart people (and me) to discuss software in various shapes and forms, let me know.

After most people had gone home, a few of us including Stuart Caborn were talking about specs. Stuart remembers how I ended up writing some annotations in the C# Annotated Standard: we were debugging some code, and I noticed an unboxing conversion which was unboxing an enum as an int. It worked, but I was surprised. I consulted the spec, and found that according to the spec it really shouldn’t have worked. (Furthermore, the spec suggested a case which couldn’t possibly be valid. I can’t remember the details now, but I can dig them up if anyone is interested.) I’d had one or two conversations with Jon Jagger (the C# ECMA Task Group convenor at the time) before, so I mailed him. Jon invited me to join in the book project, and I took to it with gusto. I read most of the C# 2 ECMA spec over the course of a few weeks, writing annotations as I went along.
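For anyone who hasn’t seen it, the kind of conversion I mean looks something like the sketch below. It isn’t the original code (that’s long gone); DayOfWeek is just a convenient enum with an int underlying type, and UnboxDemo is a made-up name.

```csharp
using System;

public static class UnboxDemo
{
    public static int UnboxAsInt(object boxed)
    {
        // A plain unboxing conversion to int, even though the box may hold an enum.
        return (int)boxed;
    }

    public static void Main()
    {
        object boxed = DayOfWeek.Tuesday; // boxes the enum value, not an int
        // On the Microsoft runtime this succeeds, which is the surprising behaviour.
        Console.WriteLine(UnboxAsInt(boxed));
    }
}
```

I’ve only observed this on the Microsoft runtime; I wouldn’t like to promise what other implementations do, which is rather the point of wanting a spec.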

This is not what most people would consider normal behaviour. When I recently gave a talk about C# 3, I was delighted to hear someone else mention that they had checked the spec about some aspect of the language. Finally, I wasn’t alone! However, such people are clearly the exception rather than the rule.

I genuinely don’t think that matters too much. I really don’t expect many developers to read the spec – certainly not thoroughly. I think it’s important to know that there is a spec, and be able to consult it when in doubt. I want to be able to know what every line of code is doing, in terms of which variable it’s going to access, which method it’s going to call, the order of execution of a post-increment as a method argument, etc.
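As a small illustration of the post-increment case, here’s a sketch (OrderDemo and Describe are made-up names). C# evaluates method arguments from left to right, and each i++ yields the value the variable had before the increment:

```csharp
using System;

public static class OrderDemo
{
    public static string Describe(int first, int second, int third)
    {
        return first + ", " + second + ", " + third;
    }

    public static string Run()
    {
        int i = 0;
        // Arguments are evaluated left to right; each i++ yields the old value,
        // and the final argument sees both increments.
        return Describe(i++, i++, i);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "0, 1, 2"
    }
}
```

I wouldn’t want to see a line like that in production code, but when I do come across one, I’d rather look the answer up than guess.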

That’s not to say I actually learn all of the rules by rote – even for something as simple as operator precedence, I sometimes put brackets in when they’re not required, for example. I’d rather not rely on me or a maintenance engineer having to remember too many details. But if someone else has written some obscure line of code, I’m pretty confident that I’ll be able to understand it with the help of the spec, and refactor it into something more readable.

Now, Stuart challenged the value of the spec. If his code was misbehaving he wouldn’t consult the spec – he’d consult either books or (more likely) the unit tests. Realistically, the vast majority of C# is being compiled by the Microsoft compiler, so the idea of having a spec available for other implementations isn’t actually important to that many developers in terms of business. (It may be psychologically and politically important, and I’m not trying to knock the great work that the Mono project has done – but I’ve never used Mono professionally, and I suspect that’s the case for most people.) Either the code works or it doesn’t, and if it doesn’t work the tests should say so.

I counter that not having a spec is like not having documentation for a library – if you start relying on unspecified behaviour, you can come unstuck when that behaviour changes in a legitimate way. A good example of this is depending on a particular hash algorithm being used for GetHashCode; the algorithm for string.GetHashCode() changed between .NET 1.1 and 2.0, and I’ve seen a few people get burned, having stored the generated hash values in a database. Suddenly nothing matches any more… because they ignored what the documentation said.
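If you genuinely need a hash you can persist, one option is to use an algorithm you control instead of GetHashCode. The sketch below uses 32-bit FNV-1a over the UTF-8 bytes of the string; that’s my choice for illustration, not anything the framework prescribes, and StableHash is a made-up name.

```csharp
using System;
using System.Text;

public static class StableHash
{
    // 32-bit FNV-1a over the UTF-8 bytes of the text. The result depends only
    // on these constants and the input, not on the framework version.
    public static uint Compute(string text)
    {
        const uint OffsetBasis = 2166136261;
        const uint Prime = 16777619;

        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            unchecked
            {
                hash ^= b;
                hash *= Prime; // wraps around on overflow by design
            }
        }
        return hash;
    }

    public static void Main()
    {
        Console.WriteLine(Compute("hello"));
    }
}
```

Whether FNV-1a is the right hash for your data is a separate question; the point is only that its behaviour is specified by you, so it can’t change underneath you.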

Stuart’s response: if the tests still work, the changes haven’t broken anything. If the tests don’t work, we can go and fix the code so they start to work again. I’ll concede that it’s unlikely that implementation changes in the compiler will actually break any code (and it’s also very unlikely that specification changes will break code – the C# design team are pretty fanatical about not introducing breaking changes).

I can see Stuart’s point of view, but it just feels so very wrong. I suspect a lot of that is down to my personality type – how I really hate working without enough information (or what I consider to be enough information). Today I fixed a bug with an ASP.NET application which was producing incorrect JavaScript. It was working on some machines and not working on others. I thought I’d found out why (a different version of a library in the GAC) but that was ruled out after examining another machine which was working contrary to my hypothesis. I’m reasonably confident that my fix will work, but I really don’t like the fact that I don’t understand the issue in the first place. It’s very hard to piece together the necessary information – which is like working on a language that doesn’t have a spec.

Eventually, I came up with an answer which I think Stuart more or less accepted. I’m in one of the groups the spec is aimed at. I write about C#, hoping to explain it to other people. One of my aims with C# in Depth is to give enough information to make the spec even more irrelevant to most developers when it comes to the changes in C# 2 and 3. Without wishing to denigrate existing C# books too much, I’ve often found that the kind of details which I wanted to investigate further just weren’t covered in the books – to get the answer, I had to go to the spec. I really hope that if I’d had my own book, I’d have been able to consult that for most of those issues. However, I simply couldn’t have written the book without the spec.

I’ve had experience of writing about a language without a spec. When I was helping out with Groovy in Action, I often found myself frustrated by the fact that the Groovy spec is far from finished. This shouldn’t be surprising to anyone – Microsoft have a significant team of really smart people who are paid to immerse themselves thoroughly in C# and make sure the language is all it can be, in terms of design, documentation and implementation. Designing a language well is hard – I haven’t been part of designing any languages, but I can get some idea of the difficulty based on what I’ve seen of the languages I’ve used. The loving care required to make sure that all the behaviour that should be pinned down is indeed described, while leaving rigidly defined areas of doubt and uncertainty where that’s appropriate, must be phenomenal. I don’t doubt that the Groovy team is talented, but coming up with a good spec is probably too much of a resource drain, unfortunately.

I haven’t covered everything I feel about specs in this post, but I’m going to finish now before I officially begin to ramble. Apologies to Stuart if I’ve misrepresented his views – and I should point out that this was late at night after Stuart had a few beers, which may be relevant. In short then (and including points I haven’t gone into):

The existence of a specification is important, even if it’s not consulted by every developer. Even if I were never ill, I’d be glad that the National Health Service existed.
I’d be very worried if the language/compiler team itself didn’t have a good spec, and if they do, there’s no reason to hide it. As an example of how important this is, just read Martin Fowler writing about JRuby/IronRuby: “Soon-to-be-ThoughtWorker Ola Bini, a JRuby committer, reckons that it’s almost impossible to figure out how to implement a Ruby runtime without looking at source code of the MRI.” That screams to me of “the implementation is the documentation” which I regard as a very unhealthy state of affairs.
Specifications are vital for authors (whether of books or web articles) who need to present accurate information based on more than just the current behaviour.
Sometimes you can trust a specification more than tests – with a memory model spec, it’s possible to reason about whether or not my code is thread-safe. It could pass all tests but still not handle a bizarre race condition. (Of course, a better memory model spec for .NET would be welcome.)
Unit tests are never going to catch every flaw. They can give you a great deal of confidence, but not certainty. (Example: how many people explicitly check that their text handling code will work just as well when provided with non-interned strings, rather than strings which were originally specified as literals? If the string interning behaviour changes in a valid way, are you absolutely sure your code won’t fail?)
I’m on the fence about the value of having the ECMA spec as well as the Microsoft one. I can see how it could be important in certain business situations – but as a developer, I don’t care that much. I’ve had very few qualms about changing my standard reference from ECMA C# 2 to MS C# 3. It’s unclear to me (as someone completely outside the process) how much influence ECMA has at this stage on the design of the language itself. Were ECMA committee members explicitly consulted during the C# 3 design process? Clearly making a significant change to the language now would be likely to make all existing compilers “broken” – so what can the ECMA team do beyond reframing the existing rules? As I say, I’m an outsider in this matter, so I can’t really judge – but I think it’s a valid question to ask.
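To show what I mean by the interning example in the unit testing point, here’s a sketch of the difference between a literal and an equal string built at execution time. InternDemo and its methods are made-up names:

```csharp
using System;

public static class InternDemo
{
    // Builds a non-empty string at execution time, so the result is a
    // distinct object from any literal with the same contents.
    public static string BuildAtRuntime(string text)
    {
        return new string(text.ToCharArray());
    }

    public static bool SameReference(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        string literal = "hello";
        string built = BuildAtRuntime("hello");

        Console.WriteLine(literal == built);              // value comparison
        Console.WriteLine(SameReference(literal, built)); // identity comparison
    }
}
```

Code which accidentally relies on identity (a comparison via object references, say) can pass every test that only feeds it literals, and then fail on the first string that arrives from a file or a database.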

Anyway, that’s about a sermon’s-worth of preaching about specifications – time for bed.