Category Archives: Books

Why boxing doesn’t keep me awake at nights

I’m currently reading the (generally excellent) CLR via C#, and I’ve recently hit the section on boxing. Why is it that authors feel they have to scaremonger about the effects boxing can have on performance?

Here’s a piece of code from the book:

using System;

public sealed class Program {
   public static void Main() {
      Int32 v = 5;   // Create an unboxed value type variable.

#if INEFFICIENT
      // When compiling the following line, v is boxed
      // three times, wasting time and memory
      Console.WriteLine("{0}, {1}, {2}", v, v, v);
#else
      // The lines below have the same result, execute
      // much faster, and use less memory
      Object o = v;

      // No boxing occurs to compile the following line.
      Console.WriteLine("{0}, {1}, {2}", o, o, o);
#endif
   }
}

In the text afterwards, he reiterates the point:

This second version executes much faster and allocates less memory from the heap.

This seemed like an overstatement to me, so I thought I’d try it out. Here’s my test application:

using System;
using System.Diagnostics;

public class Test
{
    const int Iterations = 10000000;
   
    public static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
#if CONSOLE_WITH_BOXING
            Console.WriteLine("{0} {1} {2}", i, i, i);
#elif CONSOLE_NO_BOXING
            object o = i;
            Console.WriteLine("{0} {1} {2}", o, o, o);
#elif CONSOLE_STRINGS
            string s = i.ToString();
            Console.WriteLine("{0} {1} {2}", s, s, s);
#elif FORMAT_WITH_BOXING
            string.Format("{0} {1} {2}", i, i, i);
#elif FORMAT_NO_BOXING
            object o = i;
            string.Format("{0} {1} {2}", o, o, o);
#elif FORMAT_STRINGS
            string s = i.ToString();
            string.Format("{0} {1} {2}", s, s, s);
#elif CONCAT_WITH_BOXING
            string.Concat(i, " ", i, " ", i);
#elif CONCAT_NO_BOXING
            object o = i;
            string.Concat(o, " ", o, " ", o);
#elif CONCAT_STRINGS
            string s = i.ToString();
            string.Concat(s, " ", s, " ", s);
#endif
        }
        sw.Stop();
        Console.Error.WriteLine("{0}ms", sw.ElapsedMilliseconds);
    }
}

I compiled the code with one symbol defined each time, with optimisations and without debug information, and ran it from a command line, writing to nul (i.e. no disk or actual console activity). Here are the results:

Symbol                 Results (ms)           Average (ms)
CONSOLE_WITH_BOXING    33054, 33898, 33381    33444
CONSOLE_NO_BOXING      34638, 32423, 33294    33451
CONSOLE_STRINGS        29259, 29071, 26683    28337
FORMAT_WITH_BOXING     17143, 18100, 16389    17210
FORMAT_NO_BOXING       15814, 15936, 15222    15657
FORMAT_STRINGS          9178,  9077,  8742     8999
CONCAT_WITH_BOXING     12056, 14304, 11329    12563
CONCAT_NO_BOXING       11949, 13145, 11628    12240
CONCAT_STRINGS          5833,  6263,  5713     5936

So, what do we learn from this? Well, a number of things:

  • As ever, microbenchmarks like this are pretty variable. I tried to do this on a “quiet” machine, but as you can see the results varied quite a lot. (Over two seconds between best and worst for a particular configuration at times!)
  • The difference due to boxing with the original code in the book is basically inside the “noise”
  • The dominant factor of the statement is writing to the console, even when it’s not actually writing to anything real
  • The next most important factor is whether we convert to string once or three times
  • The next most important factor is whether we use String.Format or Concat
  • The least important factor is boxing

Now I don’t want anyone to misunderstand me – I agree that boxing is less efficient than not boxing, where there’s a choice. Sometimes (as here, in my view) the “more efficient” code is slightly less readable – and the efficiency benefit is often negligible compared with other factors. Exactly the same thing happened in Accelerated C# 2008, where a call to Math.Pow(x, 2) was the dominant factor in a program again designed to show the efficiency of avoiding boxing.

The performance scare of boxing is akin to that of exceptions, although I suppose it’s more likely that boxing could cause a real performance concern in an otherwise-well-designed program. It used to be a much more common issue, of course, before generics gave us collections which don’t require boxing/unboxing to add/fetch data.
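To make that last point concrete, here's a minimal sketch of the pre-generics situation compared with the generic one. (This is illustrative code of my own, not from the book.)

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingCollections
{
    static void Main()
    {
        // Pre-generics: ArrayList stores object references, so each int
        // is boxed on Add and unboxed (with a cast) on retrieval.
        ArrayList oldList = new ArrayList();
        oldList.Add(5);                  // boxing
        int fromOld = (int) oldList[0];  // unboxing

        // Generics: List<int> stores ints directly; no boxing occurs.
        List<int> newList = new List<int>();
        newList.Add(5);
        int fromNew = newList[0];

        Console.WriteLine("{0} {1}", fromOld, fromNew);
    }
}
```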

In short: yes, boxing has a cost. But please look at it in context, and if you’re going to start making claims about how much faster code will run when it avoids boxing, at least provide an example where it actually contributes significantly to the overall execution cost.

Book Review: Programming C# 3.0 by Jesse Liberty and Donald Xie

Resources

Disclaimer

One reader commented that a previous book review was too full of “this is only my personal opinion” and other such disclaimers. I think it’s still important to declare the situation, but I can see how it can get annoying if done throughout the review. So instead, I’ve lumped everything together here. Please bear these points in mind while reading the whole review:

  • Obviously this book competes with C# in Depth, although probably not very much.
  • I was somewhat prejudiced against the book by seeing that the sole 5-star review for it on Amazon was by Jesse Liberty himself. Yes, he wanted to explain why he wrote the book and why he’s proud of it, but giving yourself a review isn’t the right way to go about it.
  • I’ve seen a previous edition of the book (for C# 2.0) and been unimpressed at the coverage of some of the new features.
  • I’m a nut for technical accuracy, particularly when it comes to terminology. More on this later, but if you don’t mind reading (and then presumably using) incorrect terminology, you’re likely to have a lot better time with this book than I did.
  • I suspect I have higher expectations for established, prolific authors such as Jesse Liberty than for newcomers to the world of writing.
  • I’m really not the target market for this book.

Okay, with all that out of the way, let’s get cracking.

Contents and target audience

According to the preface, Programming C# 3.0 (PC# from now on) is for people learning C# for the first time, or brushing up on it. There’s an expectation that you probably already know another language – it wouldn’t be impossible to learn C# from the book without any prior development experience, but the preface explicitly acknowledges that it would be reasonably tough. That’s a fair comment – probably fair for any book, in fact. I have yet to read anything which made me think it would be a wonderful way to teach someone to program from absolute scratch. Likewise the preface recommends C# 3.0 in a Nutshell for a more detailed look at the language, for more expert readers. Again, that’s reasonable – it’s clearly not aiming to go into the same level of depth as Accelerated C# 2008 or C# in Depth.

The book is split into 4 parts:

  • The C# language: pretty much what you’d expect, except that not all of the language coverage is in this part (most of the new features of C# 3.0 are in the second part) and some non-language coverage is included (regular expressions and collections) – about 270 pages
  • C# and Data: LINQ, XML (the DOM API and a bit of LINQ to XML), database access (ADO.NET and LINQ to SQL) – about 100 pages
  • Programming with C#: introductions to ASP.NET, WPF and Windows Forms – about 85 pages
  • The CLR and the .NET Framework: attributes, reflection, threading, I/O and interop – about 110 pages

As you can tell, the bulk of it is in the language part, which is fine by me and reflects the title accurately. I’ll focus on that part of the book in this review, and the first chapter of part 2, which deals with the LINQ parts of C# 3.0. To be honest, I don’t think the rest of the book actually adds much value, simply because its chapters skim over the surface of their topics so lightly. Part 3 would make a reasonable series of videos – and indeed that’s how it’s written, basically in the style of “Open Visual Studio, start a new WinForms project, now drag a control over here” etc. I’ve never been fond of that style for a book, although it works well in screencasts.

The non-LINQ database and XML chapters in part 2 seemed relatively pointless too – I got the feeling that they’d been present in older editions and so had just stayed in by default. With the extra space available from cutting these, a much better job could have been done on LINQ to SQL and LINQ to XML. The latter gets particularly short-changed in PC#, with a mere 4 pages devoted to it! (C# in Depth is much less of a “libraries” book but I still found over 6 pages to devote to it. Not a lot, I’ll grant you.)

Part 4 has potential, and is more useful than the previous parts – reflection, threading, IO and interop are all important topics (although I’d probably drop interop in favour of internationalization or something similar) – but they’re just not handled terribly well. The threading chapter talks about using lock or Monitor, but never states that lock is just shorthand for try/finally blocks which use Monitor; no mention is made of the memory model or volatility; aborting threads is demonstrated but not warned about; the examples always lock on this without explaining that it’s generally thought to be a bad idea. The IO chapter uses TextReader (usually via StreamReader) but never mentions the crucial topic of character encodings (it uses Encoding.ASCII but without really explaining it) – and most damning of all, as far as I can tell there’s not a single using statement in the entire chapter. There are calls to Close() at the end of each example, and there’s a very brief mention saying that you should always explicitly close streams – but without saying that you should use a using statement or try/finally for this purpose.
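For the record, here's the sort of explanation I'd have liked the threading chapter to include: a sketch (mine, not the book's) showing that the lock statement is shorthand for a try/finally pair around Monitor calls, with the lock taken on a private object rather than this. (In C# 3.0 the expansion is roughly as shown; later compilers use a slightly different Monitor.Enter overload.)

```csharp
using System.Threading;

class Account
{
    private readonly object padlock = new object();
    private decimal balance;

    public void Deposit(decimal amount)
    {
        // The lock statement...
        lock (padlock)
        {
            balance += amount;
        }

        // ...is (roughly) shorthand for this expansion, which is what
        // actually guarantees the lock is released even on an exception:
        Monitor.Enter(padlock);
        try
        {
            balance += amount;
        }
        finally
        {
            Monitor.Exit(padlock);
        }
    }
}
```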

Okay, enough on those non-language topics – let’s look at the bulk of the book, which is about the language.

Language coverage

PC# starts from scratch, so it’s got the whole language to cover in about 300 pages. It would be unreasonable to expect it to provide as much attention to detail as C# in Depth, which (for the most part) only looks at the new features of C# 2.0 and 3.0. (On the other hand, if the remaining 260 pages had been given to the language as well, a lot more ground could have been covered.) It’s also worth bearing in mind that the book is not aimed at confident/competent C# developers – it’s written for newcomers, and delving into tricky issues like generic variance would be plain mean. However, I’m still not impressed with what’s been left out:

  • There’s no mention of nullable types as far as I can tell – indeed, the list of operators omits the null-coalescing operator (??).
  • Generics are really only talked about in the context of collections – despite the fact that to understand any LINQ documentation, you really will need to understand generic delegates. Generic constraints are likewise only mentioned in the context of collections, and only what I call a “derivation type constraint” (e.g. T : IComparable<T>) (as far as I can tell the spec doesn’t give this a name). There’s no coverage of default(T) – although the “default value of a type” is mentioned elsewhere, with an incorrect explanation.
  • Collection initializers aren’t explained as far as I can tell, although I seem to recall seeing one in an example. They’re not mentioned in the index.
  • Iterator blocks (and the yield contextual keyword) are likewise absent from the index, although there’s definitely one example of yield return when IEnumerable<T> is covered. The coverage given is minimal, with no mention of the completely different way that this executes compared with normal methods.
  • Query expression coverage is limited: although from, where, select, orderby, join and group are covered, there’s no mention of let, the difference between join and join ... into, explicitly typed range variables, or query continuations. The translation process isn’t really explained clearly, and the text pretty much states that it will always use extension methods.
  • Expression trees aren’t referenced to my knowledge; there’s one piece of text which attempts to mention them but just calls them “expressions” – which are of course entirely different. We’ll come onto terminology in a minute.
  • Only the simplest (and admittedly most common by a huge margin) form of using directives is shown – no extern aliases, no namespace aliases, not even using Foo = System.Console;
  • Partial methods aren’t mentioned.
  • Implicitly typed arrays aren’t covered.
  • Static classes may be mentioned in passing (not sure) but not really explained.
  • Object initializers are shown in one form only, ditto anonymous object initializer expressions.
  • Only field-like events are shown. The authors spend several pages on an example of bad code which just has a public delegate variable, and then try to blame delegates for the problem (which is really having a public variable). The solution is (of course) to use an event, but there’s little to no explanation of the nature of events as pairs of methods, a bit like properties but with subscribe/unsubscribe behaviour instead of data fetch/mutate.
  • Anonymous methods and lambda expressions are covered, but with very little text about the closure aspect of them. This is about it: “[…] and the anonymous method has access to the variables in the scope in which they are defined:” (followed by an example which doesn’t demonstrate the use of such variables at all).
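On the events point: here's a sketch (my own, not from the book) of what "events as pairs of methods" means – a field-like event alongside its explicit equivalent with add/remove accessors.

```csharp
using System;

class Clock
{
    // Field-like event: the compiler generates a backing delegate field
    // plus add/remove accessors behind the scenes.
    public event EventHandler SecondChanged;

    // The explicit equivalent, making the "pair of methods" nature visible:
    private EventHandler tick;
    public event EventHandler Tick
    {
        add { tick += value; }      // subscribe
        remove { tick -= value; }   // unsubscribe
    }

    // Only the class itself can raise the event or see the delegate;
    // callers can merely subscribe and unsubscribe.
    protected void OnTick()
    {
        EventHandler handler = tick;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }
}
```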

I suspect there’s more, but you get the general gist. I’m not saying that all of these should have been covered and in great detail, but really – no mention of nullable types at all? Is it really more appropriate in a supposed language book to spend several pages building an asynchronous file server than to actually list all the operators accurately?
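As for the closure aspect mentioned in the last bullet, an example genuinely demonstrating captured variables only takes a few lines. This sketch is mine, not the book's:

```csharp
using System;

class CaptureDemo
{
    static void Main()
    {
        int counter = 0;

        // The lambda captures the *variable* counter, not a snapshot of
        // its value: each call sees and modifies the same variable.
        Action increment = () => counter++;

        increment();
        increment();
        Console.WriteLine(counter); // 2
    }
}
```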

Okay, I’m clearly beginning to rant by now. The limited coverage is annoying, but it’s not that bad. Yes, I think the poor/missing coverage of generics and nullable types is a real problem, but it’s not enough to get me really cross. It’s the massive abuse of terminology which winds me up.

Accuracy

I’ll say this for PC# – if you ignore the terminology abuse, it’s mostly accurate. There are definitely “saying something incorrect” issues (e.g. an implication that ref/out can only be used with value type parameters; the statement that reference types in an array aren’t initialized to their default value (they are – the default value is null); the claim that extension methods can only access public members of target types (they have the same access as normal – so if the extension method is in the same assembly as the target type, for instance, it could access internal members)) but the biggest problem is that of terminology – along with sloppy code, including its formatting.

The authors confuse objects, values, variables, expressions, parameters, arguments and all kinds of other things. These have well-defined meanings, and they’re there for a reason. They do have footnotes explaining that they’re deliberately using the wrong terminology – but that doesn’t make it any better. Here are the three footnotes, and my responses to them:

The terms argument and parameter are often used interchangably, though some programmers insist on differentiating between the parameter declaration and the arguments passed in when the method is invoked.

Just because others abuse terms doesn’t mean it’s right for a book to do so. It’s not that programmers insist on differentiating between the two – the specification does. Now, to lighten things up a bit I’ll acknowledge that this one isn’t always easy to deal with. There are plenty of times where I’ve tried really hard to use the right term and just not ended up with a satisfactory bit of wording. However, at least I’ve tried – and where it’s easy, I’ve done the right thing. I wish the authors had the same attitude. (They do the same with the conditional operator, calling it “the ternary operator”. It’s a ternary operator. Having three operands is part of its nature – it’s not a description of its behaviour. Again, lots of other people get this wrong. Perhaps if all books got it right, more developers would too.) Next up:

Throughout this book, I use the term object to refer to reference and value types. There is some debate in the fact that Microsoft has implemented the value types as though they inherited from the root class Object (and thus, you may call all of Object’s methods on any value type, including the built-in types such as int.)

To me, this pretty much reads as “I’m being sloppy, but I’ve got half an excuse.” It’s true that the C# specification isn’t clear on this point – although the CLI spec is crystal clear. Personally, it just feels wrong to talk about the value 5 as an object. It’s an object when it’s boxed, of course (and if you call any Object methods on a value type which haven’t been overridden by that type, it gets boxed at that point) but otherwise I really don’t think of it as an object. An instance of the type, yes – but not an object. So yes, I’ll acknowledge that there’s a little wiggle room here – but I believe it’s going to confuse readers more than it helps them.
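The "boxed at that point" behaviour is easy to show in code. In this sketch (mine, not the book's), calling an overridden method on an int involves no boxing, whereas calling GetType – which can't be overridden – boxes the value first:

```csharp
using System;

class BoxingCalls
{
    static void Main()
    {
        int i = 5;

        // ToString() is overridden by Int32, so the call needs no boxing.
        string s = i.ToString();

        // GetType() is non-virtual and declared on Object, so the value
        // is boxed before the call.
        Type t = i.GetType();

        Console.WriteLine("{0} {1}", s, t);
    }
}
```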

It’s the “confusing readers more than it helps them” part which is important. I’m not above a little bit of shortcutting myself – in C# in Depth, I refer to automatically implemented properties as “automatic properties” (after explicitly saying what I’m doing) and I refer to the versions of C# as 1, 2 and 3 instead of 1.0, 1.2, 2.0 and 3.0. In both these cases, I believe it adds to the readability of the book without giving any room for confusion. That’s very different from what’s going on in PC#, in my view. I’ve saved the most galling example of this for last:

As noted earlier, btnUpdate and btnDelete are actually variables that refer to the unnamed instances on the heap. For simplicity, we’ll refer to these as the names of the objects, keeping in mind that this is just short-hand for “the name of the variables that refer to the unnamed instances on the heap.”

This one’s the killer. It sounds relatively innocuous until you see the results. Things like this (from P63):

ListBox myListBox;  // instantiate a ListBox object

No, that code doesn’t instantiate anything. It declares a variable – and that’s all. The comment isn’t non-sensical – the idea of some code which does instantiate a ListBox object clearly makes sense – but it’s not what’s happening in this code (in C# – it would in C++, which makes it even more confusing). That’s just one example – the same awful sloppiness (which implies something completely incorrect) permeates the whole book. Time and time again we’re told about instances being created when they’re not. From P261:

The Clock class must then create an instance of this delegate, which it does on the following line:

public SecondChangeHandler SecondChanged;

Why do I care about this so much? Because I see the results of it on the newsgroups, constantly. How can I blame developers for failing to communicate properly about the problems they’re having if their source of learning is so sloppy and inaccurate? How can they get an accurate mental model of the language if they’re being told that objects are being instantiated when they’re not? Communication and a clear mental model are very important to me. They’re why I get riled up when people perpetuate myths about where structs “live” or how parameters are passed. PC# had me clenching my fists on a regular basis.
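For clarity, here's what the distinction actually looks like – a sketch of my own (using List<int> rather than ListBox so it runs outside WinForms):

```csharp
using System;
using System.Collections.Generic;

class DeclareVsInstantiate
{
    static void Main()
    {
        List<int> myList;          // declares a variable; nothing is created
        myList = new List<int>();  // *this* line instantiates an object

        Console.WriteLine(myList.Count); // 0
    }
}
```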

These are examples where the authors apparently knew they were abusing the terminology. There are other examples where I believe it’s a genuine mistake – calling anonymous methods “anonymous delegates” or “statements that evaluate to a value are called expressions” (statements are made up of expressions, and expressions don’t have to return a value). I can certainly sympathise with this. Quite where they got the idea that HTML was derived from “Structured Query Markup Language” I don’t know – the word “Query” should have been a red flag – but these things happen.

In other places the authors are just being sloppy without either declaring that they’re going to be, or just appearing to make typos. In particular, they’re bad at distinguishing between language, framework and runtime. For instance:

  • “C# combines the power and complexity of regular expression syntax […]” – no, C# itself neither knows nor cares about regular expressions. They’re in the framework.
  • (When talking about iterator blocks) “All the bookkeeping for keeping track of which element is next, resetting the iterator, and so forth is provided for you by the Framework.” – No, this time it is the C# compiler which is doing all the work. (It doesn’t support reset though.)
  • “Strings can also be created using verbatim string literals, which start the at (@) symbol. This tells the String constructor that the string should be used verbatim […]” – No, the String constructor doesn’t know about verbatim string literals. They’re handled by the C# compiler.
  • “The .NET CLR provides isolated storage to allow the application developer to store data on a per-user basis.” I very much doubt that the CLR code has any idea about this. I expect it to be in the framework libraries.

Again, if books don’t get this right, how do we expect developers to distinguish between the three? Admittedly sometimes it can be tricky to decide where responsibility lies – but there are plenty of clearcut cases where PC# is just wrong. I doubt that the authors really don’t know the difference – they just don’t seem to think it’s important to get it right.

Code

I’m mostly going to point out the shortcomings of the code, but on the plus side I believe almost all of it will basically work. There’s one point at which the authors have both a method and a variable with the same name (which is already in the unconfirmed errata) and a few other niggles, but they’re relatively rare. However:

  • The code frequently ignores naming conventions. Method and class names sometimes start with lower case, and there’s frequent use of horrible names beginning with “my” or “the”.
  • The authors often present several pages of code together, and then take them apart section by section. This isn’t the only book to do this by a long chalk, but I wonder – does anyone really benefit from having the whole thing in a big chunk? Isn’t it better to present small, self-contained examples?
  • As mentioned before, the uses of using statements are few and far between.
  • The whitespace is all over the place. The indentation level changes all the time, and sometimes there are outdents in the middle of blocks. Occasionally newlines have actually been missed out, and in other cases (particularly at the start of class bodies) there are two blank lines for no reason at all. (The latter is very odd in a book, where vertical whitespace is seen as extremely valuable.) Sometimes there’s excessive (to my mind) spacing. Just as an example (which is explicitly labelled as non-compiling code, so I’m not faulting it at all for that):
    using System.Console;
    class Hello
    {
        static void Main()
        {
        WriteLine("Hello World");
       }
    }

    I promise you that’s exactly how it appears in the book. Now this may have started out as a fault of the type-setter, but the authors should have picked it up before publication, IMO. I could understand there being a few issues like this (proof-reading code really is hard) but not nearly as many as there are.

  • There are examples of mutable structs (or rather, there’s at least one example), and no warning at all that mutable value types are a really, really bad idea.

Again, I don’t want to give the impression I’m an absolute perfectionist when it comes to code in books. For the sake of keeping things simple, sometimes authors don’t seal types where they should, or make them immutable etc. I’m not really looking for production-ready code, and indeed I made this very point in one of the notes for C# in Depth. However, I draw the line at using statements, which are important and easy to get right without distracting the reader. Likewise giving variables good names – counter rather than ctr, and avoiding those the and my prefixes – makes a competent reader more comfortable and can transfer good habits to the novice via osmosis.
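Getting the using statement right really is cheap. Here's a self-contained sketch (my own example, with a file created on the fly) of the form I'd expect any IO chapter to show:

```csharp
using System;
using System.IO;

class UsingDemo
{
    static void Main()
    {
        File.WriteAllText("example.txt", "Hello");

        // The using statement guarantees Dispose (which closes the stream)
        // runs even if an exception is thrown while reading.
        using (TextReader reader = File.OpenText("example.txt"))
        {
            Console.WriteLine(reader.ReadLine());
        }

        // The statement above expands to roughly:
        //   TextReader reader = File.OpenText("example.txt");
        //   try { ... }
        //   finally { if (reader != null) ((IDisposable) reader).Dispose(); }
    }
}
```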

Writing style and content ordering

Time for some good news – when you look beyond the terminology, this is a really easy book to read. I don’t mean that everything in it is simplistic, but the style rarely gets in the way. It’s not dry, and some of the real-world analogies are very good. This may well be Jesse Liberty’s experience as a long-standing author making itself apparent.

In common with many O’Reilly books, there are two icons which usually signify something worth paying special attention to: a set of paw prints indicating a hint or tip, and a mantrap indicating a commonly encountered issue to be aware of. Given the rest of the review, I suspect you’d be surprised if I agreed with all of the points made in these extra notes – and indeed there are some issues – but most of them are good.

Likewise there are also notes for the sake of existing Java and C++ developers, which make sense and are useful.

I don’t agree with some of the choices made in terms of how and when to present some concepts. I found the way of explaining query expressions confusing, as it interleaved “here’s a new part of query expressions” with “here’s a new feature (e.g. anonymous types, extension methods).” It will come as no surprise to anyone who’s read C# in Depth that I prefer the approach of presenting all the building blocks first, and then showing how query expressions use all those features. There’s a note explaining why the authors have done what they’ve done, but I don’t buy it. One important thing with the “building blocks first” approach is to present a preliminary example or two, to give an idea of where we’re headed. I’ve forgotten to do that in the past (in a talk) and regretted it – but I don’t regret the overall way of tackling the topic.

On a slightly different note, I would have presented some of the earlier topics in a different order too. For instance, I regard structs and interfaces as more commonly used and fundamental topics than operator overloading. (While C# developers tend not to create their own structs often, they use them all the time. When was the last time you wrote a program without an int in it?) This is a minor nit – and one which readers may remember I also mentioned for Accelerated C# 2008.

There’s one final point I’d like to make, but which doesn’t really fit anywhere else – it’s about Jesse Liberty’s dedication. Most people dedicate books to friends, colleagues etc. Here’s Jesse’s:

This book is dedicated to those who come out, loud, and in your face and in the most inappropriate places. We will look back at this time and shake our heads in wonder. In 49 states, same-sex couples are denied the right to marry, though incarcerated felons are not. In 36 states, you can legally be denied housing just for being q-u-e-e-r. In more than half the states, there is no law protecting LGBT children from harassment in school, and the suicide rate among q-u-e-e-r teens is 400 percent higher than among straight kids. And, we are still kicking gay heroes out of the military despite the fact that the Israelis and our own NSA, CIA, and FBI are all successfully integrated. So yes, this dedication is to those of us who are out, full-time.

(I’ve had to spell out q-u-e-e-r as otherwise the blog software replaces it with asterisks. Grr.) I’m straight, but I support Jesse’s sentiment 100%. I can’t remember when I first started taking proper notice of the homophobia in the world, but it was probably at university. This dedication does nothing to help or hinder the reader with C#, but to my mind it still makes it a better book.

Conclusion

In short, I’m afraid I wouldn’t recommend Programming C# 3.0 to potential readers. There are much better books out there: ones which won’t make it harder for the reader to talk about their code with others, in particular. It’s not all bad by any means, but the mixture of sloppy use of terminology and poor printed code is enough of a problem to make me give a general thumbs down.

Next up will be CLR via C#, by Jeffrey Richter.

Response from Jesse Liberty

As normal, I mailed the author (in this case just Jesse Liberty – I confess I didn’t look for Donald Xie’s email address) and very promptly received a nice response. He asked me to add the following as his reaction:

I believe the book is very good for most real-world programmers and the publisher and I are dedicated to making the next revision a far better book, by correcting some of the problems you point out, and by beefing up the coverage of the newer features of the language.

Also as normal, I’ll be emailing Jesse with a list of the errors I found, so hopefully they can be corrected for the next edition.

Logging enumeration flow

I’m currently reading Pro LINQ: Language Integrated Query in C# 2008 by Joe Rattz and yesterday I came across a claim about Enumerable.Intersect which didn’t quite ring true. I consulted MSDN and the documentation is exactly the same as the book. Here’s what it says:

When the object returned by this method is enumerated, Intersect enumerates first, collecting all distinct elements of that sequence. It then enumerates second, marking those elements that occur in both sequences. Finally, the marked elements are yielded in the order in which they were collected.

(first is the first parameter, the one which the method appears to be called on when using it as an extension method. second is the second parameter – the other sequence involved.)

This seems to be needlessly restrictive. In particular, it doesn’t allow you to work with an infinite sequence on either side. It also means loading the whole of both sequences into memory at the same time. Given the way that Join works, I was surprised to see this. So I thought I’d test it. This raised the question of how you trace the flow of a sequence – how do you know when data is being pulled from it? The obvious answer is to create a new sequence which fetches from the old one, logging as it goes. Fortunately this is really easy to implement:

using System;
using System.Collections.Generic;

public static class Extensions
{
    public static IEnumerable<T> WithLogging<T>(this IEnumerable<T> source,
                                                string name)
    {
        foreach (T element in source)
        {
            Console.WriteLine("{0}: {1}", name, element);
            yield return element;
        }
    }
}

We keep a name for the sequence so we can easily trace which sequence is being pulled from at what point. Now let’s apply this logging to a call to Intersect:

using System;
using System.Linq;
using System.Collections.Generic;

// Compile alongside the Extensions class

class Test
{
    static void Main()
    {
        var first = Enumerable.Range(1, 5).WithLogging("first");
        var second = Enumerable.Range(3, 5).WithLogging("second");
        foreach (int i in first.Intersect(second))
        {
            Console.WriteLine("Intersect: {0}", i);
        }
    }
}

As you can see, we’re intersecting the numbers 1-5 with the numbers 3-7 – the intersection should clearly be 3-5. We’ll see a line of output each time data is pulled from either first or second, and also when the result of Intersect yields a value. Given the documentation and the book, one would expect to see this output:

// Theoretical output. It doesn’t really do this
first: 1
first: 2
first: 3
first: 4
first: 5
second: 3
second: 4
second: 5
second: 6
second: 7
Intersect: 3
Intersect: 4
Intersect: 5

Fortunately, it actually works exactly how I’d expect: the second sequence is evaluated fully, then the first is evaluated in a streaming fashion, with results being yielded as they’re found. (This means that, if you’re sufficiently careful with the result, e.g. by calling Take with a suitably small value, the first sequence can be infinite.) Here’s the actual output demonstrating that:

// Actual output.
second: 3
second: 4
second: 5
second: 6
second: 7
first: 1
first: 2
first: 3
Intersect: 3
first: 4
Intersect: 4
first: 5
Intersect: 5
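
To see the streaming behaviour pay off, we can make the first sequence effectively infinite and just limit how many results we ask for. This is a sketch rather than anything definitive – it reuses the WithLogging extension method from above:

```csharp
using System;
using System.Linq;

// Compile alongside the Extensions class from earlier.
class InfiniteFirstDemo
{
    static void Main()
    {
        // An effectively infinite first sequence...
        var first = Enumerable.Range(1, int.MaxValue).WithLogging("first");
        var second = Enumerable.Range(3, 5).WithLogging("second");

        // ...is fine so long as we stop asking for results. Intersect
        // still reads all of second up front, but only pulls elements
        // 1-5 from first before Take(3) stops the iteration.
        foreach (int i in first.Intersect(second).Take(3))
        {
            Console.WriteLine("Intersect: {0}", i);
        }
    }
}
```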

Initial Conclusion

There are two interesting points here, to my mind. The first is that the documentation for Intersect is wrong – the real code is more sensible than the docs. The second, and more important, is seeing how easy it is to log the flow of sequence data – as simple as adding a single extension method and calling it. (You could do it with a Select projection which writes the data and then yields the value, of course, but I think this is neater.)
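
For comparison, the Select-based version might look something like this sketch – the projection logs as a side effect and returns the element unchanged:

```csharp
using System;
using System.Linq;

class SelectLoggingDemo
{
    static void Main()
    {
        // Logging via Select instead of a dedicated extension method.
        var first = Enumerable.Range(1, 5)
            .Select(x => { Console.WriteLine("first: {0}", x); return x; });

        foreach (int i in first)
        {
            Console.WriteLine("Got: {0}", i);
        }
    }
}
```

It works, but the logging intent is much less obvious at the call site.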

I’m hoping to finish reading Joe’s book this week, and write the review over the weekend, by the way.

Update (Sept. 11th 2008)

Frederik Siekmann replied to this post with an interesting alternative implementation to stream the intersection, which takes alternating elements from the two sequences involved. It’s a bit more memory-hungry (with three sets of elements to remember instead of just one) but it means that we can deal with two infinite streams, if we’re careful. Here’s a complete example:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static IEnumerable<T> AlternateIntersect<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        var intersection = new HashSet<T>();
        var firstSet = new HashSet<T>();
        var secondSet = new HashSet<T>();
        using (IEnumerator<T> firstEnumerator = first.GetEnumerator())
        {
            using (IEnumerator<T> secondEnumerator = second.GetEnumerator())
            {
                bool firstHasValues = firstEnumerator.MoveNext();
                bool secondHasValues = secondEnumerator.MoveNext();
                while (firstHasValues && secondHasValues)
                {
                    T currentFirst = firstEnumerator.Current;
                    T currentSecond = secondEnumerator.Current;
                   
                    if (!intersection.Contains(currentFirst) &&
                        secondSet.Contains(currentFirst))
                    {
                        intersection.Add(currentFirst);
                        yield return currentFirst;
                    }
                    firstSet.Add(currentFirst);
                    if (!intersection.Contains(currentSecond) &&
                        firstSet.Contains(currentSecond))
                    {
                        intersection.Add(currentSecond);
                        yield return currentSecond;
                    }
                    secondSet.Add(currentSecond);
                    firstHasValues = firstEnumerator.MoveNext();
                    secondHasValues = secondEnumerator.MoveNext();
                }
                if (firstHasValues)
                {
                    do
                    {
                        T currentFirst = firstEnumerator.Current;
                        if (!intersection.Contains(currentFirst) &&
                            secondSet.Contains(currentFirst))
                        {
                            intersection.Add(currentFirst);
                            yield return currentFirst;
                        }
                    } while (firstEnumerator.MoveNext());
                }
                if (secondHasValues)
                {
                    do
                    {
                        T currentSecond = secondEnumerator.Current;
                        if (!intersection.Contains(currentSecond) &&
                            firstSet.Contains(currentSecond))
                        {
                            intersection.Add(currentSecond);
                            yield return currentSecond;
                        }
                    } while (secondEnumerator.MoveNext());
                }
            }
        }
    }
   
    public static IEnumerable<T> WithLogging<T>(this IEnumerable<T> source, string name)
    {
        foreach (T element in source)
        {
            Console.WriteLine("{0}: {1}", name, element);
            yield return element;
        }
    }
}

class Test
{
    static void Main()
    {
        var positiveIntegers = Enumerable.Range(0, int.MaxValue);
       
        var multiplesOfTwo = positiveIntegers.Where(x => (x%2) == 0)
                                             .WithLogging(“Twos”);
        var multiplesOfThree = positiveIntegers.Where(x => (x%3) == 0)
                                               .WithLogging(“Threes”);
   
        foreach (int x in multiplesOfTwo.AlternateIntersect(multiplesOfThree).Take(10))
        {
            Console.WriteLine("AlternateIntersect: {0}", x);
        }
    }
}

Most of the code is the alternating intersection – the test at the end just shows the intersection of the sequence 0, 2, 4, 6… with 0, 3, 6, 9… The output shows elements being taken from both sequences, and yielded when a match is found. It’s important that we limit the output in some way – in the above code we call Take(10), but anything which prevents the loop from just executing until we run out of memory is fine.

Twos: 0
Threes: 0
AlternateIntersect: 0
Twos: 2
Threes: 3
Twos: 4
Threes: 6
Twos: 6
Threes: 9
AlternateIntersect: 6
Twos: 8
Threes: 12
Twos: 10
Threes: 15
Twos: 12
Threes: 18
AlternateIntersect: 12
Twos: 14
Threes: 21
Twos: 16
Threes: 24
Twos: 18
Threes: 27
AlternateIntersect: 18
(etc)

That’s really neat. Quite how often it’ll be useful is a different matter, but I find this kind of thing fascinating to consider. Thanks Frederik!

Data Structures and Algorithms: new free eBook available (first draft)

I’ve been looking at this for a while: Data Structures and Algorithms: Annotated reference with examples. It’s only in “first draft” stage at the moment, but the authors would love your feedback (as would I). Somehow I’ve managed to end up as the editor and proof-reader, although due to my holiday the version currently available doesn’t have many of my edits in. It’s a non-academic data structures and algorithms book, intended (as I see it, anyway) as a good starting point for those who know that they ought to be more aware of the data structures they use every day (lists, heaps etc) but don’t have an academic background in computer science.

An implementation will be available in both Java and C#, I believe.

Book review: Accelerated C# 2008 by Trey Nash

Time for another book review, and this time it’s due to a recommendation from a reader who owns this one, C# in Depth and Head First C#.

Resources

Introduction and disclaimer

My normal book review disclaimer applies, but probably more so than ever before. Yes, Accelerated C# 2008 is a competitor to C# in Depth. They’re different in many ways, but many people would no doubt be in the target audience for both books. If you meet that criterion, please be aware that as the author of C# in Depth I can’t possibly be 100% objective when reviewing another C# book. That said, I’ll try to justify my opinions everywhere I can.

Target audience and content overview

Accelerated C# 2008 is designed to appeal to existing developers with experience in an OO language. As one of the Amazon reviews notes, you may struggle somewhat if you don’t have any .NET experience beforehand – while it should be possible to read it knowing only Java or C++, there are various times where a certain base level of knowledge is assumed and you’ll want to refer to MSDN for some background material. If you come at the book with no OO experience at all, I expect you’ll have a hard time. While chapter 4 does cover the basics of OO in .NET (classes, structs, methods, properties etc), this isn’t really a beginner’s book.

In terms of actual content covered, Accelerated C# 2008 falls somewhere between C# in Depth (almost purely language) and C# 3.0 in a Nutshell (language and then core libraries). It doesn’t attempt to cover all the core technologies (IO, reflection, security, interop etc are absent) but it goes into detail beyond the C# language when it comes to strings, exceptions, collections, threading and more. As well as purely factual information, there’s a lot of guidance as well, including a whole chapter entitled “In Search of C# Canonical Forms.”

General impressions

I’d like to make it clear to start with that I like the book. I have a number of criticisms, none of which I’m making up for the sake of being critical – but that in no way means it’s a bad book at all. It’s very unlikely that you know everything in here (I certainly didn’t) and the majority of the guidance is sound. The code examples are almost always self-contained (a big plus in my view) and Trey’s style is very readable. Where there are inaccuracies, they’re usually pretty harmless, and the large amount of accurate and insightful material makes up for them.

Just as I often compare Java to C# in my book, so Trey often compares C++ to C# in his. While my balance of C# to C++ knowledge is such that these comments aren’t particularly useful to me, I can see them being good for a newcomer to C# from a C++ background. I thought there might have been a few too many comparisons (I understood the point about STL and lambdas/LINQ the first time round…) but that’s just a minor niggle.

Where C# in Depth is primarily a “read from start to finish” book and C# 3.0 in a Nutshell is primarily a reference book (both can be used the other way, of course) Accelerated C# 2008 falls between the two. It actually achieves the best of both worlds to a large extent, which is an impressive feat. The ordering could be improved (more on this later on) but the general feeling is very good.

One quick word about the size of the book in terms of content: if you’re one of those people who judges the amount of useful content in a book by its page count, it’s worth noting that the font in this book is pretty small. I would guess that it packs about 25% more text per page than C# in Depth does, taking its “effective” page count from around 500 to 625. Also, the content is certainly meaty – you’re unlikely to find yourself skimming over loads of simple stuff trying to get to the good bits. Speaking of “getting to the good bits”, let’s tackle my first significant gripe.

Material organisation

If you look at the tables of contents for Accelerated C# 2008 and Accelerated C# 2005, you’ll notice that the exact same chapter titles in the 2005 edition carry over in the same order in the 2008 edition. There are three extra chapters in the new edition, covering extension methods, lambda expressions and LINQ. That’s not to say that the content of the “duplicate” chapters is the same as before – C# 3.0 features are introduced in the appropriate place within existing chapters. In terms of ordering the chapters, I think it would have been much more appropriate to keep the last chapter of the old edition – “In Search of C# Canonical Forms” – as the last chapter of the new edition. Apart from anything else, that would allow it to include hints and tips involving the new C# 3 features which are currently covered later. It really feels like a “wrapping up” chapter, and deserves to be last.

That’s not the only time that the ordering felt strange, however. Advanced topics (at least ones which feel advanced to me) are mixed in with fairly basic ones. For instance, in the chapter on exceptions, there’s a section about “exception neutrality” which includes details about constrained execution regions and critical finalizers. All interesting stuff – though I wish there were a more prominent warning saying, “This is costly to both performance and readability: only go to these lengths when you really, really need to.” However, this comes before a section about using try/finally blocks and the using statement to make sure that resources are cleaned up however a block is exited. I can’t imagine anyone who knows enough C# to take in the exception neutrality material also not knowing about try/finally or the using statement (or how to create your own custom exception class, which comes between these two topics).

Likewise the chapter which deals with collections, including generic ones, comes before the chapter on generics. If I were a reader who didn’t know generics already, I think I’d get very confused reading about ICollection<T> without knowing what the T meant. Now don’t get me wrong: ordering material so that you don’t get “circular references” is often hard if not impossible. I just think it could have been done better here.

Aiming too deep?

It’s not like me to criticise a book for being too deep, but I’m going to make an exception here. Every so often, I came away from a topic thinking that it would have been better covered a little bit more lightly. Sometimes this was because a running example became laborious and moved a long way from anything you were actually likely to want to do in real life. The sections on “borrowing from functional programming” and memoization/currying/anonymous recursion felt guilty of this. It’s not that they’re not interesting topics, but the examples picked didn’t quite work for me.

The other problem with going deep is that you really, really need to get things right – because your readers are less likely to spot the mistakes. I’ll give three examples here:

  • Trey works hard on a number of occasions to avoid boxing, and points it out each time. Without any experience in performance tuning, you’d be forgiven for thinking that boxing is the primary cause of poor performance in .NET applications based on this book. While I agree that it’s something to be avoided where it’s possible to do so without bending the design out of shape, it doesn’t deserve to be laboured as much as it is here. In particular, Trey gives an example of a complex number struct and how he’s written appropriate overloads etc to avoid boxing. Unfortunately, to calculate the magnitude of the complex number (used to implement IComparable in a manner which violates the contract, but that’s another matter) he uses Math.Pow(real, 2) + Math.Pow(img, 2). Using a quick and dirty benchmark, I found that using real * real + img * img instead of Math.Pow made far, far more difference than whether or not the struct was boxed. (I happen to think it’s more readable code too, but never mind.) There was nothing wrong with avoiding the boxing, but in chasing the small performance gains, the big ones were missed.

  • In the chapter on threading, there are some demonstrations of lock-free programming (before describing locking, somewhat oddly – and without describing the volatile modifier). Now, personally I’d want to discourage people from attempting lock-free programming at all unless they’ve got a really good reason (with evidence!) to support that decision – but if you’re going to do it at all, you need to be hugely careful. One of the examples basically has a load of threads starting and stopping, updating a counter (correctly) using Interlocked.Increment/Decrement. Another thread monitors the count and periodically reports it – but unfortunately it uses this statement to do it:

    threadCount = Interlocked.Exchange(ref numberThreads, numberThreads);

    The explanation states: “Since the Interlocked class doesn’t provide a method to simply read an Int32 value in an atomic operation, all I’m doing is swapping the numberThreads variable’s value with its own value, and, as a side effect, the Interlocked.Exchange method returns to me the value that was in the slot.” Well, not quite. It’s actually swapping the numberThreads variable’s value with a value evaluated at some point before the method call. If you rewrite the code like this, it becomes more obviously wrong:

    int tmp = numberThreads;
    Thread.Sleep(1000); // What could possibly happen during this time, I wonder?
    threadCount = Interlocked.Exchange(ref numberThreads, tmp);

    The call to Thread.Sleep is there to make it clear that numberThreads can very easily change between the initial read and the call to Interlocked.Exchange. The correct fix to the code is to use something like this:

    threadCount = Interlocked.CompareExchange(ref numberThreads, 0, 0);

    That sets numberThreads atomically to the value 0 if (and only if) its value is already 0 – in other words, it will never actually change the value, just report it. Now, I’ve laboured the explanation of why the code is wrong because it’s fairly subtle. Obvious errors in books are relatively harmless – subtle ones are much more worrying.

  • As a final example for this section, let’s look at iterator blocks. Did you know that any parameters passed to methods implemented using iterator blocks become public fields in the generated class? I certainly didn’t. Trey pointed out that this meant they could easily be changed with reflection, and that could be dangerous. (After looking with reflector, it appears that local variables within the iterator block are also turned into public fields.) Now, leaving aside the fact that this is hugely unlikely to actually bite anyone (I’d be frankly amazed to see it as a problem in the wild) the suggested fix is very odd.

    The example Trey gives is where originally a Boolean parameter is passed into the method, and used in two places. Oh no! The value of the field can be changed between those two uses, which could lead to problems! True. The supposed fix is to wrap the Boolean value in an immutable struct ImmutableBool, and pass that in instead. Now, why would that be any better? Certainly you can’t change the value within the struct – but you can easily change the field‘s value to be a completely different instance of ImmutableBool. Indeed, the breakage would involve exactly the same code, just changing the type of the value. The other train of thought which suggests that this approach would fail is that bool is already immutable, so it can’t be the mutability of the type of the field that causes problems. I’m sure there are much more useful things that Trey could have said in the two and a half pages he spent describing a broken fix to an unimportant problem.
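
Coming back to the Math.Pow point in the first bullet: a quick and dirty benchmark along these lines shows the effect. This is a sketch rather than the exact code I used, and the numbers will vary by machine and runtime, but the gap between the two approaches is hard to miss:

```csharp
using System;
using System.Diagnostics;

class PowBenchmark
{
    const int Iterations = 10000000;

    static void Main()
    {
        double real = 3.5, img = 4.5;
        double sum = 0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            sum += Math.Pow(real, 2) + Math.Pow(img, 2);
        }
        Console.WriteLine("Math.Pow:       {0}ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            sum += real * real + img * img;
        }
        Console.WriteLine("Multiplication: {0}ms", sw.ElapsedMilliseconds);

        // Use sum so the JIT can't optimise the loops away entirely.
        Console.WriteLine(sum);
    }
}
```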

Sorry, that was getting ranty for a bit… but I hope you understand why. Before concluding this review, let’s look at one chapter which is somewhat different to the rest, and which I’ve mentioned before:

In Search of C# Canonical Forms (aka “Design and Implementation Guidelines” :)

I’d been looking forward to this part of the book. I’m always interested in seeing what other people think the most important aspects of class design are. The book doesn’t go into much detail about object orientation in the abstract (in this chapter, anyway – there’s plenty scattered through the book) but concentrates on core interfaces you might implement, etc. That’s fine. I’m still waiting for a C# book that’s truly on a par with Effective Java (I have the second edition waiting to be read at work…) but I wasn’t expecting it all to be here. So, was this chapter worth the wait?

Somewhat. I was very glad to see that the first point around reference types was “Default to sealed classes” – I couldn’t agree more, and the arguments were well articulated. Many other guidelines were either entirely reasonable or at least I could go either way on. There were a few where I either disagreed or at least would have put things differently:

  • Implementing cloning with copy constructors: one point about cloning which wasn’t mentioned is that (to quote MSDN) “The resulting clone must be of the same type as or a compatible type to the original instance.” The suggested implementation of Clone in the book is to use copy constructors. This means that every subclass must override Clone to call its own copy constructor, otherwise the instance returned will be of the wrong type. MemberwiseClone always creates an instance of the same type. Yes, it means the constructor isn’t called – but frankly the example given (performing a database lookup in the constructor) is a pretty dodgy cloning scenario in the first place, in my view. If I create a clone and it doesn’t contain the same data as the original, there’s something wrong. Having said that, the caveats Trey gives around MemberwiseClone are all valid in and of themselves – we just disagree about their importance. The advice to not actually implement ICloneable in the first place is also present (and well explained).
  • Implementing IDisposable: Okay, so this is a tough topic, but I was slightly disappointed to see the recommendation that “it’s wise for any objects that implement the IDisposable interface to also implement a finalizer […]” Now admittedly on the same page there’s the statement that “In reality, it’s rare that you’ll ever need to write a finalizer” but the contradiction isn’t adequately resolved. A lot of people have trouble understanding this topic, so it would have been nice to see really crisp advice here. My 20 second version of it is: “Only implement a finalizer if you’re holding on to resources which won’t be cleaned up by their own finalizers.” That actually cuts out almost everything, unless you’ve got an IntPtr to a native handle (in which case, use SafeHandle instead).
    • As a side note, Trey repeatedly claims that “finalizers aren’t destructors” which irks me somewhat as the C# spec (the MS version, anyway) uses the word “destructor” exclusively – a destructor is the way you implement a .NET finalizer in C#. It would be fine to say “destructors in C# aren’t deterministic, unlike destructors in C++” but I think it’s worth acknowledging that the word has a valid meaning in the context of C#. Anyway…
  • Implementing equality comparisons: while this was largely okay, I was disappointed to see that there wasn’t much discussion of inheritance and how it breaks equality comparisons in a hard-to-fix way. There’s some mention of inheritance, but it doesn’t tackle the issue I think is thorniest: If I’m asking one square whether it’s equal to another square, is it enough to just check for everything I know about squares (e.g. size and position)? What about if one of the squares is actually a coloured square – it has more information than a “basic” square. It’s very easy to end up with implementations which break reflexivity, simply because the question isn’t well-defined. You effectively need to be asking “are these two objects equal in <this> particular aspect” – but you don’t get to specify the aspect. This is an example where I remember Effective Java (first edition) giving a really thorough explanation of the pitfalls and potential implementations. The coverage in Accelerated C# 2008 is far from bad – it just doesn’t meet the gold standard. Arguably it’s unfair to ask another book to compete at that level, when it’s trying to do so much else as well.
  • Ordering: I mentioned earlier on that the complex number class used for a boxing example failed to implement comparisons appropriately. Unfortunately it’s used as the example specifically for “how to implement IComparable and IComparable<T>” as well. To avoid going into too much detail, if you have two instances x and y such that x != y but x.Magnitude == y.Magnitude, you’ll find x.CompareTo(y) == y.CompareTo(x) (but with a non-zero result in both cases). What’s needed here is a completely different example – one with a more obvious ordering.
  • Value types and immutability: Okay, so the last bullet on the value types checklist is “Should this struct be immutable? […] Values are excellent candidates to be immutable types” but this comes after “Need boxed instances of the value? Implement an interface to do so […]” No! Just say no to mutable value types to start with! Mutable value types are bad, bad, bad, and should be avoided like the plague. There are a very few situations where it may be appropriate, but to my mind any advice checklist for implementing structs should make two basic points:
    • Are you sure you really wanted a struct in the first place? (They’re rarely the right choice.)
    • Please make it immutable! Pretty please with a cherry on top? Every time a struct is mutated, a cute kitten dies. Do you really want to be responsible for that?
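
To illustrate the cloning point from the first bullet above concretely, here’s a contrived sketch (the Animal/Dog types are hypothetical, not from the book): a copy-constructor-based Clone returns the wrong type as soon as a subclass forgets to override it, whereas MemberwiseClone always preserves the runtime type:

```csharp
using System;

public class Animal
{
    public string Name { get; set; }

    public Animal() {}

    // Copy constructor - the basis of the book's suggested Clone.
    public Animal(Animal other)
    {
        Name = other.Name;
    }

    // Every subclass must remember to override this...
    public virtual Animal CloneViaCopyConstructor()
    {
        return new Animal(this);
    }

    // ...whereas this preserves the runtime type with no extra work.
    public Animal CloneViaMemberwiseClone()
    {
        return (Animal) MemberwiseClone();
    }
}

public class Dog : Animal
{
    // Oops - no CloneViaCopyConstructor override.
}

class CloningDemo
{
    static void Main()
    {
        Animal dog = new Dog { Name = "Rex" };
        Console.WriteLine(dog.CloneViaCopyConstructor().GetType());  // Animal - wrong!
        Console.WriteLine(dog.CloneViaMemberwiseClone().GetType());  // Dog
    }
}
```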

Conclusion

At the risk – nay, certainty – of repeating myself, I’m going to say that I like the book despite the (sometimes subjective) flaws pointed out above. As Shakespeare wrote in Julius Caesar, “The evil that men do lives after them; the good is oft interred with their bones.” So it is with book reviews – it’s a lot easier to give specific examples of problems than it is to report successes – but the book does succeed, for the most part. Perhaps the root of almost all my reservations is that it tries to do too much – I’m not sure whether it’s possible to go into that much detail and cater for those with little or no previous C# experience (even with Java/C++) and keep to a relatively slim volume. It was a very lofty goal, and Trey has done very well to accomplish what he has. I would be interested to read a book by him (and hey, potentially even collaborate on it) which is solely about designing good classes and libraries.

In short, I recommend Accelerated C# 2008, with a few reservations. Hopefully you can judge for yourself whether my reservations would bother you or not. I think overall I slightly prefer C# 3.0 in a Nutshell, but the two books are fairly different.

Reaction

I sent this to Trey before publishing it, as is my custom. He responded to all my points extremely graciously. I’m not sure yet whether I can post the responses themselves – stay tuned for the possibility, at least. My one problem with reviewing books is that I end up in contact with so many other authors who I’d like to work with some day, and that number has just increased again…

Calling Trey Nash…

Okay, this is a slightly odd post, but with any luck it’ll prove successful. As I’ve mentioned before, I’m currently reading through Trey Nash’s “Accelerated C# 2008”. I’m writing down errata as I go, but so far the only place I’ve found to leave them is on the Apress web site – which may well be a black hole as far as I know.

So, if anyone knows how to get in touch with Mr Nash, could they either let me know or give him my email address (skeet@pobox.com) and ask him to mail me? The normal means of finding him (blogs, forums, personal web sites etc) are failing me at the moment… I’ve left a comment on his blog, but as it’s rarely updated I don’t know whether he’s actively watching it.

Update: Trey has now mailed me, so all is well.

Judging a book by its cover (or title)

I’ve ranted about versioning before (and indeed in C# in Depth). I still believe that Microsoft didn’t do the world any favours when they introduced a relatively minor set of changes (just libraries, albeit important ones) with .NET 3.0 and a more major set of changes (languages, LINQ, core library improvements) with .NET 3.5. Using 2.5 and 3.0 would have made more sense, IMO. But never mind.

The fact is, people are confused about what version number applies to what. A number of people claim to be using C# 3.5 when they mean either C# 3.0 or .NET 3.5. (For a quick reference of what’s actually what, see my article on the issue.)

Okay, so far it’s the fault of Microsoft for being confusing and the fault of developers for not keeping up. Both of these are far more forgivable in my view than being flat-out wrong, as many books are at the moment. I don’t believe this is any indication of the quality of the book itself (Accelerated C# 2008 is pretty good so far, for example) but I still think it’s pretty awful to make a title so confusing. So, here are some bad titles and others which use version numbers appropriately. (I’ve left out titles like Head First C# and C# in Depth which don’t specify version numbers.)

Bad

  • Professional C# 2008 (Wrox)
  • Pro C# 2008 and the .NET 3.5 Platform (Apress) (only partly incorrect, of course)
  • Murach’s C# 2008 (Mike Murach and Associates)
  • Accelerated C# 2008 (Apress)
  • Illustrated C# 2008 (Apress)
  • Pro LINQ: Language Integrated Query in C# 2008 (Apress)
  • Pro ASP.NET 3.5 in C# 2008 (Apress) (again, only partially incorrect)
  • Beginning C# 2008 (Apress)
  • Beginning C# 2008 Databases (Apress)

Good

  • C# 3.0 in a Nutshell (O’Reilly)
  • Programming C# 3.0 (O’Reilly)
  • C# 3.0 Cookbook (O’Reilly)
  • C# 3.0 Design Patterns (O’Reilly)
  • C# 3.0 Pocket Reference (O’Reilly)
  • Pro C# with .NET 3.0 (Apress)
  • Beginning C# 3.0 (Wrox)
  • C# 3.0 Unleashed: With the .NET Framework 3.5 (Sams)

Okay

This is the “well, just about” list – because these titles refer to “Microsoft Visual C# 2008” rather than “C# 2008”, they’re naming the IDE instead of the language. I think it’s better to name a book after the language rather than the tool you use to write in it, personally…

  • Beginning Microsoft Visual C# 2008 (Wrox)
  • Microsoft Visual C# 2008 Step By Step (Microsoft)
  • Programming Microsoft Visual C# 2008 (Microsoft)
  • Microsoft Visual C# 2008 Express Edition: Build a Program Now! (Microsoft)

Notice a pattern? If anyone at Apress is reading this (unlikely, I know) – there’s no such thing as “C# 2008”.

Rant over for the moment. With any luck I might be able to finish reading Accelerated C# 2008 fairly soon, and give a proper book review.

The trouble with book reviews

I’m currently reading two .NET books: Accelerated C# 2008 (Trey Nash) and Concurrent Programming on Windows (Joe Duffy). I will, in due course, post reviews here. However, the very act of thinking about the reviews has made me consider the inevitable inadequacies.

There tend to be two kinds of people reviewing technical books: those who’ve bought the book as "regular punters", aiming to learn something new, and those who already know about the subject matter but are reading the book mostly to review it. I realise there are people in-between (for whom the problems below aren’t such an issue) but these are the two camps this post addresses.

The purpose of a technical book is usually to impart information and wisdom. I would have left it at just information, but things like best practices don’t really count as "facts" as such – they are opinions and should be treated as such. So, there are two qualities that I look for in a book:

  • Is it accurate? Are the facts correct, and is the wisdom genuinely wise?
  • Is it a good teaching tool? How well is the information/wisdom transferred from author to reader?

I think it’s worth breaking these up, although there is significant overlap.

Accuracy

As I’ve noted before, I’m a stickler for accuracy. If I can spot a significant number of inaccuracies in the text, I find it hard to trust the rest. Now, I generally don’t include grammatical errors, printing mistakes etc in this. While they make the book harder to read, they don’t typically leave me with a mistaken impression about the technology I’m trying to learn. There is a blurring of the medium and the message, though: a book may be technically just about accurate, yet still leave an incorrect impression.

Now, the reader who has bought a book primarily to learn something new has little hope of judging accuracy. They can spot typos, and if the book is inconsistent or simply implausible that can become obvious – but subtle errors are likely to elude them. Just because an error is subtle doesn’t mean it’s unimportant, however. I know a reasonable amount about threading in .NET, but there’s a lot of Joe Duffy’s book which is new to me. He could have made dozens of mistakes in the text around Win32 threading, and I’d never know until it bit me. (For what it’s worth, I very much doubt that there are dozens of mistakes in the book.)

A reader who already knows the subject matter thoroughly is much more likely to spot the mistakes. However, they’re unlikely to be much good at judging the other major criterion…

Teaching efficacy

I can’t really remember much about how I learned to program, other than that it was over the course of several years. I started off with Basic on the ZX Spectrum, then moved on to C, then Java, then C#. Each experience built on the previous one. The way in which I learned C# wouldn’t have suited a non-Java programmer nearly as well as it suited me.

How can I possibly judge how well a book will teach a subject I already know? I can make some educated guesses about particularly confusing passages, and potentially critique the ordering of material (and indeed its inclusion/exclusion) but fundamentally it’s impossible to gauge it properly.

The people who don’t know the topic beforehand are likely to have a better idea, but it will still be flawed. In particular, you won’t know how well the material has sunk in until you’ve given yourself enough time to forget it. You won’t know how suitable the advice (wisdom) was until you’ve tried to follow it. You won’t know how complete the coverage is until you’ve used the technology in anger, preferably over the course of several projects. Even then it’s easy to miss stuff: if no-one on your team knew about iterator blocks and the C# book you were reading didn’t mention them, how would you know what you were missing?
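To give a flavour of the kind of feature you could miss entirely: an iterator block lets the compiler build the whole enumerator state machine for you from a method containing yield return. This sketch is mine, not an example from any of the books mentioned:

```csharp
using System;
using System.Collections.Generic;

public class IteratorDemo
{
    // An iterator block: "yield return" makes the compiler generate the
    // IEnumerator<int> state machine you'd otherwise write by hand.
    public static IEnumerable<int> CountTo(int max)
    {
        for (int i = 1; i <= max; i++)
        {
            // Execution pauses here between calls to MoveNext().
            yield return i;
        }
    }

    static void Main()
    {
        foreach (int value in CountTo(3))
        {
            Console.WriteLine(value); // prints 1, 2, 3 on separate lines
        }
    }
}
```

If your C# book never mentioned yield, you could happily hand-write enumerator classes for years without knowing the shortcut existed.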

Who should you trust?

This post has had a pretty depressing mood so far. That reflects my irritation with the whole topic – which isn’t to say I don’t enjoy reviewing books. I just have doubts as to how useful reviews really are. I do, however, have a few positive notes to end on, as well as some fairly self-evident advice:

  • If everyone likes a book, it’s probably good. Likewise unanimous disapproval is rarely a good sign.
  • When judging reviews, see if you can work out the context. Is the reviewer reading from a perspective of knowledge, or learning? If they’re criticising accuracy, they probably know what they’re talking about – but may not be a good judge of the style and teaching technique. If the review is mostly saying, "I learned C# from scratch in 20 minutes with the help of this fabulous book!" then you can guess that they at least believe they had a positive learning experience, but you should treat anything they say about accuracy and completeness with care.
  • Blogs tend to have more "expert" reviewers than ecommerce sites – although often bloggers will be encouraged to post reviews to Amazon as well.
  • Look for reviews which give specific praise/criticism. In particular if they give examples of teaching techniques, you will have more of an idea as to whether it’ll suit you. Reviews which basically say "I loved it!" or "That’s rubbish!" aren’t terribly informative.

On that note, I should probably stop. That’s another train journey gone that I should probably have spent actually reading… ah well. Please comment if you have other suggestions with regards to reviewing – particularly if it could help me to review books in a more useful way in the future.

Manning book draw

I was recently contacted about a cool way of possibly getting a free .NET ebook (legally!) or if you’re really lucky, the complete Manning .NET library. Basically click on the advert below to enter: one person will be drawn each day until July 17th, and the final prize will be the complete .NET library from Manning. Have at it. If it happens to draw more attention to C# in Depth at the same time, that’s fine by me…

Guest post: Joe Albahari reviews C# in Depth

Joe Albahari, co-author of the excellent C# 3.0 in a Nutshell (previously reviewed here), kindly agreed to review C# in Depth. Not only has he provided the review below, but he also supplied several pages of notes made while he was reading it. Many of those notes have been incorporated into the C# in Depth notes page – it’s always good to include thoughtful feedback. (And I always welcome more, hint hint.)

Without further ado, here’s Joe’s review.

C# in Depth: Review

After having been invited to review this book by two people at Manning—as well as Jon himself—I figure it’s about time I came forward! Bear in mind that I’m not a typical reader: I’m an author, and this makes me more critical than most. This is especially true given that I wrote C# 3.0 in a Nutshell with a coauthor (imagine two people constantly searching for ways to improve each other’s work!). So I will do my best to compensate and strive to be fair. Please post a comment if you feel I’ve missed the mark!

Scope

While most other C# books cover the language, the CLR and at least some aspects of the Framework, C# in Depth concentrates largely on just the language. You won’t find discussions of memory management, assemblies, streams and I/O, security, threading, or any of the big APIs like WPF or ASP.NET. This is good in that it doesn’t duplicate the books already out there, as well as giving more space to the language.

You might expect that a book focusing on the C# language itself would cover all of it. But interestingly, the book covers only about a quarter of the C# language, namely the features new to C# 2 and C# 3. This sets its target audience: programmers who already know C# 1, but have not yet switched to C# 2 and 3. This creates a tight focus, allowing it to devote serious space to topics such as generics, nullable types, iterators and lambda expressions. It’s no exaggeration to say that this book covers less than one tenth of the ground of most other C# books, but gives that ground ten times as much attention.

Organization and Style

The book is divided into three parts:

  • Preliminaries (delegates and the type system)
  • Features new to C# 2.0
  • Features new to C# 3.0

I like this organization: it presents topics in an order roughly similar to how I teach when giving tutorials on LINQ—starting with the foundations of delegates and generics, before moving on to iterators and higher-order functions, and then finally LINQ. Sometimes the routes are a little circuitous and involve some huffing and puffing, but the journey is worthwhile and helps to solidify concepts.

C# in Depth is a tutorial that gradually builds one concept upon another and is designed primarily for sequential reading. The examples don’t drag on over multiple sections, however, so you can jump in at any point (assuming you understand the preceding topics). The examples are all fairly short, too, which is very much to my taste. In fact, I would say Jon and I think very much alike: when he expresses an opinion, I nearly always agree wholeheartedly.

A big trap in writing tutorials is assuming knowledge of topics that you teach later. This book rarely falls victim to this. The writer is also consistent in his use of terminology—and sticks with the C# Language Specification, which I think sets a good example to all authors. Jon is not sloppy with concepts and is careful in his wording to avoid misinterpretation. One thing that comes through is that Jon himself really understands the material deeply.

If I were to classify this book as beginner/intermediate/advanced, I’d say intermediate-to-advanced. It’s quite a bit more advanced than, say, Jesse’s tutorial “Programming C#”.

The layout of the book is pleasing—I particularly like the annotations alongside the code listings.

Content

In the first section, “Preparing for the Journey,” the book does cover a few C# 1 topics, namely delegates and C#’s type system. Jon’s handling of these topics is excellent: his discussion of static, explicit and safe typing is clear and helpful, as is the section on value types versus reference types. I particularly liked the section “Dispelling Myths”—this is likely to be of use even to experienced developers. This chapter, in fact, leaves the reader pining for more advanced C# 1 material.

The C# 2 features are very well covered. The section on generics includes such topics as their handling by the JIT compiler, the subtleties of type inference, a thorough discussion of constraints, covariance/contravariance limitations, and comparisons with Java’s generics and C++’s templates. Nullable types are covered similarly well, with suggested patterns of use, as are anonymous methods and iterators.
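As an illustration of the sort of pattern in question (this example is mine, not taken from the book): a nullable type lets a parsing method report failure as null rather than as a magic sentinel value that might collide with real data:

```csharp
using System;

public class NullableDemo
{
    // int? (Nullable<int>) lets failure be represented as null rather
    // than a sentinel such as -1, which might be a legitimate value.
    public static int? ParseOrNull(string text)
    {
        int result;
        return int.TryParse(text, out result) ? result : (int?)null;
    }

    static void Main()
    {
        int? parsed = ParseOrNull("oops");
        // The null-coalescing operator supplies a fallback concisely.
        Console.WriteLine(parsed ?? 0); // prints 0
    }
}
```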

The C# 3 features are also handled well. I like how Jon introduces expression trees—first building them programmatically, and then showing how the compiler provides a shortcut via lambda expressions. The book covers query expressions and the basics of LINQ, and includes a brief explanation of each of the standard query operators in an appendix. There’s also a chapter called “LINQ Beyond Collections” which briefly introduces the LINQ to SQL, LINQ to DataSet and LINQ to XML APIs.
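To illustrate that progression (this is my own sketch, not an example from the book): the same expression tree can be built by hand from the Expression factory methods, or produced by the compiler directly from a lambda expression:

```csharp
using System;
using System.Linq.Expressions;

public class ExpressionDemo
{
    // Building the tree for x => x + 1 "by hand" from factory methods.
    public static Expression<Func<int, int>> BuiltManually()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        return Expression.Lambda<Func<int, int>>(
            Expression.Add(x, Expression.Constant(1)), x);
    }

    // The compiler produces an equivalent tree from a lambda expression.
    public static Expression<Func<int, int>> FromLambda()
    {
        return x => x + 1;
    }

    static void Main()
    {
        // Either tree can be compiled into a delegate and invoked.
        Console.WriteLine(BuiltManually().Compile()(41)); // 42
        Console.WriteLine(FromLambda().Compile()(41));    // 42
    }
}
```

Seeing the manual version first makes it clear how much plumbing the lambda syntax is hiding.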

Throughout the book, Jon goes to some lengths to explain not just the “what”, but the “why”. This book isn’t for people who want to get in and out quickly so they can get their job done and out of the way—it’s for people who enjoy working elegantly with their tools, through a rich understanding of the language’s background, subtleties and nuances.

Of course, digesting all this is a bit of work (Chapter 3’s summary opens with the word “Phew!”). Despite this, I think Jon does a good job of explaining difficult things well. I don’t think I’ve seen any IL listings in the book, which is a good sign in general. I’m always wary when an author, in explaining a C# concept, says, “to understand XYZ, we must examine the IL”. I take issue with this: rarely, if ever, does one need to look at IL to understand C#, and doing so creates unnecessary complication by choosing the wrong level of abstraction. That isn’t to say looking at IL isn’t useful for a deeper understanding of the CLR—but only after first teaching C# concepts independently of IL.

What’s Missing

It was in Jon’s design criteria not to build a tome—instead to write a small(ish) book that complements rather than replaces books such as C# 3.0 in a Nutshell. Most things missing from C# in Depth are consistent with its focus (such as the CLR, threading, the .NET Framework, etc.). The fact that C# in Depth excludes the features of C# that were introduced prior to version 2 is a good thing if you’re looking for a “delta” book, although, of course, it makes it less useful as a language reference.

The book’s treatment of LINQ centres largely on LINQ to Objects. If you’re planning on learning C# 3.0 so that you can query databases through LINQ, the book’s focus is not ideal, if read in isolation. I personally prefer the approach of covering “remote” query architecture earlier and in more detail (in conjunction with the canonical API, LINQ to SQL) – so that when it comes time to teach query operators such as SelectMany, Group and Join, they can be demonstrated in the context of both local and database queries. I also strive, when writing on LINQ, to cover enough querying ground that readers can “reproduce” their SQL queries in LINQ—even though it means having to get sidetracked with API practicalities. Of course, getting sidetracked with API practicalities is undesirable for a language-centric book such as C# in Depth, and so the LINQ to Objects focus is understandable. In any case, reading Chapters 8–10 of C# 3.0 in a Nutshell would certainly fill in the gaps. Another complementary book would be Manning’s LINQ in Action (this book is well-reviewed on Amazon, though I’ve not yet read it).

Summary

This book is well written, accurate and insightful, and complements nearly every other book out there. I would recommend it to anyone wanting a thorough “inside” tutorial on the features new to C# 2 and 3.