The CLI memory model, and specific specifications

A while ago, I was directed to a disturbing (in my view) post on GrantRi’s blog, to do with the .NET memory model. I’m slightly less disturbed having read his clarification, but there’s a fairly deep problem here. Here’s part of a sample class:

```csharp
string name;

public void WriteNameLength()
{
    string localName = name;
    if (localName != null)
    {
        Console.WriteLine(localName.Length);
    }
}
```

Now, other threads may be changing the value of name all over the place, and there's a separate question of whether WriteNameLength sees the “latest” value. My question, though, is this: can the above throw a NullReferenceException?

It looks like it can’t, because even if name becomes null during the method, surely the value of localName can’t change – it’s either null or it’s not, and we don’t try to dereference it if it’s null.

Unfortunately, Grant’s blog post suggests that a JIT would be free to treat the above as:

```csharp
public void WriteNameLength()
{
    if (name != null)
    {
        Console.WriteLine(name.Length);
    }
}
```

This version clearly can throw an exception: if another thread sets name to null after the “if” but before the dereference (and the thread running WriteNameLength sees that change), name.Length will throw a NullReferenceException.

This surprises me, just as it surprised lots of people reading Grant’s blog. It surprised me so much that I checked the CLI specification, and couldn’t work out whether the transformation was permitted or not. That is even more worrying, so I mailed Grant, and his (very speedy) reply was along the lines of “I’m not an expert, but it looks to me like the spec is too vague to say for sure whether this is legitimate or not.” (I apologise if I’ve misrepresented the reply; in some ways it doesn’t matter, though.)

When trying to write performant, reliable systems, it is surely crucial to have a memory model specification which can be reasoned with. The Java memory model was reasonably well defined before 1.5, and then (after years of detailed discussion) it was updated in a way which I believe was designed to preserve backward compatibility while laying out very clear rules. Surely the CLI deserves a specification with a similar level of detail: one which both JIT developers and application developers can use to make sure that there are no surprises amongst informed individuals. (There will always be people who write multi-threaded programs while remaining blissfully unaware of the importance of a memory model. It’s very hard to cater for them without crippling JIT optimisation by effectively synchronising all the time. I’m not too worried about that.)

Usually, when I’m writing multi-threaded code, I err on the side of caution: I tend to use locks when I could get away with volatile variables, for instance, just because I need to think slightly less hard to make sure everything’s correct. For some people that simply isn’t good enough. Their performance requirements make every context switch, every locking operation and every optimisation restriction costly enough that they really need to know the details of the memory model. There should be an effort on the part of MS and/or the ECMA committee to clearly and specifically define what the CLI memory model does and doesn’t guarantee. I doubt that anyone reading this blog is in a position to instigate such an effort, but if you are, please give it careful consideration.
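For code that can't wait for a clearer specification, the cautious lock-based approach mentioned above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a general recommendation: the class NameHolder, its padlock field and the SetName and GetNameLength members are hypothetical names, and the sketch assumes every thread that writes name goes through SetName and therefore takes the same lock. Because the null check and the dereference both happen while that lock is held, no writer can change name between them, so even if the JIT re-read the field it would see the same value.

```csharp
using System;

// Hypothetical class illustrating the lock-based approach; the names are
// assumptions for illustration, not taken from the original post.
public class NameHolder
{
    // Assumption: every read and write of 'name' takes this same lock.
    private readonly object padlock = new object();
    private string name;

    public void SetName(string value)
    {
        lock (padlock)
        {
            name = value;
        }
    }

    // Returns the length of the current name, or null if there is no name.
    // The null check and the dereference both happen while the lock is held,
    // so no writer can change 'name' between them.
    public int? GetNameLength()
    {
        lock (padlock)
        {
            return name == null ? (int?)null : name.Length;
        }
    }

    // Equivalent of the original WriteNameLength, built on the locked read.
    public void WriteNameLength()
    {
        int? length = GetNameLength();
        if (length.HasValue)
        {
            Console.WriteLine(length.Value);
        }
    }
}
```

The price is a lock acquisition on every read, which is exactly the kind of overhead that people with tighter performance requirements want to avoid; for them, only precise memory model guarantees will do.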

Unit tests rock, I suck, news at 11

I’ve just started looking at my Miscellaneous Utility Library again, after quite a while. I’m currently running Vista on my laptop, which means I can’t run Visual Studio 2003 – so it’s about time I updated the library to use generics and all that goodness. I’ll keep the .NET 1.1 version available on the web site, but from now on any new code will be 2.0 only.

In the process of updating RandomAccessQueue to implement the generic collection interfaces, I decided to do the implementation test-first, as is now my habit. It clearly wasn’t a habit back when I originally wrote the code (incidentally, that was the same day Peramon laid everyone off; I remember because I was at home, ill). The new methods use some of the old ones, and unfortunately testing them has exposed some long-standing bugs.

Looking back, I find it hard to understand why I had so much faith in this code: it’s exactly the kind of code which is bound to suffer from off-by-one errors and the like. Fortunately, it’s not terribly hard to test (unlike the threadpool stuff, for example). Oh, how I wish I’d been using NUnit back then.

This happened the last time I looked at a MiscUtil class, too. It will take a while to add unit tests giving a decent level of coverage to the code – it’s not like I have a lot of spare time – but it’s clearly got to be done.

I wonder how much other code I’ve written over the years is riddled with bugs. To be fair, the MiscUtil stuff was run considerably less than most of the code I wrote professionally at Peramon… but I bet there were quite a few nasty little gotchas waiting there too. And now?

No, I don’t write perfect code, even with unit tests. Even when the code does what I intend it to do, I have to revisit whether the intention was right in the first place. Even with unit tests, problems can easily slip through the cracks, but I don’t think I produce nearly as much fundamentally broken code as I used to.