The value of a language specification

Last Friday evening was the inaugural Reading Geek Night. More about that another time, but if you’re in or around Reading in the UK, and fancy meeting up with some smart people (and me) to discuss software in various shapes and forms, let me know.

After most people had gone home, a few of us including Stuart Caborn were talking about specs. Stuart remembers how I ended up writing some annotations in the C# Annotated Standard: we were debugging some code, and I noticed an unboxing conversion which was unboxing an enum as an int. It worked, but I was surprised. I consulted the spec, and found that according to the spec it really shouldn’t have worked. (Furthermore, the spec suggested a case which couldn’t possibly be valid. I can’t remember the details now, but I can dig them up if anyone was interested.) I’d had one or two conversations with Jon Jagger (the C# ECMA Task Group convenor at the time) before, so I mailed him. Jon invited me to join in the book project, and I took to it with gusto. I reading most of C# 2 ECMA spec over the course of a few weeks, writing annotations as I went along.

This is not what most people would consider normal behaviour. When I recently gave a talk about C# 3, I was delighted to hear someone else mention that they had checked the spec about some aspect of the language. Finally, I wasn’t alone! However, such people are clearly the exception rather than the rule.

I genuinely don’t think that matters too much. I really don’t expect many developers to read the spec – certainly not thoroughly. I think it’s important to know that there is a spec, and be able to consult it when in doubt. I want to be able to know what every line of code is doing, in terms of which variable it’s going to access, which method it’s going to call, the order of execution of a post-increment as a method argument, etc.

That’s not to say I actually learn all of the rules by rote – even for something as simple as operator precedence, I sometimes put brackets in when they’re not required, for example. I’d rather not rely on me or a maintenance engineer having to remember too many details. But if someone else has written some obscure line of code, I’m pretty confident that I’ll be able to understand it with the help of the spec, and refactor it into something more readable.

Now, Stuart challenged the value of the spec. If his code was misbehaving he wouldn’t consult the spec – he’d consult either books or (more likely) the unit tests. Realistically, the vast majority of C# is being compiled by the Microsoft compiler, so the idea of having a spec available for other implementations isn’t actually important to that many developers in terms of business. (It may be psychologically and politically important, and I’m not trying to knock the great work that the Mono project has done – but I’ve never used Mono professionally, and I suspect that’s the case for most people.) Either the code works or it doesn’t, and if it doesn’t work the tests should say so.

I counter that not having a spec is like not having documentation for a library – if you start relying on unspecified behaviour, you can come unstuck when that behaviour changes in a legitimate way. A good example of this is depending on a particular hash algorithm being used for GetHashCode; the algorithm for string.GetHashCode() changed between .NET 1.1 and 2.0, and I’ve seen a few people get burned, having stored the generated hash values in a database. Suddenly nothing matches any more… because they ignored what the documentation said.

Stuart’s response: if the tests still work, the changes haven’t broken anything. If the tests don’t work, we can go and fix the code so they start to work again. I’ll concede that it’s unlikely that implementation changes in the compiler will actually break any code (and it’s also very unlikely that specification changes will break code – the C# design team are pretty fanatical about not introducing breaking changes).

I can see Stuart’s point of view, but it just feels so very wrong. I suspect a lot of that is down to my personality type – how I really hate working without enough information (or what I consider to be enough information). Today I fixed a bug with an ASP.NET application which was producing incorrect JavaScript. It was working on some machines and not working on others. I thought I’d found out why (a different version of a library in the GAC) but that was ruled out after examining another machine which was working contrary to my hypothesis. I’m reasonably confident that my fix will work, but I really don’t like the fact that I don’t understand the issue in the first place. It’s very hard to piece together the necessary information – which is like working on a language that doesn’t have a spec.

Eventually, I came up with an answer which I think Stuart more or less accepted. I’m in one of the groups the spec is aimed at. I write about C#, hoping to explain it to other people. One of my aims with C# in Depth is to give enough information to make the spec even more irrelevant to most developers when it comes to the changes in C# 2 and 3. Without wishing to denigrate existing C# books too much, I’ve often found that the kind of details which I wanted to investigate further just weren’t covered in the books – to get the answer, I had to go to the spec. I really hope that if I’d had my own book, I’d have been able to consult that for most of those issues. However, I simply couldn’t have written the book without the spec.

I’ve had experience of writing about a language without a spec. When I was helping out with Groovy in Action, I often found myself frustrated by the fact that the Groovy spec is far from finished. This shouldn’t be surprising to anyone – Microsoft have a significant team of really smart people who are paid to immerse themselves thoroughly in C# and make sure the language is all it can be, in terms of design, documentation and implementation. Designing a language well is hard – I haven’t been part of designing any languages, but I can get some idea of the difficulty based on what I’ve seen of the languages I’ve used. The loving care required to make sure that all the behaviour that should be pinned down is indeed described, while leaving rigidly defined areas of doubt and uncertainty where that’s appropriate, must be phenomenal. I don’t doubt that the Groovy team is talented, but coming up with a good spec is probably too much of a resource drain, unfortunately.

I haven’t covered everything I feel about specs in this post, but I’m going to finish now before I officially begin to ramble. Apologies to Stuart if I’ve misrepresented his views – and I should point out that this was late at night after Stuart had a few beers, which may be relevant. In short then (and including points I haven’t gone into):

  • The existence of a specification is important, even if it’s not consulted by every developer. Even if I were never ill, I’d be glad that the National Health Service existed.
  • I’d be very worried if the language/compiler team itself didn’t have a good spec, and if they do, there’s no reason to hide it. As an example of how important this is, just read Martin Fowler writing about JRuby/IronRuby: “Soon-to-be-ThoughtWorker Ola Bini, a JRuby committer, reckons that it’s almost impossible to figure out how to implement a Ruby runtime without looking at source code of the MRI.” That screams to me of “the implementation is the documentation” which I regard as a very unhealthy state of affairs.
  • Specifications are vital for authors (whether of books or web articles) who need to present accurate information based on more than just the current behaviour.
  • Sometimes you can trust a specification more than tests – with a memory model spec, it’s possible to reason about whether or not my code is thread-safe. It could pass all tests but still not handle a bizarre race condition. (Of course, a better memory model spec for .NET would be welcome.)
  • Unit tests are never going to catch every flaw. They can give you a great deal of confidence, but not certainty. (Example: how many people explicitly check that their text handling code will work just as well when provided with non-interned strings, rather than strings which were originally specified as literals? If the string interning behaviour changes in a valid way, are you absolutely sure your code won’t fail?)
  • I’m on the fence about the value of having the ECMA spec as well as the Microsoft one. I can see how it could be important in certain business situations – but as a developer, I don’t care that much. I’ve had very few qualms about changing my standard reference from ECMA C# 2 to MS C# 3. It’s unclear to me (as someone completely outside the process) how much influence ECMA has at this stage on the design of the language itself. Were ECMA committee members explicitly consulted during the C# 3 design process? Clearly making a significant change to the language now would be likely to make all existing compilers “broken” – so what can the ECMA team do beyond reframing the existing rules? As I say, I’m an outside in this matter, so I can’t really judge – but I think it’s a valid question to ask.

Anyway, that’s about a sermon’s-worth of preaching about specifications – time for bed.

One thought on “The value of a language specification”

  1. This may sound wishy-washy, but you’re both right, in the sense that value is in the eye of the beholder and you’re two different beholders.

    A solid spec is necessary for those who care about whether it is right according to some intellectual truth. When you’re going to have more than one implementation, you need it. When you have enough IQ-units looking to leverage the language, you need it. But people will struggle along without it, and MS hasn’t particularly cared at times. The MS ecosystem seems more willing to go with implementation than spec than some other ecosystems. (Remember when Borland added an *optional* keyword to their C++ syntax and got roundly flamed for it? (That was too extreme in my opinion.))

    I was immersed in the MDX language spec from its beta 2 back in 1997 or so, and it was (and now is) worth something but not that much. The spec, as published on MSDN even today, is incomplete and internally inconsistent. Microsoft did not implement the spec as written- the deviations weren’t too gross, but if you own the spec, why deviate? The other vendors that implemented MDX used the spec as a kind of starting point, but with few exceptions implemented what Microsoft did, rather than said, because that’s all anyone would use. Practicality trumped intellect (and, as far as I can tell, still does, though SQL Server 2005 got a lot closer than 7 or 2000).

    In summary, if there’s a good spec, it’s only as valuable as the diligence and faith in using it all around.

    Like

Leave a comment