Language design, when is a language “done”, and why does it matter?

As per previous posts, I’ve been thinking a fair amount about how much it’s reasonable to keep progressing a language. Not only have thoughts about C# 4 provoked this, but also a few other sources:

The video is very well worth watching in its entirety – even though I wouldn’t pretend to understand everything in it. (It’s worth watching rather than just listening to, by the way – Gilad’s body language is very telling.) Here are just a few of the things which particularly caught my attention:

  • Mads, 14:45 on the “Babelification” which could occur if everyone plugs in their own type system into a pluggable language. This is similar to my concern about LISP-like macros in C#, I think.
  • Gilad, 23:40: “We’re the kind of people who love to learn new things. Most people hate to learn new things.” I don’t agree with that – but I’d say that people hate to feel they’re on a treadmill where they spend all their time learning, with no chance to actually use what they’ve learned.
  • Gilad, 28:35: “People never know when to stop […] What happens is they do too much.”
  • Mads, 50:50: “The perfect language is the one that helps you do your task well, and that varies from task to task.”

So, what does this have to do with C#? Well, I was wondering how different people would respond when asked if C# was “done”. Eric’s certainly remarked that it’s nowhere near done – whereas prior to C# 3, I think I’d have called C# 2 “done”. C# 3 has opened my eyes a little about what might be possible – how radical changes can be made while still keeping a coherent language.

I’ve been worrying publicly about the burden of learning which is being placed on developers. I’m starting to change my thoughts now (yes, even since yesterday). I’ve started to wonder where the burden is coming from, and why it matters if C# changes even more radically in the future.

Who would make you learn or use C# 4?

Suppose the C# team went nuts, and decided that C# 4 would include:

  • x86 inline assembly
  • Optional reverse Polish notation, which could be mixed and matched with the existing syntax
  • Checked exceptions
  • Regular expressions as a language feature, but using a new and slightly different regex dialect
  • User-defined operators (so you could define the “treble clef” operator, should you wish to)
  • Making semi-colons optional, but whitespace significant. (Heck, we could remove braces at the same time – optionally.)
  • A scripting mode, where Console.WriteLine("Hello") would count as a complete program

I’m assuming that most readers wouldn’t want to use or even learn such a language. Would you do it anyway though? Bear it in mind.

Now suppose the C# team worked out ways of including significant pieces of obscure but powerful computer science into C# 4 instead. Lots to learn, but with great rewards. It’s backwardly compatible, but idiomatic C# 4 looks totally different to C# 3.

Here’s the odd thing: I’d be more comfortable with the first scenario than the second. Why? Because the first lets me get on with developing software, guilt-free. There’d be no pressure to learn a lunatic version of C# 4, whereas if it’s reasonably compelling I’ll have to find the time. It’s unlikely (in most companies anyway) that I’ll be given the time by my employers – there might be a training course if I’m lucky, but we all know that’s not really how you learn to use a language productively. You learn it by playing and experimenting in conjunction with the more theoretical training or reading. I like to learn new things, but I’m already several technologies behind.

What’s in a name?

Now consider exactly the same scenario, but where instead of “C# 4” the language is named “Gronk#”. In both cases it’s still backwardly compatible with C# 3.

Logically, the name of the language should make no difference whatsoever. But it does. As a C# developer, I feel an obligation (both personal and from my employer) to keep up with C#. If you’re a C# developer who isn’t at least looking at C# 3 at the moment, you’re likely to find yourself behind the field. Compare that with F#. I’m interested to learn F# properly, and I really will get round to it some time – but I feel no commercial pressure to do so. I’m sure that learning a functional language would benefit many developers – as much (or even more) for the gains in perspective when writing C# 3 as for the likelihood of using the functional language directly in a commercial setting. But hey, it’s not C# so there’s no assumption that it’s on my radar. Indeed, I suspect that if I polled my colleagues, many wouldn’t have even heard of F#. They’re good engineers, but they have a home life which doesn’t involve obsessing over computer languages (yeah, I find it hard to believe too), and at work we’re busy building products.

We could potentially have more “freedom” if every language release came with a completely different name. The new compiler would happen to be able to build the old code, but that could seem almost incidental. (It would also potentially give more room for breaking changes, but that’s a very different matter.) There’d be another potential outcome – branching.

Consider the changes I’ve proposed for C# 4. They are mere tweaks. They keep the language headed in the same direction, but with a few minor bumps removed. Let’s call this Jon#.

Now consider a language which (say) Erik Meijer might build as the successor to C# 3. I’m sure there are plenty of features from Haskell which C# doesn’t have yet. Let’s suppose Erik decides to bundle them all into Erik#. (For what it’s worth, I don’t for one moment believe that Erik would actually treat C# insensitively. I have a great respect for him, even if I don’t always understand everything he says.)

Jon# and Erik# can be independent. There’s no need for Erik# to contain the changes of Jon# if they don’t fit in with the bigger picture. Conservative developers can learn Jon# and make their lives a bit easier for little investment. Radical free thinkers can learn Erik# in the hope that it can give them really big rewards in the long run. Everyone’s happy. Innovation and pragmatism both win.

Well, sort of.

We’ve then got two language specs, two compilers, two IDE experiences, etc. That hurts. Branching gives some freedom at the cost of maintenance – as much here as in source control.

Where do we go from here?

This has been a meandering post, which is partly due to the circumstances in which I’ve written it, and partly due to the inconclusive nature of my thoughts on the matter. I guess some of the main points are:

  • Names matter – not just in terms of getting attention, but in the burden of expected learning as well.
  • Contrary to impressions I may have given before, I really don’t want to be a curmudgeonly stifler of language innovation. I just worry about unintended effects which are more to do with day to day human reality than technical achievement.
  • There are always options and associated costs – branching being one option which gives freedom at a high price.

I really don’t have a good conclusion here – but I hope plenty of people will spare me their thoughts on this slightly non-technical matter as readily as they have about specific C# 4 features.

8 thoughts on “Language design, when is a language “done”, and why does it matter?”

  1. This post really made me think about languages in a way that I never did before, thanks Jon.

    I do agree with you, the name is very important. When I think about C#, VB.NET, C++, Java, Perl, etc., each name really does carry a unique sense of what I expect from that language. If all of a sudden major changes started happening to C# it would really be a shock to my system.

    As for when is a language finished? It never is, not until it’s dead. It’s like any program we build: it’s never really finished until every last programmer leaves the project. Languages are after all just another program someone wrote :)

  2. Surely, in the framework environment we find ourselves in, changes to the language should be less pronounced, with much more of the focus on the .NET framework itself.

  3. Well, look at the changes in C# 3. They were rather pronounced. Often framework features only become really useful when the languages support them. LINQ would have been close to useless without lambda expressions.

  4. If C# didn’t evolve more than it has since C# 2 and the software development industry continued to evolve, it’s likely that only a new language would be able to properly allow users to complete their tasks. That being the case, they’d have to learn an entirely new language anyway. While learning a more evolved C# may also mean more learning, this is a side-effect of the industry so I don’t think learning consequences should be a deterrent to evolving the language.

    I think it’s pretty clear what happens when a language doesn’t evolve. There may be languages with less flexible fundamentals that make them hard to evolve; but they all must evolve or simply stop being used. Oddly one complaint about C# that I’ve heard is that it doesn’t evolve fast enough; but I’d say it’s one of the fastest evolving languages (at least at year 7-8).

  5. When is a language done? When a better language takes its place ;)

    The English ‘language’, by comparison, changes with the times – new words and slang appear in order to describe new concepts or to describe old concepts in different ways. This allows us to communicate our thoughts more succinctly, and in the case of new concepts, at all.

    As business, and therefore software requirements change, programming languages must adapt. If they don’t, they die.

    C#, it seems to me, is simply adapting by utilizing language-based tools that have already been tested in the industry (e.g., functional programming) and are now being applied in a different (broader?) context. I expect C# 4 to provide tools that parallel whatever new, emerging technology may appear in the future (or at least, as in the case of C# 3, new to the .NET Framework).

  6. “C# 3 has opened my eyes a little about what might be possible – how radical changes can be made while still keeping a coherent language.”

    Interesting. While everything in C# 2.0 was a natural extension from 1.0 (even generics, although maybe I think that because I was used to C++ templates), in my mind, LINQ made C# considerably less coherent. Which isn’t to say it wasn’t worth doing (or that it was; my personal jury is still out on that).

    I think there is a somewhat different question which could be equally interesting: at what point has a language passed a reasonable barrier for entry for new developers? Those of us who have been using C# since 1.0 may be able to handle the incremental changes introduced with each new version; but for a developer trying to learn it for the first time, especially a young and/or amateur developer, I think C# 3.0 has already passed the point at which it can be learned to a significantly productive level in a reasonable amount of time (recognizing that those are subjective terms).

  7. David: I disagree. Let me (attempt to) explain why.

    I see LINQ (accurately, I think) as the natural progression of a multi-version project to solve the data query problem. It has *taken* several versions to get here because it is a *very* difficult problem. It required (grouping technologies that are simple augmentations):

    – delegates (C#1 + CLR1)
    – anonymous delegates (C#2)
    – lambdas (C#3 + NetFX 3.5’s Expression classes)

    – generics (C#2 + CLR2)

    – type inference (C#2)
    – “var” type inference (C#3)

    – extension methods (C#3)

    I wonder if you’ve ever taken a look at C-omega, which was pretty much the last public preview of what was eventually announced at PDC05? That was shown off long before Whidbey was even out the door: LINQ was a long time coming, and needed to be done in stages.

    Nowadays, I’m not sure I consider pre-C#3 releases to be any more than high-quality betas, because in hindsight its first true objective hadn’t been solved until LINQ. It’s only unfortunate that we have to live with some artifacts of the intermediate versions (non-lambda delegate syntax, etc.) but, thankfully, we rarely have to use them.
