Human LINQ

Last night I gave a talk about C# 3 and LINQ, organised by Iterative Training and NxtGenUG. I attempted to cover all the features of C# 3 and the basics of LINQ in about an hour and a half. It’s quite a brutal challenge, and obviously I wasn’t able to go into much detail about anything. It went down reasonably well, but I can’t help feeling there’s a lot of room for improvement. That said, there was one part of the talk which really did go well, and made the appropriate points effectively.

I had demonstrated the following LINQ query in code:

using System;
using System.Linq;

class Test
{
    static void Main(string[] args)
    {
        var words = new[] {"the", "quick", "brown", "fox",
                "jumped", "over", "the", "lazy", "dog"};
        var query = from x in words
                    where x.Length > 3
                    where x[0] != 'q'
                    select x.ToUpper();
        foreach (string word in query)
        {
            Console.WriteLine(word);
        }
    }
}

I know it would have been easy to combine the two “where” clauses, but separating them helped with the second part of the demonstration.

In the pizza break, I had prepared sheets of paper – some with the words on (“the”, “quick” etc), and some with clauses (“from x in words”, “where x.Length > 3” etc). I asked for 5 volunteers, who I arranged in a line, facing the rest of the audience. I stood at “audience left” with the sheets of words, gave the next person the “from” clause, then the first “where” clause etc. The person at the far end didn’t have a sheet of paper, but they were acting as the foreach loop.

I suspect you can see what’s coming – we ran possibly the slowest LINQ query in the world. However, we did it reasonably accurately: when the next piece of information was needed, a person would turn to their neighbour and request it; the request would pass up the line until it got to me, whereupon I’d hand over a sheet of paper with a word on. If a “where” clause filtered out a word, they just dropped the piece of paper. When a word reached the far end, the guy shouted it out.

With this, I was able to illustrate:

  • Deferred execution (nothing starts happening until the foreach loop executes)
  • Streaming (only a single word was ever on the move at once)
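Translated back into code, the first run of the demo looks roughly like this. It's a sketch rather than anything shown in the talk: the hypothetical HandOutWords iterator plays my part, logging each word as it's handed over, and the foreach loop plays the person at the far end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HumanLinq
{
    // Records each step, standing in for the volunteers.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical helper playing my part: a word is only handed over
    // when the next person in the line asks for one.
    static IEnumerable<string> HandOutWords(IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            Log.Add("hand over " + word);
            yield return word;
        }
    }

    public static IEnumerable<string> BuildQuery(IEnumerable<string> words)
    {
        return from x in HandOutWords(words)
               where x.Length > 3
               where x[0] != 'q'
               select x.ToUpper();
    }

    static void Main()
    {
        IEnumerable<string> query = BuildQuery(new[] { "the", "quick", "brown", "fox" });

        // Deferred execution: building the query hasn't handed over any words yet.
        Console.WriteLine("Log entries before the foreach loop: " + Log.Count);

        foreach (string word in query)
        {
            // The foreach loop is the person at the far end, shouting each result out.
            Log.Add("shout " + word);
        }

        // Streaming: BROWN is shouted out before fox has even been handed over.
        Console.WriteLine(string.Join(Environment.NewLine, Log));
    }
}
```

Nothing is logged until the foreach loop starts pulling, and only one word is ever in flight at a time.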

Next we added an “orderby” clause just before the “select”. Sure enough, we then saw buffering in action – the guy representing the ordering couldn’t return any data to the “select” clause until he’d actually got all the data.
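The same hypothetical logging approach shows the buffering in code. Once an orderby clause is in place, every word is handed over before the first one comes out of the far end. (This sketch drops the second where clause and uses a slightly different word list, purely to keep the log short.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HumanLinqOrdered
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical helper: logs each word as it's handed over.
    static IEnumerable<string> HandOutWords(IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            Log.Add("hand over " + word);
            yield return word;
        }
    }

    public static IEnumerable<string> BuildQuery(IEnumerable<string> words)
    {
        return from x in HandOutWords(words)
               where x.Length > 3
               orderby x
               select x.ToUpper();
    }

    static void Main()
    {
        // Still deferred: nothing happens until the foreach loop asks for data.
        IEnumerable<string> query = BuildQuery(new[] { "the", "lazy", "brown", "dog", "jumped" });

        foreach (string word in query)
        {
            Log.Add("shout " + word);
        }

        // Buffering: all five words are handed over before the first shout.
        Console.WriteLine(string.Join(Environment.NewLine, Log));
    }
}
```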

Finally, we removed the “orderby” clause again, but added a call to Count(). We didn’t have time to go into a lot of detail, but I think people understood why that led to immediate execution rather than the deferred execution we had earlier.
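A sketch of the Count() version, again with a hypothetical logging HandOutWords helper. The result type is the giveaway: a query expression on its own gives you an IEnumerable<string>, which can afford to wait until someone asks for data, whereas Count() has to return an int, so it must pull every word through the line before it can return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HumanLinqCount
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical helper: logs each word as it's handed over.
    static IEnumerable<string> HandOutWords(IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            Log.Add("hand over " + word);
            yield return word;
        }
    }

    public static int CountLongWords(IEnumerable<string> words)
    {
        // Count() returns an int rather than a sequence, so the whole
        // query runs here and now: immediate execution.
        return (from x in HandOutWords(words)
                where x.Length > 3
                select x.ToUpper()).Count();
    }

    static void Main()
    {
        int count = CountLongWords(new[] { "the", "quick", "brown", "fox" });
        // By the time Count() returns, every word has been handed over.
        Console.WriteLine(count + " words; log: " + string.Join(", ", Log));
    }
}
```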

I suspect I’m not the first person to do something like this, but I’m still really pleased with it. If you’re ever talking about LINQ and people’s eyes are glazing over, it’s a fun little demo. It wasn’t perfect though; there are things I’d change:

  • Put the upper case version of the word on the back of the paper. We had to imagine the result of the projection.
  • Having two “where” clauses is useful for the first demo, but slows things down after that.
  • Possibly use fewer words – it takes quite a while, and having been through it three times, the audience may grow impatient.
  • Explain deferred execution more in terms of the result type – it’ll make it easier to contrast with immediate execution.

Overall, it was a really fun night. I did a little interview with Dave McMahon afterwards, which should go up on the NxtGenUG site at some point. I suspect I was talking rather too quickly for the whole time, but we’ll see how it pans out.

Language design, when is a language “done”, and why does it matter?

As per previous posts, I’ve been thinking a fair amount about how much it’s reasonable to keep progressing a language. Not only have thoughts about C# 4 provoked this, but also a few other sources:

The video is very well worth watching in its entirety – even though I wouldn’t pretend to understand everything in it. (It’s worth watching rather than just listening to, by the way – Gilad’s body language is very telling.) Here are just a few of the things which particularly caught my attention:

  • Mads, 14:45 on the “Babelification” which could occur if everyone plugs their own type system into a pluggable language. This is similar to my concern about LISP-like macros in C#, I think.
  • Gilad, 23:40: “We’re the kind of people who love to learn new things. Most people hate to learn new things.” I don’t agree with that – but I’d say that people hate to feel they’re on a treadmill where they spend all their time learning, with no chance to actually use what they’ve learned.
  • Gilad, 28:35: “People never know when to stop […] What happens is they do too much.”
  • Mads, 50:50: “The perfect language is the one that helps you do your task well, and that varies from task to task.”

So, what does this have to do with C#? Well, I was wondering how different people would respond when asked if C# was “done”. Eric’s certainly remarked that it’s nowhere near done – whereas prior to C# 3, I think I’d have called C# 2 “done”. C# 3 has opened my eyes a little about what might be possible – how radical changes can be made while still keeping a coherent language.

I’ve been worrying publicly about the burden of learning which is being placed on developers. I’m starting to change my thoughts now (yes, even since yesterday). I’ve started to wonder where the burden is coming from, and why it matters if C# changes even more radically in the future.

Who would make you learn or use C# 4?

Suppose the C# team went nuts, and decided that C# 4 would include:

  • x86 inline assembly
  • Optional reverse Polish notation, which could be mixed and matched with the existing syntax
  • Checked exceptions
  • Regular expressions as a language feature, but using a new and slightly different regex dialect
  • User-defined operators (so you could define the “treble clef” operator, should you wish to)
  • Making semi-colons optional, but whitespace significant. (Heck, we could remove braces at the same time – optionally.)
  • A scripting mode, where Console.WriteLine(“Hello”) would count as a complete program

I’m assuming that most readers wouldn’t want to use or even learn such a language. Would you do it anyway though? Bear it in mind.

Now suppose the C# team worked out ways of including significant pieces of obscure but powerful computer science into C# 4 instead. Lots to learn, but with great rewards. It’s backwardly compatible, but idiomatic C# 4 looks totally different to C# 3.

Here’s the odd thing: I’d be more comfortable with the first scenario than the second. Why? Because the first lets me get on with developing software, guilt-free. There’d be no pressure to learn a lunatic version of C# 4, whereas if it’s reasonably compelling I’ll have to find the time. It’s unlikely (in most companies anyway) that I’ll be given the time by my employers – there might be a training course if I’m lucky, but we all know that’s not really how you learn to use a language productively. You learn it by playing and experimenting in conjunction with the more theoretical training or reading. I like to learn new things, but I’m already several technologies behind.

What’s in a name?

Now consider exactly the same scenario, but where instead of “C# 4” the language is named “Gronk#”. In both cases it’s still backwardly compatible with C# 3.

Logically, the name of the language should make no difference whatsoever. But it does. As a C# developer, I feel an obligation (both personal and from my employer) to keep up with C#. If you’re a C# developer who isn’t at least looking at C# 3 at the moment, you’re likely to find yourself behind the field. Compare that with F#. I’m interested to learn F# properly, and I really will get round to it some time – but I feel no commercial pressure to do so. I’m sure that learning a functional language would benefit many developers – as much (or even more) for the gains in perspective when writing C# 3 as for the likelihood of using the functional language directly in a commercial setting. But hey, it’s not C# so there’s no assumption that it’s on my radar. Indeed, I suspect that if I polled my colleagues, many wouldn’t have even heard of F#. They’re good engineers, but they have a home life which doesn’t involve obsessing over computer languages (yeah, I find it hard to believe too), and at work we’re busy building products.

We could potentially have more “freedom” if every language release came with a completely different name. It would happen to be able to build the old code, but that could seem almost incidental. (It would also potentially give more room for breaking changes, but that’s a very different matter.) There’d be another potential outcome – branching.

Consider the changes I’ve proposed for C# 4. They are mere tweaks. They keep the language headed in the same direction, but with a few minor bumps removed. Let’s call this Jon#.

Now consider a language which (say) Erik Meijer might build as the successor to C# 3. I’m sure there are plenty of features from Haskell which C# doesn’t have yet. Let’s suppose Erik decides to bundle them all into Erik#. (For what it’s worth, I don’t for one moment believe that Erik would actually treat C# insensitively. I have a great respect for him, even if I don’t always understand everything he says.)

Jon# and Erik# can be independent. There’s no need for Erik# to contain the changes of Jon# if they don’t fit in with the bigger picture. Conservative developers can learn Jon# and make their lives a bit easier for little investment. Radical free thinkers can learn Erik# in the hope that it can give them really big rewards in the long run. Everyone’s happy. Innovation and pragmatism both win.

Well, sort of.

We’ve then got two language specs, two compilers, two IDE experiences, etc. That hurts. Branching gives some freedom at the cost of maintenance – as much here as in source control.

Where do we go from here?

This has been a meandering post, which is partly due to the circumstances in which I’ve written it, and partly due to the inconclusive nature of my thoughts on the matter. I guess some of the main points are:

  • Names matter – not just in terms of getting attention, but in the burden of expected learning as well.
  • Contrary to impressions I may have given before, I really don’t want to be a curmudgeonly stifler of language innovation. I just worry about unintended effects which are more to do with day to day human reality than technical achievement.
  • There are always options and associated costs – branching being one option which gives freedom at a high price.

I really don’t have a good conclusion here – but I hope plenty of people will spare me their thoughts on this slightly non-technical matter as readily as they have about specific C# 4 features.

C# 4, part 4: My manifesto and wishlist

The final part of this little series is the one where I suggest my own ideas for C# 4, beyond those I’ve already indicated my approval for in earlier posts. Before I talk about individual features, however, I’d like to put forward a manifesto which could perhaps help the decision-making process. I hasten to add that I haven’t run all the previous parts through this manifesto to make sure that I’ve been consistent, but all of these thoughts have been running around in my head for a while so I hope I haven’t been wildly out.

Manifesto for C# 4

I would welcome the following goals:

Remember it’s C#

Many suggestions have been trying to turn C# into Ruby, LISP or other languages. I welcome diversity in languages, and I believe in using the right tool for the job – but that means languages should stick to their core principles, too. Now, I know that sounds like I might be bashing C# 3, given how much that has borrowed from elsewhere for lambda expressions and the like, and I don’t know exactly how I square that circle internally – but I don’t want C# to become a giant toolbox that every useful feature from every language in existence is dumped into.

There are useful ideas to think about from all kinds of areas – not just existing languages – but I’d be tempted to reject them if they just don’t fit into C# neatly without redefining the whole thing.

Consider how people will learn it

I’ve mentioned this before, but I am truly worried about people learning C# 3 from scratch. One of the reasons I didn’t attempt to write about C# from first principles, instead assuming knowledge of C# 1, is that I’m not sure people can sensibly learn it that way. Now, I don’t think I can sensibly get inside the head of someone who doesn’t know anything about C#, but I suspect that I’d want to cover query expressions right at the very end, preferably after quite a while of experience in C# without them.

That might not go for every new feature – it’s probably worth knowing about automatic properties right from the start, for instance, and introducing lambda expressions at the same time as anonymous methods (if C# 3 is the definite goal) but expression trees would be pretty late in my list of topics.

I learned C# 1 from a background of Java, and it didn’t take long to understand the basics of the syntax. Many of the subtleties took a lot longer of course (and it was a very long time before I really understood the differences between events and delegates, I’m sad to say) but it wasn’t a hard move. For a long time C# 2 just meant generics as far as I was concerned – with occasional nullable types, and some of the simpler features such as differing access for getters and setters. Anonymous methods and iterator blocks didn’t really hit me in terms of usefulness until much later – possibly due to using closures in Groovy and more iterators in LINQ. I suspect for many C# 2 developers this is still the case.

My method of learning C# 3 (from the spec, often in rather more detail than normal for the sake of accuracy in writing) is sufficiently far off the beaten track as to make it irrelevant for general purposes, but I wonder how people will learn it in reality. How will it be taught in universities (assuming there are any that teach C#)? How easily will developers moving to C# from Java cope? How about from other languages?

Interestingly, the move from VB 9 to C# 3 is now probably easier than the move from Java 6 to C# 3. Even with the differences in generics between Java and C#, that probably wasn’t true with C# 2.

To get back to C# 4, I’d like the improvements to blend in somehow, so that learning C# 4 from scratch isn’t a significantly different experience to learning C# 3 from scratch. It’s bound to be slightly longer as there will certainly be new features to learn – but if they can be learned alongside existing features, I think that’s a valuable asset. It’s also worth considering how much investment will be required to learn C# 4 from a position of understanding C# 3. Going from C# 2 to C# 3 is a significant task – but it’s one which involves a paradigm shift, and of course the payoffs are massive. I’d be very surprised (and disappointed) to see the same level of change in C# 4, or indeed the same level of payoff. Conservative though this is, I’m after “quick wins” for developers – even if in some cases such as covariance/contravariance the win is far from quick from the C# design/implementation team’s perspective.

Just to put things into perspective: think about how many new technologies developers have been asked to learn in the last few years – WCF, WPF, Workflow Foundation, Cardspace, Silverlight, AJAX, LINQ, ClickOnce and no doubt other things I’ve forgotten. I feel a bit like Joseph II complaining that Mozart had written “too many notes” – and if you asked me to start exorcising any of these technologies from history I’d be horrified at the prospect. That doesn’t actually make it any easier for us though. I doubt that the pace of change is likely to slow down any time soon in terms of technologies – let’s not make developers feel like complete foreigners in a language they were happy with.

Keep the language spec understandable

I know there aren’t many people who look at the language spec, but the C# spec has historically been very, very well written. It’s usually clear (even if it can be hard to find the right section at times) and leaves rigidly defined areas of doubt and uncertainty in appropriate places, while stamping out ambiguity in others. The new unified spec is a great improvement over the “C# 1 … and then the new bits in C# 2” approach from before. However, it’s growing at a somewhat alarming rate. As it grows, I expect it to become harder to read as a natural consequence – unless particular effort is put into countering that potential problem.

Stay technology-neutral…

Okay, this one is aimed fairly squarely at VB9. I know various people love XML literals, but I’m not a fan. It just feels wrong to have such close ties with a particular technology in the actual language, even one so widely used as XML. (Of course, there’s already a link between XML and C# in terms of documentation comments – but that doesn’t feel quite as problematic.)

My first reaction to LINQ (before I understood it) was that C# was being invaded by SQL. Now that I’ve got a much better grasp of query expressions, I have no concern in that area. Perhaps it would be possible to introduce a new hierarchy construct which LINQ to XML understands with ease – or adapt the existing object/collection initializers slightly for this purpose. With some work, it may be possible to do this without restricting it to XML… I’m really just blue-skying though (and this isn’t a feature on my wishlist).

… but bear concurrency in mind

While I don’t like the idea of tying C# to any particular technology, I think the general theme of concurrency is going to be increasingly important. That’s far from insightful – just about every technology commentator in the world is predicting a massively parallel computing landscape in the future. Developers won’t be able to get away with blissful ignorance of concurrency, even if not everyone will need to know the nuts and bolts.

Make it easier to do the right thing

This is effectively encouraging developers to “fall into the pit of success”. Often best practices are ignored as being inconvenient or impractical at times, and I’m certainly guilty of that myself. C# has a good history of enabling developers to do the right thing more easily as time progresses: automatic properties, iterator blocks and allowing different getter/setter access spring to mind as examples.

In some ways this is the biggest goal in this manifesto. It’s certainly guided me in terms of encouraging mixin and immutability support, ways of simplifying parameter/invariant checking, and potentially IDisposable implementation. I like features which don’t require me to learn whole new ways of approaching problems, but let me do what I already knew I should do, just more easily.

Wishlist of C# 4 features

With all that out of the way, what would I like to see in C# 4? Hopefully from the above you won’t be expecting anything earth-shattering – which is a good job, as all of these are reasonably minor modifications. Perhaps we could call it C# 3.5, shipping with .NET 4.0. That would really make life interesting, as people are already referring to C# 3 as C# 3.5 (and C# 2008)…

Readonly automatic properties

I’ve mentioned this before, but I’ll give more details here. I’d like to be able to specify readonly instead of protected/internal/private for the setter access, which would:

  • Mark the autogenerated backing variable as initonly in IL
  • Prevent code outside the constructor from setting the property

So, for example:


class ReadOnlyDemo
{
    public string Name { get; readonly set; }

    public ReadOnlyDemo(string name)
    {
        // Valid
        Name = name;
    }

    public void TryToSetName(string newName)
    {
        // Invalid
        Name = newName;
    }
}

This would make it easier to write genuinely (and verifiably, as per Joe’s post) immutable classes, or just immutable parts of classes. As mentioned in previous comments, there could be interesting challenges around serialization and immutability, but frankly they really need to be addressed anyway – immutability is going to be one part of the toolkit for concurrency, whether it has language support or not. In the generated IL the property would only have a getter – calls to the setter in the constructor would be translated into direct variable sets.
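For comparison, here's a sketch of what has to be written by hand in C# 3 to get the same effect today: a readonly field (which is what ends up marked initonly in IL) and a property with only a getter. The commented-out method shows the kind of assignment the compiler already rejects for readonly fields.

```csharp
using System;

class ReadOnlyDemo
{
    // The hand-written backing field: "readonly" is what becomes initonly in IL.
    private readonly string name;

    // Getter only, so the property has no setter at all in the generated IL.
    public string Name
    {
        get { return name; }
    }

    public ReadOnlyDemo(string name)
    {
        // Valid: readonly fields may be assigned in a constructor.
        this.name = name;
    }

    // public void TryToSetName(string newName)
    // {
    //     name = newName; // error CS0191: a readonly field cannot be assigned to
    // }

    static void Main()
    {
        var demo = new ReadOnlyDemo("Jon");
        Console.WriteLine(demo.Name);
    }
}
```

The proposal is essentially about letting the compiler write the field, and the constructor-only assignment rule, for us.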

This shouldn’t require a CLR change.

Property-scoped variables

I’ve been suggesting this (occasionally) for a long time, but it’s worth reiterating. Every so often, you really want to make sure that no code messes around with a variable other than through a property. This can be solved with discipline, of course – but historically we don’t have a good record on sticking to discipline. Why not get the compiler to enforce the discipline? I would consider code like this:


public class Person
{
    public int Age
    {
        // Variable is not accessible outside the Age property
        int age;
        get { return age; }
        set
        {
            if (value < 0 || value > SomeMaxAgeConstant)
            {
                throw new ArgumentOutOfRangeException();
            }
            age = value;
        }
    }

    public void SetAgeNicely(int value)
    {
        // Perfectly legal
        Age = value;
    }

    public void SetAgeSneakily(int value)
    {
        // Compilation error!
        age = value;
    }
}


Just in case Eric’s reading this: yes, having Age as a property of a person is a generally bad idea. Specifying a date of birth and calculating the age is a better idea. Really, don’t use this code as a model for a Person type. However, treat it as a dumb example of a reasonable idea. I need to find myself a better type to use as my first port of call when finding an example…

The variable name would still have to be unique – it would still be the name generated in the IL, for instance. Multiple variables could be declared if required. The generated code could be exactly the same as that of existing code which happened to only use the property to access the variable.
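To show why the compiler's help would be welcome, here's a sketch of the closest thing available in C# 3: an ordinary private field. SomeMaxAgeConstant is given an arbitrary value of 150 purely so that the code compiles. The sneaky method compiles without complaint and quietly breaks the invariant the property was meant to protect.

```csharp
using System;

public class Person
{
    // Arbitrary value, assumed purely for illustration.
    private const int SomeMaxAgeConstant = 150;

    // Today this field is visible to the whole class, not just the property.
    private int age;

    public int Age
    {
        get { return age; }
        set
        {
            if (value < 0 || value > SomeMaxAgeConstant)
            {
                throw new ArgumentOutOfRangeException("value");
            }
            age = value;
        }
    }

    public void SetAgeNicely(int value)
    {
        // Goes through the validation in the setter.
        Age = value;
    }

    public void SetAgeSneakily(int value)
    {
        // Compiles today, and skips the validation entirely.
        age = value;
    }

    static void Main()
    {
        var person = new Person();
        person.SetAgeSneakily(-5);
        Console.WriteLine(person.Age); // -5: the invariant has been broken
    }
}
```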

A couple of potential options:

  • The variables could be directly accessible during the constructor, potentially. This would help with things like serialization.
  • Likewise, potentially an attribute could be applied to other members which needed access to the variables. Bear in mind that we’re only trying to save developers from themselves (and their colleagues). We’re not trying to cope with intruders in a security sense. An active “I know I’m violating my own rules” declaration should cause enough discomfort to avoid the accidental issues we’re trying to avoid.

This shouldn’t require a CLR change.

Extension properties

This has been broadly talked about, particularly in view of fluent interfaces. It feels to me that there are two very different reasons for extension properties:

  1. Making fluent interfaces prettier, e.g. 19.June(1976) + 8.Hours + 20.Minutes instead of 19.June(1976) + 8.Hours() + 20.Minutes()
  2. Genuine properties, which of course couldn’t add new state to the extended type, but could access it in a different way.

Point 1 feels like a bit of a hack, I must admit. It’s using properties not because the result is “property-like” but because we want to miss the brackets off. It’s been pointed out to me that VB already allows this, and that by allowing brackets to be missed out for parameterless methods we could achieve the same effect – but that just feels wrong. Arguably fluent interfaces already mess around with the normal conventions of what methods do and how they’re named, so using properties like this probably isn’t too bad.
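For reference, here's a minimal sketch of the bracketed version which C# 3 extension methods already allow. The FluentDates class and its June, Hours and Minutes methods are hypothetical; the brackets after Hours and Minutes are exactly what extension properties would remove.

```csharp
using System;

// Hypothetical fluent helpers, written as ordinary C# 3 extension methods.
static class FluentDates
{
    public static DateTime June(this int day, int year)
    {
        return new DateTime(year, 6, day);
    }

    public static TimeSpan Hours(this int hours)
    {
        return new TimeSpan(hours, 0, 0);
    }

    public static TimeSpan Minutes(this int minutes)
    {
        return new TimeSpan(0, minutes, 0);
    }
}

class FluentDemo
{
    static void Main()
    {
        // With extension properties this could read 8.Hours + 20.Minutes.
        DateTime when = 19.June(1976) + 8.Hours() + 20.Minutes();
        Console.WriteLine(when.ToString("s")); // 1976-06-19T08:20:00
    }
}
```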

Point 2 is a more natural reason for extension properties. As an example, consider a type which exposes a Size property, but not Width or Height. Changing either dimension individually requires setting the Size to a new one with the same value for the other dimension – this is often much harder to read than judicious use of Height/Width. I suspect that extension properties would actually be used for this reason less often than for fluent interfaces, but there may be any number of interesting uses I haven’t thought of.
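Here's a sketch of point 2 as it has to be written today, assuming hypothetical Size and Window types (a small immutable struct, and a class exposing it as a property). Without extension properties, the nearest equivalent is a pair of extension methods.

```csharp
using System;

// Hypothetical immutable size type.
struct Size
{
    public readonly int Width;
    public readonly int Height;

    public Size(int width, int height)
    {
        Width = width;
        Height = height;
    }
}

// Hypothetical type exposing Size, but not Width or Height.
class Window
{
    public Size Size { get; set; }
}

static class WindowExtensions
{
    public static int GetWidth(this Window window)
    {
        return window.Size.Width;
    }

    public static void SetWidth(this Window window, int width)
    {
        // Rebuild the whole Size, keeping the other dimension unchanged.
        window.Size = new Size(width, window.Size.Height);
    }
}

class SizeDemo
{
    static void Main()
    {
        var window = new Window { Size = new Size(640, 480) };
        // An extension property would let this read window.Width = 800;
        window.SetWidth(800);
        Console.WriteLine(window.GetWidth() + "x" + window.Size.Height); // 800x480
    }
}
```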

This shouldn’t require a CLR change, but framework changes may be required.

Extension method discovery improvements

I’ve made it clear before now that the way extension methods are discovered (i.e. with using directives which import all the extension methods of all the types within the specified namespace) leaves much to be desired. I don’t like trying to reverse bad decisions – it’s pretty hard to do it well – but I really feel strongly about this one. (Interestingly, although I’ve heard many people criticising this choice, I don’t actually remember hearing the C# team defending it. Given that reservations were raised back in 2005, when there was still plenty of time to change stuff, I suspect there are reasons no-one’s thought of. I’d love to hear them some time.)
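A small sketch of the discovery problem, with every name hypothetical: two unrelated static classes happen to share a namespace, and one using directive for that namespace makes the extension methods of both available, whether or not that was intended.

```csharp
using System;
using Hypothetical.Extensions;

namespace Hypothetical.Extensions
{
    // Two unrelated extension classes which happen to share a namespace.
    public static class StringShouting
    {
        public static string Shout(this string text)
        {
            return text.ToUpper() + "!";
        }
    }

    public static class StringWhispering
    {
        public static string Whisper(this string text)
        {
            return text.ToLower() + "...";
        }
    }
}

public static class DiscoveryDemo
{
    // The single namespace-level using directive above brings in
    // both Shout and Whisper; there's no way to ask for just one class.
    public static string Run()
    {
        return "Hello".Shout() + " " + "Hello".Whisper();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // HELLO! hello...
    }
}
```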

The goal would be to change from discovering extensions at a namespace to discovering extensions at a type level. (By which I mean at a “type containing extension methods” level – e.g. System.Linq.Enumerable or System.Linq.Queryable. Admittedly discovery on a basis which explicitly specifies the type to extend would also be interesting.) I don’t mind exactly how the syntax works, but the usual ideas are ones such as:


static using System.Linq.Enumerable;
using static System.Linq.Enumerable;
using class System.Linq.Enumerable;


That’s the easy part – the harder part would be working out the best way to phase out the “old” syntax. I would suggest a warning if extension methods are found and used without being explicitly mentioned by type. In C# 4 this could be a warning which was disabled by default (but could be enabled with pragmas or command line switches), then enabled by default in C# 5 (but with the same options available, this time to disable it). By C# 6 we could perhaps remove the ability to discover extension methods by namespace altogether, so the methods just wouldn’t be found any more.

The C# team could be more aggressive than this, perhaps skipping the first step and making it an enabled warning from C# 4 – but I’m happy to leave that kind of thing to them, without paying it much more attention. I know how seriously they take breaking changes.

No CLR changes required as far as I can see.

Implicit “select” at end of query expressions

I can’t say I’ve used VB9’s LINQ support, but I’ve heard about one aspect which has some appeal. In C# 3, every query expression ends with either a “select” or a “group … by” clause. The compiler is actually smart enough to ignore a redundant select clause (except for degenerate query expressions) and indeed the language spec makes this clear. So why require it in the query expression in the first place? As a concrete example of before/after:


// Before
var query = from user in db.Users
            where user.Age > 18
            orderby user.Name
            select user;

// After
var query = from user in db.Users
            where user.Age > 18
            orderby user.Name;


This isn’t a massive deal, but it would be quite neat. I worry slightly that there could be significant costs in terms of the specification complexity, however.
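It's easy to check that the trailing select really is dropped today. This sketch (with a hypothetical in-memory User type standing in for db.Users) runs the “before” query against an IQueryable, so the compiler's translation is visible as an expression tree: it contains calls to Where and OrderBy, but no Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the rows behind db.Users.
class User
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class QueryShapeDemo
{
    public static IQueryable<User> Query(IQueryable<User> users)
    {
        // "select user" just returns the range variable, so the compiler
        // translates this to users.Where(...).OrderBy(...) with no Select call.
        return from user in users
               where user.Age > 18
               orderby user.Name
               select user;
    }

    static void Main()
    {
        var users = new List<User>
        {
            new User { Name = "Zoe", Age = 30 },
            new User { Name = "Adam", Age = 12 },
            new User { Name = "Bob", Age = 40 }
        };
        IQueryable<User> query = Query(users.AsQueryable());

        // Prints the expression tree the compiler built.
        Console.WriteLine(query.Expression);
        foreach (User user in query)
        {
            Console.WriteLine(user.Name);
        }
    }
}
```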

Internal members on internal interfaces

Interfaces currently only ever have public members, even if the interfaces themselves are internal. This means that implementing an internal interface in an internal class still means making the implementing method public, or using explicit interface implementation (which imposes other restrictions, particularly in terms of overriding). It would be nice to be able to make members internal when the interface itself is internal – either explicitly or implicitly. Implementing such members publicly would still be allowed, but you could choose to keep the implementation internal if desired.
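A sketch of the two options available today, using a hypothetical internal IWidgetStore interface. Changing the implicit implementation from public to internal is rejected by the compiler (I believe with error CS0737), while the explicit implementation can't be declared virtual, which is where the overriding restriction comes from.

```csharp
using System;

// Hypothetical internal interface: its members are still implicitly public.
internal interface IWidgetStore
{
    string Save(string widget);
}

internal class ImplicitStore : IWidgetStore
{
    // Option 1: the implementing method has to be public, even though neither
    // the interface nor the class is visible outside the assembly.
    // Making it internal instead fails to compile (CS0737).
    public string Save(string widget)
    {
        return "implicit: " + widget;
    }
}

internal class ExplicitStore : IWidgetStore
{
    // Option 2: explicit implementation keeps the method out of the class's
    // own public surface, but it can't be virtual, so it can't simply be overridden.
    string IWidgetStore.Save(string widget)
    {
        return "explicit: " + widget;
    }
}

class InterfaceDemo
{
    static void Main()
    {
        IWidgetStore first = new ImplicitStore();
        IWidgetStore second = new ExplicitStore();
        Console.WriteLine(first.Save("a"));
        Console.WriteLine(second.Save("b"));
    }
}
```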

This may require a CLR change – not sure.

“Namespace+assembly” access restriction

It’s not an uncommon request on the C# newsgroup for the equivalent of C++’s “friend” feature – where two classes have a special relationship. In many ways InternalsVisibleTo is an assembly-wide version of this feature, but I can certainly see how it would be nice to have a slightly finer grained version. Sometimes two classes are naturally tightly coupled, even though they have distinct responsibilities. Although loose coupling is generally accepted to be a good thing, it’s not always practical. At the same time, giving extra access to all the types within the same assembly can be a little bit much.

Instead of specifying particular types to share members with, I’d propose a new access level, which would make appropriately decorated members available to other types which are both within the same assembly and within the same namespace. This would be similar to Java’s “package level” access (the default, for some reason) except without the implicit availability to derived types. (Java’s access levels and defaults are odd to say the least.)

(Of course, this wouldn’t help in assemblies which consisted of types within a single namespace.)

This would almost certainly require a CLR change.

InternalsVisibleTo simplification for strongly named assemblies

This one’s just a little niggle. In order to use the InternalsVisibleToAttribute to refer to a strongly named assembly (which you almost always have to do if the declaring assembly is strongly named), you have to specify the public key. Not the public key token as the documentation claims, but the whole public key. Not only that, but you can’t have any whitespace in it – so you can’t use a verbatim string literal to easily put it in a block. Instead, you either have to have the whole thing on one line, or use compile-time string concatenation to make sure the key is still unbroken.

It’s not often you need to look at the assembly attributes, so it’s far from a major issue – but it’s a mild annoyance which could be fixed with very few downsides.

This may require a CLR change – not sure.

Is that all?

I suspect that soon after posting this, I’ll think of other ideas. Some may be daft, some may be more significant than these, but either way I’ll do a new post for new ideas, rather than adding to this one. I’ll update this one for typos, further explanations etc. I suspect if I don’t post this now I’ll keep tweaking it for hours – which is relatively pointless as I’m really trying to provoke discussion rather than presenting polished specification proposals.

Macros, and languages within languages

Ian Griffiths mailed me about macros, and explained how LISP macros were very different to C/C++ macros, working at a language level instead of at a text level. I won’t pretend to understand all about what would be possible and what wouldn’t, but Ian gave a good example: query expressions in C# 3. Instead of being part of the language itself, they could apparently have been written as macros, if C# supported them. Then if you wanted to have similar support for different forms of expression, you could just write your own macro library.

Assuming that’s what people are actually requesting, I can certainly see the attraction – but I’d still prefer it if C# didn’t go down that route. I’ll go back to C++ for the guts of the reason why, but it’s not really about macros at this point. It’s about building your own language. Once, someone told me that C++ wasn’t a language – it was a meta-language; no-one used “bare” C++, they worked out their own language made from the building blocks of normal C++, and then used that.

That may or may not be true – or more likely, it’s true in some places but not others – but it scares me as an idea. I’m not going to claim I know every nuance of C#, but it’s pretty rare that you’d throw a line at me without it being reasonably clear what’s going to happen and why, at the language level. Extension methods might mean a bit more information is required as to where a particular method comes from, but it doesn’t take a lot of digging to see what’s going on.

Now imagine that C# 3 didn’t include query expressions, but that someone had come up with them as a macro library. It’s not an insignificant amount of effort to learn what’s going on there, and how it all maps to normal method calls, potentially with expression trees as arguments instead of delegates. Until you understand what’s going on at a reasonably deep level, you can’t really make any firm decisions as to what code including a query expression will do. (Heck, that’s one of the premises of the book: you should really know this stuff, or at least be aware of it.)
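As a rough illustration of that mapping, here’s a simple query expression next to the method calls the compiler effectively translates it into. This is a simplified sketch: the class and method names are just for illustration, the real translation rules cover many more clauses, and with an IQueryable<T> source the lambdas would be converted to expression trees rather than delegates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QueryTranslationDemo
{
    static readonly string[] Words = { "the", "quick", "brown", "fox", "jumped" };

    // Written as a query expression...
    public static List<string> WithQueryExpression()
    {
        var query = from word in Words
                    where word.Length > 3
                    select word.ToUpperInvariant();
        return query.ToList();
    }

    // ...and as the plain method calls the compiler turns it into.
    public static List<string> WithMethodCalls()
    {
        var query = Words.Where(word => word.Length > 3)
                         .Select(word => word.ToUpperInvariant());
        return query.ToList();
    }
}
```

Both methods produce QUICK, BROWN and JUMPED; the query expression is purely syntactic sugar over Where and Select.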

That’s fine when there’s a single macro library used globally, but now imagine every company has their own – or worse still, has a bunch of them grabbed from Code Project, possibly including a load of bugs. Most of us aren’t accomplished language designers, and I suspect there’d be an awful lot of macro libraries out there which weren’t quite thought through enough – but were still useful enough to be attractive. They’d become magnets for code warts.

It’s hard enough when you change company to work out what 3rd party libraries are in use, how they’re being used, what the coding conventions are etc. It would be much worse if I had to learn another flavour of C# itself each time. I’m already worried that developers are picking up C# 3 without having a firm enough grasp of C# 2 – and that’s when there’s just a progression within a single language.

I know this all sounds patronising and/or elitist and/or “nanny state” applied to programming languages – but it’s how I feel nonetheless. I just don’t think we (as a development community) are mature enough to handle that sort of power without it turning into a blood bath. This sort of thing sounds fabulous for hobby and research development, and would probably be great in the hands of the best few companies in the world – but I don’t think it’s a good idea for the mainstream.

Okay – time to hear why I’m wrong :)

C# 4, part 3: Ideas from Microsoft

Microsoft haven’t committed to anything in C# 4 yet. However, there have been hints about what they’ve been considering in Eric Lippert’s blog, and more than hints in Charlie Calvert’s blog. There’s not a lot to go on yet, but:

Immutability support

Most of Eric’s posts about immutability have so far been about immutable data structures. However, the first post in the series did mention that they’re playing around with immutability from the point of view of potential language support. Joe Duffy also wrote about immutability at roughly the same time.

What can we expect in terms of support? Possibilities:

  • Compiler checking via attributes, as per Joe’s posts
  • Immutable collections to make it easier to sensibly embed collections within immutable types
  • Readonly automatic properties (as I mentioned in part 1 – I’ll expand on this in the next post)

I’m not sure what could be usefully added beyond those. One possibility, however: what about automatic equality and hashcode generation? I touched on this last time when talking about “instant data types” but I don’t see why it shouldn’t be applicable in general. After all:

  • Immutable types are good candidates for dictionary keys – or to put it the other way round, using mutable types as dictionary keys is a risky idea
  • If all immutable types automatically override GetHashCode and Equals, and immutable types can only compose other immutable types, everything should just work
  • The obvious implementation of Equals and a “multiply, add, repeat” implementation of GetHashCode are pretty reasonable for many, many types. You could always manually override the methods if necessary.
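For comparison, here’s the kind of code that currently has to be written by hand for even a tiny immutable type, and which automatic generation would make unnecessary. It’s a minimal sketch: the ImmutablePoint type is hypothetical, and the seed 17 and multiplier 31 are just conventional choices for the “multiply, add, repeat” hash.

```csharp
using System;

// A small immutable type: readonly fields, no setters.
public sealed class ImmutablePoint : IEquatable<ImmutablePoint>
{
    private readonly int x;
    private readonly string label;

    public ImmutablePoint(int x, string label)
    {
        this.x = x;
        this.label = label;
    }

    public int X { get { return x; } }
    public string Label { get { return label; } }

    // The "obvious" equality: compare every field.
    public bool Equals(ImmutablePoint other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        return x == other.x && string.Equals(label, other.label);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ImmutablePoint);
    }

    // "Multiply, add, repeat": start with a seed, then fold in each field's hash.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + x.GetHashCode();
            hash = hash * 31 + (label == null ? 0 : label.GetHashCode());
            return hash;
        }
    }
}
```

The class is sealed, which sidesteps the inheritance problem with equality described next.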

One downside: Equality doesn’t really work well when there’s an inheritance hierarchy involved. Read Josh Bloch’s “Effective Java” for a detailed discussion. (Ooh, new edition coming out soon. Should make for great reading.)

Generic variance support

Once again, Eric has posted rather a lot on this, concentrating on two aspects: interfaces and delegates. I believe these are already supported in the CLR (I’m sure interfaces are, and I suspect delegates are too).

Just to be clear, I don’t think anyone should expect unsafe covariance/contravariance. The following code still won’t compile (at least unless things go very differently to how I expect):

// Still won’t work!
IList<string> strings = new List<string>();
IList<object> objects = strings;
objects.Add(new object()); // Oww!

This code, however, would be okay, due to the “readonly” nature of iteration:

// Should be okay – can’t break type safety
IEnumerable<string> strings = new List<string>();
IEnumerable<object> objects = strings;

That’s the interface side. For delegates, this would probably be possible:

// Contravariance of input parameters to delegates
Action<object> objectAction = (x => Console.WriteLine(x));
Action<string> stringAction = objectAction;

// Covariance of returns (and output parameters?) from delegates
Func<string> stringFunc = () => "foo";
Func<object> objectFunc = stringFunc;

All of this is conceptually good. It’s more for people to understand, of course, but it’s still a useful thing to have available.
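In the meantime, the usual C# 3 workaround is to make the conversion explicit. Here’s a minimal sketch (the VarianceWorkaround name is just for illustration): LINQ’s Cast<T> operator builds a new lazily-evaluated sequence rather than reusing the original reference, and for delegates you can wrap one delegate in another of the desired type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VarianceWorkaround
{
    // Without interface covariance, an IEnumerable<string> can't be used
    // directly as an IEnumerable<object>; Cast<object>() wraps it instead.
    public static IEnumerable<object> AsObjects(IEnumerable<string> strings)
    {
        return strings.Cast<object>();
    }

    // The delegate equivalent of contravariance: an Action<object> can
    // handle any string, so wrap it in a new Action<string>.
    public static Action<string> AsStringAction(Action<object> objectAction)
    {
        return s => objectAction(s);
    }
}
```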

Dynamic calling support

This one really surprised me – to the extent that I’ll now need to edit the last chapter in the book so as not to look stupid when it’s published. (It’s fine to be wrong when predicting the future, but making a prediction which has already been proven false before publication is embarrassing.) I expected C# to stay fully static forever. However, it looks like the C# team is at least strongly considering making dynamic calling support available. Read the blog for more details, but note what isn’t included: C# reacting dynamically. In other words, Ayende’s beloved IDynamicObject support isn’t being proposed (at the moment) – although I guess there’s always a possibility that the DLR support will somehow be available through “normal” C# without extra compiler support.

This will disappoint some people of course, but I’m happy enough. I can’t see myself using the new support particularly often, and I’m slightly worried at the extra complexity required in the language spec to explain what it will actually do, but I’m likely to treat it in roughly the same way as unsafe blocks – something to ignore most of the time.



As you can see, there’s not a lot really on the table yet. That in itself is quite interesting, however – as well as the nature of the changes. I believe that the changes from C# 3 to 4 will be much more like those from 1 to 2 than 2 to 3. Think about when VS 2005 was released – C# 3 extensions were already available in CTP form for the VS 2005 beta. Here we are with VS 2008 having RTM’d a while ago, and we only have a few ideas to mull over. Now, I’m sure that there are implementations of all of the above features and almost certainly more – but they’re not putting them out just yet. Hopefully we’ll learn more over the next few months.

The big difference with C# 3 was that there was a grand plan: LINQ. Almost every feature in C# 3 (basically everything except automatic properties and partial methods) supports LINQ in some way or other. I don’t see a big plan for C# 4. That’s a very good thing, in my view. We need time – quite a lot of time, I suspect – to digest LINQ. Awful as the phrase is, LINQ has the potential to be a paradigm shift in development. Those shouldn’t come along too often. Disparate changes can still be incredibly useful of course, so let’s not lack ambition for C# 4 – but I’m expecting idiomatic C# 4 to still be roughly similar to idiomatic C# 3, which certainly couldn’t be said of C# 3 to C# 2.

Only one more post in the immediate future: my own ideas for C# 4 (or at least those which I’ve come up with independently, even if others have mentioned them too).

C# 4, part 2: Ideas from other community members

There has been a fair amount of speculation online about what should be in C# 4. I’ve taken the list below from a few posts, primarily those by Ayende and Jeremy Miller. I’ve deliberately left out the ideas that Microsoft have mentioned that they’re at least considering – they’ll come in the next post.

Mixins

I suspect everyone has a different idea of what these mean, but I’ll say what I’d like. I want to be able to implement an interface by proxying all calls (other than those I’ve actually implemented) to a particular member variable, as an aid to favouring composition over inheritance. As an example, here’s a class which implements IList<string> but makes sure that only strings with length 5 or more can be added:


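public class LongStringList : IList<string>
{
    // Made-up feature: every IList<string> member not implemented
    // below is proxied automatically to m_list.
    readonly IList<string> m_list = new List<string>();

    public void Add(string item)
    {
        if (item.Length < 5)
            throw new ArgumentException("Only strings of length >= 5 allowed");
        m_list.Add(item);
    }
}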



(Yes, you’d need to implement Insert and the indexer’s setter as well, to prevent other values being added or replaced. This is just sample code to give the rough flavour – and the syntax is pretty much made up as an example too. I’m not hung up on the syntax; I just want the functionality.)

You could of course derive from System.Collections.ObjectModel.Collection<string> – but this prevents you from deriving from any other class, and fixes the inheritance forever. If you only really want to provide your clients with an IList<string> implementation, it’s nicer not to pin anything down. At a later date you could manually implement more of the interface members instead of proxying them, without changing any of the calling code.
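To show how much boilerplate the proposed proxying would remove, here’s a sketch of the same list written by hand today. The ManualLongStringList name is hypothetical, and this version validates Insert and the indexer setter too, and rejects null.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Today's manual equivalent: every member is forwarded by hand,
// even though only Add, Insert and the indexer setter need extra logic.
public class ManualLongStringList : IList<string>
{
    private readonly IList<string> list = new List<string>();

    private static void Validate(string item)
    {
        if (item == null || item.Length < 5)
        {
            throw new ArgumentException("Only strings of length >= 5 allowed");
        }
    }

    public void Add(string item) { Validate(item); list.Add(item); }
    public void Insert(int index, string item) { Validate(item); list.Insert(index, item); }

    public string this[int index]
    {
        get { return list[index]; }
        set { Validate(value); list[index] = value; }
    }

    // Pure boilerplate: straight forwarding to the wrapped list.
    public int Count { get { return list.Count; } }
    public bool IsReadOnly { get { return list.IsReadOnly; } }
    public void Clear() { list.Clear(); }
    public bool Contains(string item) { return list.Contains(item); }
    public void CopyTo(string[] array, int arrayIndex) { list.CopyTo(array, arrayIndex); }
    public bool Remove(string item) { return list.Remove(item); }
    public int IndexOf(string item) { return list.IndexOf(item); }
    public void RemoveAt(int index) { list.RemoveAt(index); }
    public IEnumerator<string> GetEnumerator() { return list.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Roughly three quarters of the class is pure forwarding, which is the part a language feature could generate.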

Symbols

I don’t see the benefit over normal string interning here. That could just be because of a poor description of symbols in Ruby, admittedly… but I suspect any other benefit wouldn’t meet the “it’s got to be really useful in many situations” bar.

Hashes

I’ve only extensively used one language with hashes built in: Groovy. While I agree it’s nice occasionally, I don’t think it’s worth bending the language out of shape for, as we’ve now got collection initializers anyway:


var hash = new Dictionary<string,int>
{
    { "First", 1 },
    { "Second", 2 }
};

Automatic delegation

To be honest I don’t really know what Jeremy means here – although it’s possible that he means what I understand as mixins. Ah the joys of loose terminology.

Metaprogramming

I only have vague ideas of what metaprogramming is all about, and those are mostly through Ayende’s blog. I can see that it’s almost certainly very powerful, but I’m not sure I want it in C#. I don’t want C# to turn into a massive box with every nifty feature ever considered. It’s possible I could be turned on this one, if someone showed me it working really nicely.

Macros

Ick, no. I’ve seen what macros tend to be used for. I’m sure there are nice shiny reasons for them, but certainly in the C/C++ form I’d be heavily against them.

Update: Ian Griffiths mailed me drawing my attention to LISP macros and how different they are to C/C++ macros. The way Ian described it sounds similar to what I understand of the metaprogramming that Ayende wants to do. I can see why it’s a powerful tool… but personally I think I’d rather keep it away from a mainstream language like C#. I’ll be writing another blog post to explain my view on this, because it’s worthy of a much fuller discussion.

Everything virtual by default

Absolutely not! Yes, it would make mocking easier – but then making everything public by default would probably make things easier too. Inheritance is hard to control properly, and should only be done with very careful design. As I wrote in the previous post, I’d prefer classes to be sealed by default, i.e. a step in the opposite direction. Oh, there’s the performance implication too, which is one of the reasons Java needs a much more complicated multi-pass JIT – to allow even virtual methods to be inlined until they’re actually overridden. The performance part is much, much less important than the “inheritance is powerful but easy to misuse” argument.

Not only should the default not change at this point, but it was the right default to start with.

Instant Data Type

This would basically be a way of using anonymous types at a higher level – returning them with a return type of var, for instance. I don’t support that proposal per se, but I can see a benefit in having “named anonymous types” – classes which have the same behaviour as anonymous types (in terms of immutability, equality, hash codes etc) but in a named manner. Something like this:


// Hypothetical syntax for a "named anonymous type"
public class Person = new {string Name, DateTime DateOfBirth}

Person p = new Person { Name = "Jon", DateOfBirth = 19.June(1976) };
Person p2 = new Person { Name = "Jon", DateOfBirth = 19.June(1976) };

Assert.AreEqual(p.GetHashCode(), p2.GetHashCode());
// etc


Again, the syntax isn’t terribly important to me – but the ability to define very simple immutable data objects is nice. It could also improve the readability of some LINQ code as you could make the meaning of the (currently anonymous) tuple clear in the name.
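Anonymous types already behave this way today; they just can’t be named. Here’s a minimal sketch of the compiler-generated equality and hash codes the proposal would reuse (the class and method names are illustrative, and a plain DateTime stands in for the 19.June(1976) helper).

```csharp
using System;

static class AnonymousTypeEqualityDemo
{
    public static bool SameValuesAreEqual()
    {
        // Within one assembly, anonymous object creation expressions with the
        // same property names and types, in the same order, share one type.
        var p = new { Name = "Jon", DateOfBirth = new DateTime(1976, 6, 19) };
        var p2 = new { Name = "Jon", DateOfBirth = new DateTime(1976, 6, 19) };

        // Equals and GetHashCode are generated from the (read-only) property values.
        return p.Equals(p2) && p.GetHashCode() == p2.GetHashCode();
    }
}
```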

A few anticipated comebacks:

  1. Clash with object initializers: yes, it looks like it’s setting properties rather than passing them in as constructor arguments. That’s unfortunate, and maybe parentheses would be better than braces here. That would require named parameters though. (I’ll come onto those in another post!)
  2. Why not just refactor the anonymous type to a named type? ReSharper lets you do this! Indeed it does – but then you’ve got a complete class to maintain. Given a single line of code, I know the features of the Person class. I can add a new property (breaking existing uses, of course) without having to make sure I get the equality and hash code implementations right manually, etc. I prefer simplicity of language expression over just saving typing by using snippets etc – that’s why I like automatic properties.
  3. It can’t use quite the same implementation as anonymous types. Indeed, anonymous types are quite interesting in terms of the number of types actually generated in the IL, due to sharing of generics. I don’t think it would be a great loss in this case though.
  4. The use still isn’t as brief as with anonymous types, due to needing to specify the name. True, but unavoidable, I think.

MemberInfo (infoof)

I don’t think the C# team have actually stated that this is even potentially on the table, but one of the lovely things about having Eric Lippert as a tech reviewer for the book is I get to hear all kinds of anecdotes about what’s been considered before. Some of them will be on the book’s website in the notes section. In this case, I don’t think it’s a problem to reveal that the C# team have considered this before as an infoof operator (commonly pronounced “in-foof” of course).

I could go for this idea – it would certainly make reflection simpler in a number of cases.
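For context, here’s the string-based reflection that an infoof operator would replace. It’s a minimal sketch with illustrative names: because the method is identified by name, a typo or a rename isn’t caught by the compiler, and GetMethod just returns null at execution time.

```csharp
using System;
using System.Reflection;

static class ReflectionByNameDemo
{
    // Works, but only because the string happens to be spelled correctly.
    public static MethodInfo FindSubstring()
    {
        return typeof(string).GetMethod("Substring", new[] { typeof(int) });
    }

    // Compiles happily, then silently finds nothing: member names are case-sensitive.
    public static MethodInfo FindMisspelled()
    {
        return typeof(string).GetMethod("SubString", new[] { typeof(int) });
    }
}
```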

Method Interception and IDynamicObject

I’ve lumped these two together as they’re similar (in my view) – they’re leading down the road to a dynamic language. I can appreciate the benefits of dynamic languages, but that doesn’t mean I think every language ought to be dynamic. I’d pass on these two.

Static interfaces

I’m not entirely sure what Ayende means on this front, but I know I’ve seen a number of requests for the ability to declare that a type definitely has a given static method. Indeed, I’ve wanted it myself a few times. However, I’m not sure how I’d go about using it. Interfaces by their current nature are used when we’ve got an instance. We already know how to pass references etc around – but not types, other than as either type parameters or Type objects.

Now, having just written it I wonder whether that’s what Ayende means – if a type parameter is constrained to implement a particular interface, any static methods within that interface could be called using the type parameter. I can see the use in a few situations, but I’d need to be convinced that it was common enough to warrant a language change. The bar wouldn’t be too high for me on this one though, as I think we could use very natural syntax without having to make up anything significantly new.
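To illustrate the gap, here’s a sketch of the usual workaround today. A generic method can’t call a static method such as Parse on its type parameter, so the static method has to be passed in as a delegate. The ParseAll method is a hypothetical example.

```csharp
using System;
using System.Collections.Generic;

static class StaticMethodWorkaround
{
    // Constraints can only require instance members (or a parameterless
    // constructor), so "T.Parse(input)" isn't possible here; the caller
    // supplies the static method as a delegate instead.
    public static List<T> ParseAll<T>(IEnumerable<string> inputs, Func<string, T> parse)
    {
        var results = new List<T>();
        foreach (string input in inputs)
        {
            results.Add(parse(input));
        }
        return results;
    }
}
```

A call such as ParseAll(new[] { "1", "2" }, s => int.Parse(s)) works, but every caller has to remember to pass the right method; a static interface constraint would let the compiler find it.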

Aspect-Oriented Programming

Ooh, tricky one. I’m definitely undecided on this. I can see benefits, but also drawbacks in terms of how obvious the flow of the code is, etc – all the normal objections.

I think I’d welcome additions to the framework and/or runtime to make AOP support simpler, but then leave it to IoC containers etc to actually implement, rather than embedding AOP directly in the language.

Design by Contract

There are parts of DbC that I’d really like to see in the language, or possibly as a language/framework mixture where the framework describes certain common attributes (invariants, non-null arguments etc) and then each compiler takes note of the same attributes. I would really, really like to get rid of having manually-written trivial argument checking in my code. I don’t think I’d immediately want to go as far as Spec# though, in terms of trying to deduce correctness. I wouldn’t like to say why, beyond unfamiliarity (which I know isn’t a good reason). Again, I could possibly be persuaded.
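Here’s a sketch of the trivial argument checking in question, using a hypothetical BankAccount class. Each check is a single condition plus a throw statement, which is exactly the sort of thing shared contract attributes could let the compiler generate.

```csharp
using System;

public class BankAccount
{
    private readonly string owner;
    private decimal balance;

    public BankAccount(string owner)
    {
        // Hand-written precondition: owner must not be null.
        if (owner == null)
        {
            throw new ArgumentNullException("owner");
        }
        this.owner = owner;
    }

    public void Deposit(decimal amount)
    {
        // Hand-written precondition: amount must be positive.
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException("amount", "Deposit must be positive");
        }
        balance += amount;
    }

    public string Owner { get { return owner; } }
    public decimal Balance { get { return balance; } }
}
```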

IDisposable implementation support

Good idea. It’s a pain to implement IDisposable properly – some help would be welcome. It would probably need to be flexible enough to allow the developer to say whether a finalizer was required or not, and possibly some other things – but in principle, I’m in favour.
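For reference, here’s a sketch of the usual dispose pattern for a class that owns a disposable resource. ResourceHolder is a hypothetical name; this version has no finalizer because it only wraps a managed resource, which is one of the choices a language feature would need to let the developer make.

```csharp
using System;

public class ResourceHolder : IDisposable
{
    private readonly IDisposable resource;
    private bool disposed;

    public ResourceHolder(IDisposable resource)
    {
        this.resource = resource;
    }

    public bool IsDisposed { get { return disposed; } }

    public void DoWork()
    {
        if (disposed)
        {
            throw new ObjectDisposedException(GetType().Name);
        }
        // ... use the resource ...
    }

    public void Dispose()
    {
        Dispose(true);
        // Lets a derived class with a finalizer skip finalization.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
        {
            return;
        }
        if (disposing)
        {
            // Only touch other managed objects when called from Dispose(),
            // never from a finalizer.
            resource.Dispose();
        }
        disposed = true;
    }
}
```

That’s a lot of ceremony for “dispose this field, once, and fail afterwards”.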

Constructor inheritance

Aargh, no. Constructors effectively belong to the type rather than instances of the type, so they’re not inherited in the same way. They’re a bit like static members – and I know we can call static members as if they were inherited as normal (e.g. UnicodeEncoding.ASCII), but it’s generally a bad idea to do so in my view.

Also consider the lack of control. System.Object has a parameterless constructor – so should all types do so as well, given that they all inherit (directly or indirectly) from System.Object? What would new FileStream() really mean? I suppose one possibility would be to mark your type as intentionally inheriting constructors – which is all very well until the base class adds a new constructor you don’t want, and you don’t realise it until it’s too late. On this one the complexities and disadvantages outweigh the advantages for me.
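The familiar cost of the current rules shows up with custom exceptions. In this sketch (ValidationException is a hypothetical type) each base constructor has to be redeclared even though the bodies just chain to Exception, but in return the class exposes exactly these three constructors, even if the base class gains more later.

```csharp
using System;

public class ValidationException : Exception
{
    public ValidationException()
    {
    }

    public ValidationException(string message)
        : base(message)
    {
    }

    public ValidationException(string message, Exception innerException)
        : base(message, innerException)
    {
    }
}
```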

“Const correctness”

I haven’t actually seen anyone asking for this specifically for C# 4, but it’s been a general feature request pretty much forever. Again, I can see the benefits but:

  1. I suspect it’s the kind of thing you really need to get right in V1.0 for it to be genuinely useful.
  2. I still haven’t seen an easy way to express “this is an immutable reference to a mutable list of immutable objects of a particular type”. Basically you need to express “constness” for every level down the composition hierarchy, which isn’t simple.
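The second point is easy to demonstrate with today’s read-only wrappers, which only protect one level. A minimal sketch (the class and method names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

static class ShallowReadOnlyDemo
{
    // ReadOnlyCollection<T> only protects the outer level: nothing can be
    // added to the collection, but the lists it contains are still mutable.
    public static int MutateThroughReadOnlyView()
    {
        var inner = new List<int> { 1, 2, 3 };
        var outer = new List<List<int>> { inner };
        ReadOnlyCollection<List<int>> view = outer.AsReadOnly();

        // view.Add(...) isn't available, but this compiles and succeeds:
        view[0].Add(4);

        // The "read-only" data has changed underneath.
        return inner.Count;
    }
}
```

The method returns 4: expressing “read-only all the way down” would need a marker at every level of the type.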



To wrap up, here are the features above in a “yes, maybe, no” categorization (just my own view, of course):

  • Yes: Mixins, instant data types, IDisposable implementation, design by contract (partial), infoof
  • Maybe: Automatic delegation, metaprogramming, static interfaces
  • No: Symbols, hashes, everything virtual by default, macros, constructor inheritance, AOP, method interception and IDynamicObject

Next time (which may be tonight if I’m feeling energetic) I’ll look at what Microsoft has hinted at.

A simple extension method, but a beautiful one

This came up a little while ago in a newsgroup question, and Marc Gravell and I worked out a solution between us. I’ve finally included it in MiscUtil (although not released it yet – there’s a lot of stuff ready to go when we’ve finalised namespaces and updated the website etc) but I thought I’d share it here.

How often have you written code to do something like counting word frequencies, or grouping items into lists? I know a lot of this can be solved with LINQ if you’re using .NET 3.5, but in .NET 2.0 we’ve always been nearly there. Dictionaries have provided a lot of the necessary facilities, but there’s always the bit of code which needs to check whether or not we’ve already seen the key, and populate the dictionary with a suitable initial value if not – a count of 0, or an empty list for example.

There’s something that 0 and “empty list” have in common. They’re both the results of calling new TValue() for their respective TValue types of int and List<Whatever>. Can you see what’s coming? A generic extension method for dictionaries whose values are of a type which can use a parameterless constructor, which returns the value associated with a key if there is one, or a new value (which is also inserted into the dictionary) otherwise. It’s really simple, but it’ll avoid duplication all over the place:

Note: This code has been updated due to comments below. Comments saying “Use TryGetValue” referred to the old version!


// Declared in a non-generic static class, as with any extension method.
public static TValue GetOrCreate<TKey, TValue>(this IDictionary<TKey, TValue> dictionary,
                                               TKey key)
    where TValue : new()
{
    TValue ret;
    if (!dictionary.TryGetValue(key, out ret))
    {
        ret = new TValue();
        dictionary[key] = ret;
    }
    return ret;
}

The usage of it might look something like this:


var dict = new Dictionary<string,int>();

foreach (string word in someText)
    dict[word] = dict.GetOrCreate(word)+1;

I’m not going to claim this will set the world on fire, but I know I’m fed up with writing the kind of code which is in GetOrCreate, and maybe you are too.

Additional overloads are available to specify either a value to use when the key is missing, or a delegate to invoke to create a value.
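The exact MiscUtil signatures aren’t shown here, so what follows is only a hypothetical sketch of how those overloads might look. The class and parameter names are assumptions, and the original method is rewritten in terms of the delegate overload.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryExtensions
{
    // Original behaviour: create the value with a parameterless constructor.
    public static TValue GetOrCreate<TKey, TValue>(this IDictionary<TKey, TValue> dictionary,
                                                   TKey key)
        where TValue : new()
    {
        return dictionary.GetOrCreate(key, () => new TValue());
    }

    // Hypothetical overload: use a fixed value when the key is missing.
    public static TValue GetOrCreate<TKey, TValue>(this IDictionary<TKey, TValue> dictionary,
                                                   TKey key, TValue missingValue)
    {
        TValue ret;
        if (!dictionary.TryGetValue(key, out ret))
        {
            ret = missingValue;
            dictionary[key] = ret;
        }
        return ret;
    }

    // Hypothetical overload: invoke a delegate only when the key is missing.
    public static TValue GetOrCreate<TKey, TValue>(this IDictionary<TKey, TValue> dictionary,
                                                   TKey key, Func<TValue> valueProvider)
    {
        TValue ret;
        if (!dictionary.TryGetValue(key, out ret))
        {
            ret = valueProvider();
            dictionary[key] = ret;
        }
        return ret;
    }
}
```

With these in place, grouping words by length becomes a one-liner inside the loop: groups.GetOrCreate(word.Length, () => new List<string>()).Add(word). The delegate form is handy when the value can’t be created with a parameterless constructor, or when creating it is expensive and should only happen for missing keys.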