Macros, and languages within languages

Ian Griffiths mailed me about macros, and explained how LISP macros were very different to C/C++ macros, working at a language level instead of at a text level. I won’t pretend to understand all about what would be possible and what wouldn’t, but Ian gave a good example: query expressions in C# 3. Instead of being part of the language itself, they could apparently have been written as macros, if C# supported them. Then if you wanted to have similar support for different forms of expression, you could just write your own macro library.

Assuming that’s what people are actually requesting, I can certainly see the attraction – but I’d still prefer it if C# didn’t go down that route. I’ll go back to C++ for the guts of the reason why, but it’s not really about macros at this point. It’s about building your own language. Once, someone told me that C++ wasn’t a language – it was a meta-language; no-one used “bare” C++, they worked out their own language made from the building blocks of normal C++, and then used that.

That may or may not be true – or more likely, it’s true in some places but not others – but it scares me as an idea. I’m not going to claim I know every nuance of C#, but it’s pretty rare that you’d throw a line at me without it being reasonably clear what’s going to happen and why, at the language level. Extension methods might mean a bit more information is required as to where a particular method comes from, but it doesn’t take a lot of digging to see what’s going on.

Now imagine that C# 3 didn’t include query expressions, but that someone had come up with them as a macro library. It’s not an insignificant amount of effort to learn what’s going on there, and how it all maps to normal method calls, potentially with expression trees as arguments instead of delegates. Until you understand what’s going on at a reasonably deep level, you can’t really make any firm decisions as to what code including a query expression will do. (Heck, that’s one of the premises of the book: you should really know this stuff, or at least be aware of it.)

That’s fine when there’s a single macro library used globally, but now imagine every company has their own – or worse still, has a bunch of them grabbed from Code Project, possibly including a load of bugs. Most of us aren’t accomplished language designers, and I suspect there’d be an awful lot of macro libraries out there which weren’t quite thought through enough – but were still useful enough to be attractive. They’d become magnets for code warts.

It’s hard enough when you change company to work out what 3rd party libraries are in use, how they’re being used, what the coding conventions are etc. It would be much worse if I had to learn another flavour of C# itself each time. I’m already worried that developers are picking up C# 3 without having a firm enough grasp of C# 2 – and that’s when there’s just a progression within a single language.

I know this all sounds patronising and/or elitest and/or “nanny state” applied to programming languages – but it’s how I feel nonetheless. I just don’t think we (as a development community) are mature enough to handle that sort of power without it turning into a blood bath. This sort of thing sounds fabulous for hobby and research development, and would probably be great in the hands of the best few companies in the world – but I don’t think it’s a good idea for the mainstream.

Okay – time to hear why I’m wrong :)

C# 4, part 3: Ideas from Microsoft

Microsoft haven’t committed to anything in C# 4 yet. However, there have been hints about what they’ve been considering in Eric Lippert’s blog, and more than hints in Charlie Calvert’s blog. There’s not a lot to go on yet, but:

Immutability support

Most of Eric’s posts about immutability have so far been about immutable data structures. However, the first post in the series did mention that they’re playing around with immutability from the point of view of potential language support. Joe Duffy also wrote about immutability at roughly the same time.

What can we expect in terms of support? Possibilities:

  • Compiler checking via attributes, as per Joe’s posts
  • Immutable collections to make it easier to sensibly embed collections within immutable types
  • Readonly automatic properties (as I mentioned in part 1 – I’ll expand on this in the next post)

I’m not sure what could be usefully added beyond those. One possibility, however: what about automatic equality and hashcode generation? I touched on this last time when talking about “instant data types” but I don’t see why it shouldn’t be applicable in general. After all:

  • Immutable types are good candidates for dictionary keys – or to put it the other way round, using mutable types as dictionary keys is a risky idea
  • If all immutable types automatically override GetHashCode and Equals, and immutable types can only compose other immutable types, everything should just work
  • The obvious implementation of Equals and a “multiply, add, repeat” implementation of GetHashCode are pretty reasonable for many, many types. You could always manually override the methods if necessary.

One downside: Equality doesn’t really work well when there’s an inheritance hierarchy involved. Read Josh Bloch’s “Effective Java” for a detailed discussion. (Ooh, new edition coming out soon. Should make for great reading.)

Generic variance support

Once again, Eric has posted rather a lot on this, concentrating on two aspects: interfaces and delegates. I believe these are already supported in the CLR (I’m sure interfaces are, and I suspect delegates are too).

Just to be clear, I don’t think anyone should expect unsafe covariance/contravariance. The following code still won’t compile (at least unless things go very differently to how I expect):

// Still won’t work!
IList<string> strings = new List<string>();
IList<object> objects = strings;
objects.Add(new object()); // Oww!

This code, however, would be okay, due to the “readonly” nature of iteration:

// Should be okay – can’t break type safety
IEnumerable<string> strings = new List<string>();
IEnumerable<object> objects = strings;

That’s the interface side. For delegates, this would probably be possible:

// Contravariance of input parameters to delegates
Action<object> objectAction = (x => Console.WriteLine(x));
Action<string> stringAction = objectAction;

// Covariance of returns (and output parameters?) from delegates
Func<string> stringFunc = () => “foo”;
Func<object> objectFunc = stringFunc;

All of this is conceptually good. It’s more for people to understand, of course, but it’s still a useful thing to have available.

Dynamic calling support

This one really surprised me – to the extent that I’ll now need to edit the last chapter in the book so as not to look stupid when it’s published. (It’s fine to be wrong when predicting the future, but making a prediction which has already been proven false before publication is embarrassing.) I expected C# to stay fully static forever. However, it looks like the C# team is at least strongly considering making dynamic calling support available. Read the blog for more details, but note what isn’t included: C# reacting dynamically. In other words, Ayende’s beloved IDynamicObject support isn’t being proposed (at the moment) – although I guess there’s always a possibility that the DLR support will somehow be available through “normal” C# without extra compiler support.

This will disappoint some people of course, but I’m happy enough. I can’t see myself using the new support particularly often, and I’m slightly worried at the extra complexity required in the language spec to explain what it will actually do, but I’m likely to treat it in roughly the same way as unsafe blocks – something to ignore most of the time.

 

Conclusion

As you can see, there’s not a lot really on the table yet. That in itself is quite interesting, however – as well as the nature of the changes. I believe that the changes from C# 3 to 4 will be much more like those from 1 to 2 than 2 to 3. Think about when VS 2005 was released – C# 3 extensions were already available in CTP form for the VS 2005 beta. Here we are with VS 2008 having RTM’d a while ago, and we only have a few ideas to mull over. Now, I’m sure that there are implementations of all of the above features and almost certainly more – but they’re not putting them out just yet. Hopefully we’ll learn more over the next few months.

The big difference with C# 3 was that there was a grand plan: LINQ. Almost every feature in C# 3 (basically not automatic properties or partial methods) supports LINQ in some way or other. I don’t see a big plan for C# 4. That’s a very good thing, in my view. We need time – quite a lot of time, I suspect – to digest LINQ. Awful as the phrase is, LINQ has the potential to be a paradigm shift in development. Those shouldn’t come along too often. Disparate changes can still be incredibly useful of course, so let’s not lack ambition for C# 4 – but I’m expecting idiomatic C# 4 to still be roughly similar to idiomatic C# 3, which certainly couldn’t be said of C# 3 to C# 2.

Only one more post in the immediate future: my own ideas for C# 4 (or at least those which I’ve come up with independently, even if others have mentioned them too).

C# 4, part 2: Ideas from other community members

There has been a fair amount of speculation online about what should be in C# 4. I’ve taken the list below from a few posts, primarily those by Ayende and Jeremy Miller. I’ve deliberately left out the ideas that Microsoft have mentioned that they’re at least considering – they’ll come in the next post.

Mixins

I suspect everyone has a different idea of what these mean, but I’ll say what I’d like. I want to be able to implement an interface by proxying all calls (other than those I’ve actually implemented) to a particular member variable, as an aid to favouring composition over inheritance. As an example, here’s a class which implements IList<string> but makes sure that only strings with length 5 or more can be added:

 

public class LongStringList : IList<string>
{
    [MixinImplementation(typeof(IList<string>))]
    readonly IList<string> m_list = new List<string>();

    public void Add(string item)
    {
        if (item.Length < 5)
        {
            throw new ArgumentException(“Only strings of length >= 5 allowed”);
        }

        m_list.Add(item);
    }
}

 

(Yes, you’d need to implement the setter as well to prevent other values being replaced. This is just sample code to give the rough flavour – and the syntax is pretty much made up as an example too. I’m not hung up on the syntax, I just want the functionality.

You could of course derive from System.ObjectModel.Collection<string> – but this prevents you from deriving from any other class, and fixes the inheritance forever. If you only really want to provide your clients with an IList<string> implementation, it’s nicer not to pin anything down. At a later date you could manually implement more of the interface members instead of proxying them, without changing any of the calling code.

Symbols

I don’t see the benefit over normal string interning here. That could just be because of a poor description of symbols in Ruby, admittedly… but I suspect any other benefit wouldn’t meet the “it’s got to be really useful in many situations” bar.

Hashes

I’ve only extensively used one language with hashes built in: Groovy. While I agree it’s nice occasionally, I don’t think it’s worth bending the language out of shape for as we’ve now got collection initializers anyway:

 

var hash = new Dictionary<string,int>
{
    { “First”, 1 },
    { “Second”, 2}
};

Automatic delegation

To be honest I don’t really know what Jeremy means here – although it’s possible that he means what I understand as mixins. Ah the joys of loose terminology.

Metaprogramming

I only have vague ideas of what metaprogramming is all about, and those are mostly through Ayende’s blog. I can see that it’s almost certainly very powerful, but I’m not sure I want it in C#. I don’t want C# to turn into a massive box with every nifty feature ever considered. It’s possible I could be turned on this one, if someone showed me it working really nicely.

Macros

Ick, no. I’ve seen what macros tend to be used for. I’m sure there are nice shiny reasons for them, but certainly in the C/C++ form I’d be heavily against them.

Update: Ian Griffiths mailed me drawing my attention to LISP macros and how different they are to C/C++ macros. The way Ian described it sounds similar to what I understand of the metaprogramming that Ayende wants to do. I can see why it’s a powerful tool… but personally I think I’d rather keep it away from a mainstream language like C#. I’ll be writing another blog post to explain my view on this, because it’s worthy of a much fuller discussion.

Everything virtual by default

Absolutely not! Yes, it would make mocking easier – but then making everything public by default would probably make things easier too. Inheritance is hard to control properly, and should only be done with very careful design. As I wrote in the previous post, I’d prefer classes to be sealed by default, i.e. a step in the opposite direction. Oh, there’s the performance implication too, which is one of the reason’s Java needs a much more complicated multi-pass JIT – to allow even virtual methods to be inlined until they’re actually overridden. The performance part is much, much less important than the “inheritance is powerful but easy to misuse” argument.

Not only should the default not change at this point, but it was the right default to start with.

Instant Data Type

This would basically be a way of using anonymous types at a higher level – returning them with a return type of var, for instance. I don’t support that proposal per se, but I can see a benefit in having “named anonymous types” – classes which have the same behaviour as anonymous types (in terms of immutability, equality, hash codes etc) but in a named manner. Something like this:

 

public class Person = new {string Name, DateTime DateOfBirth}

Person p = new Person { Name = “Jon”, DateOfBirth = 19.June(1976) };
Person p2 = new Person { Name = “Jon”, DateOfBirth = 19.June(1976) };

Assert.IsTrue(p.Equals(p2));
Assert.AreEqual(p.GetHashCode(), p2.GetHashCode());
// etc

 

Again, the syntax isn’t terribly important to me – but the ability to define very simple immutable data objects is nice. It could also improve the readability of some LINQ code as you could make the meaning of the (currently anonymous) tuple clear in the name.

A few anticipated comebacks:

  1. Clash with object initializers: yes, it looks like it’s setting properties rather than passing them in as constructor arguments. That’s unfortunate, and maybe parentheses would be better than braces here. That would require named parameters though. (I’ll come onto those in another post!)
  2. Why not just refactor the anonymous type to a named type? ReSharper lets you do this! Indeed it does – but then you’ve got a complete class to maintain. Given a single line of code, I know the features of the Person class. I can add a new property (breaking existing uses, of course) without having to make sure I get the equality and hash code implementations right manually, etc. I prefer simplicity of language expression over just saving typing by using snippets etc – that’s why I like automatic properties.
  3. It can’t use quite the same implementation as anonymous types. Indeed, anonymous types are quite interesting in terms of the number of types actually generated in the IL, due to sharing of generics. I don’t think it would be a great loss in this case though.
  4. The use still isn’t as brief as with anonymous types, due to needing to specify the name. True, but unavoidable, I think.

MemberInfo (infoof)

I don’t think the C# team have actually stated that this is even potentially on the table, but one of the lovely things about having Eric Lippert as a tech reviewer for the book is I get to hear all kinds of anecdotes about what’s been considered before. Some of them will be on the book’s website in the notes section. In this case, I don’t think it’s a problem to reveal that the C# team have considered this before as an infoof operator (commonly pronounced “in-foof” of course).

I could go for this idea – it would certainly make reflection simpler in a number of cases.

Method Interception and IDynamicObject

I’ve lumped these two together as they’re similiar (in my view) – they’re leading down the road to a dynamic language. I can appreciate the benefits of dynamic languages, but that doesn’t mean I think every language ought to be dynamic. I’d pass on these two.

Static interfaces

I’m not entirely sure what Ayende means on this front, but I know I’ve seen a number of requests for the ability to declare that a type definitely has a given static method. Indeed, I’ve wanted it myself a few times. However, I’m not sure how I’d go about using it. Interfaces by their current nature are used when we’ve got an instance. We already know how to pass references etc around – but not types, other than as either type parameters or Type objects.

Now, having just written it I wonder whether that’s what Ayende means – if a type parameter is constrained to implement a particular interface, any static methods within that interface could be called using the type parameter. I can see the use in a few situations, but I’d need to be convinced that it was common enough to warrant a language change. The bar wouldn’t be too high for me on this one though, as I think we could use very natural syntax without having to make up anything significantly new.

Aspect-Oriented Programming

Ooh, tricky one. I’m definitely undecided on this. I can see benefits, but also drawbacks in terms of how obvious the flow of the code is, etc – all the normal objections.

I think I’d welcome additions to the framework and/or runtime to make AOP support simpler, but then leave it to IoC containers etc to actually implement, rather than embedding AOP directly in the language.

Design by Contract

There are parts of DbC that I’d really like to see in the language, or possibly as a language/framework mixture where the framework describes certain common attributes (invariants, non-null arguments etc) and then each compiler takes note of the same attributes. I would really, really like to get rid of having manually-written trivial argument checking in my code. I don’t think I’d immediately want to go as far as Spec# though, in terms of trying to deduce correctness. I wouldn’t like to say why, beyond unfamiliarity (which I know isn’t a good reason). Again, I could possibly be persuaded.

IDisposable implementation support

Good idea. It’s a pain to implement IDisposable properly – some help would be welcome. It would probably need to be flexible enough to allow the developer to say whether a finalizer was required or not, and possibly some other things – but in principle, I’m in favour.

Constructor inheritance

Aargh, no. Constructors effectively belong to the type rather than instances of the type, so they’re not inherited in the same way. They’re a bit like static members – and I know we can call static members as if they were inherited as normal (e.g. UnicodeEncoding.ASCII), but it’s generally a bad idea to do so in my view.

Also consider the lack of control. System.Object has a parameterless constructor – so should all types do so as well, given that they all inherit (directly or indirectly) from System.Object? What would new FileStream() really mean? I suppose one possibility would be to mark your type as intentionally inheriting constructors – which is all very well until the base class adds a new constructor you don’t want, and you don’t realise it until it’s too late. On this one the complexities and disadvantages outweigh the advantages for me.

“Const correctness”

I haven’t actually seen anyone asking for this specifically for C# 4, but it’s been a general feature request pretty much forever. Again, I can see the benefits but:

  1. I suspect it’s the kind of thing you really need to get right in V1.0 for it to be genuinely useful.
  2. I still haven’t seen an easy way to express “this is an immutable reference to a mutable list of immutable objects of a particular type”. Basically you need to express “constness” for every level down the composition hierarchy, which isn’t simple.

 

Conclusion

Just to wrap the above up, here are the above features in “yes, maybe, no” categorization (just for my own view, of course):

  • Yes: Mixins, instant data types, IDisposable implementation, design by contract (partial), infoof
  • Maybe: Automatic delegation, metaprogramming, static interfaces
  • No: Symbols, hashes, everything virtual by default, macros, constructor inheritance, AOP, method interception and IDynamicObject

Next time (which may be tonight if I’m feeling energetic) I’ll look at what Microsoft has hinted at.

A simple extension method, but a beautiful one

This came up a little while in a newsgroup question, and Marc Gravell and I worked out a solution between us. I’ve finally included it in MiscUtil (although not released it yet – there’s a lot of stuff ready to go when we’ve finalised namespaces and updated the website etc) but I thought I’d share it here.

How often have you written code to do something like counting word frequencies, or grouping items into lists? I know a lot of this can be solved with LINQ if you’re using .NET 3.5, but in .NET 2.0 we’ve always been nearly there. Dictionaries have provided a lot of the necessary facilities, but there’s always the bit of code which needs to check whether or not we’ve already seen the key, and populate the dictionary with a suitable initial value if not – a count of 0, or an empty list for example.

There’s something that 0 and “empty list” have in common. They’re both the results of calling new TValue() for their respect TValue types of int and List<Whatever>. Can you see what’s coming? A generic extension method for dictionaries whose values are of a type which can use a parameterless constructor, which returns the value associated with a key if there is one, or a new value (which is also inserted into the dictionary) otherwise. It’s really simple, but it’ll avoid duplication all over the place:

Note: This code has been updated due to comments below. Comments saying “Use TryGetValue” referred to the old version!

 

public static TValue GetOrCreate<TKey, TValue>(this IDictionary<TKey, TValue> dictionary,
                                               TKey key)
    where TValue : new()
{
    TValue ret;
    if (!dictionary.TryGetValue(key, out ret))
    {
        ret = new TValue();
        dictionary[key] = ret;
    }
    return ret;
}

The usage of it might look something like this:

 

var dict = new Dictionary<string,int>();

foreach (string word in someText)
{
    dict[word] = dict.GetOrCreate(word)+1;
}

I’m not going to claim this will set the world on fire, but I know I’m fed up with writing the kind of code which is in GetOrCreate, and maybe you are too.

Additional overloads are available to specify either a value to use when the key is missing, or a delegate to invoke to create a value.

C# 4, part 1: Looking back at the past

Everyone else is speculating about what’s going to be in C# 4 (and various possibilities are coming out of MS), so I thought it would be wise to start my own series of wishlist posts before I miss the boat completely.

In this first post, I’m not going to look at the future at all – I’m going to look at mistakes of the past. When I say “mistake” I of course mean “things I would have done differently had I been a language designer with 20/20 hindsight”. Of course, there’s a lot of room for argument :)

Mistakes in C# 1

  • Lack of separate getter/setter access for properties. This came in C# 2, but it should have been obvious that it was highly desirable long before C# 1 came out.
  • Lack of generics – ish. Don’t worry, I’m not going to claim that all the features of C# 2 and 3 should have been in C# 1, but if generics had been in at the start we could have avoided having the non-generic collections (and interfaces) completely. Mind you, I’m glad that the .NET team took their time instead of including the bodged (IMO) generics of Java 5.
  • Classes not being sealed by default. I’ve believed for a long time that allowing inheritance incurs a design cost (and it’s not like I’m unique in that respect). C# fixed a mistake of Java by making methods non-virtual by default; the same should be true for classes in my view.
  • Enums just being named numbers. Again, I’ve blogged about this before, but it’s worth mentioning again. It’s possible to work around the lack of this feature (as the blog post readers pointed out!) but framework and language support would have been very welcome.
  • The “x” character escape sequence. Fortunately it’s rarely used, but it’s so error-prone. Quick, how different are “x8Good compiler”, “x8Bad compiler”? What’s the first character in each string? (This will appear soon on my brainteasers page).
  • The switch statement. There are lots of ways in which this could have been better designed. VB addresses some of them (such as making it easier to express multiple matching values) but there are other ways in which this construct needed overhauling. Fallthrough is (rightly) prohibited, so why not just force braces round the code in the case block instead of requiring a break statement? Aside from anything else, that would fix the somewhat bizarre scoping rules.
  • Wacky overload resolution. I entirely understand the point that introducing new methods in a base type shouldn’t change the behaviour in derived types – but if you’ve explicitly chosen to override that method, that should be more easily callable than it is. (See the first example of the brainteasers page to see what I’m talking about.)
  • The “lock” keyword, and associated issues. Basically, the IDisposable pattern should have been used for locking, and not every object should have a monitor associated with it. Developers should keep a close eye on what’s being locked, and being able to lock on everything takes away from this. Likewise “lock” creates a keyword for little purpose (and one which would otherwise be useful as a variable name etc).

Mistakes in C# 2

  • Lack of partial methods. I’m really only saying this because it broke the format of C# in Depth slightly. I’ve introduced partial methods along with partial types because they logically fit in with them, and they don’t fit in with any of the other features of C# 3 particularly. This is just a matter of not working out all of how partial types would be used – or at least not doing so early enough. (For all I know partial methods were on the table before C# 2 shipped – I wouldn’t be surprised.)
  • Possibly the lack of generic variance. This is certainly a big issue of understanding which is often raised as a question. On the other hand, I suspect that if/when it becomes available, it will raise just as many questions in terms of the detail anyway…
  • The System.Nullable class. It’s really only there as an historical accident, and I know it’s not a C# issue as such – but even so… (Note for extra clarity – I’m fine with nullable types and the System.Nullable<T> struct. It’s just the supporting class that I don’t like.)
  • InternalsVisibleTo requiring the whole public key for strongly signed assemblies instead of the public key token (contrary to the documentation). Ick.

Mistakes in C# 3

  • It’s a real shame that readonly automatic properties aren’t in C# 3. I suspect they’ll come in C# 4 (and they’ll be on my wishlist in future posts) but I think it’s reasonable to wonder why they weren’t included in C# 3. Immutability is a known pattern of goodness, and although C# 4 may well contain any number of more significant improvements towards making it easier, readonly automatic properties would have been a good start.
  • The way that extension methods are found. This issue was raised time and time again before release, and I’ve never heard a good defence of finding them by whole namespace, instead of allowing developers to say “use the extension methods found in this class, please”. As it is, anyone writing their own extension methods is likely to end up with whole namespaces devoted to a single type. It’s very odd.

Of course that’s not to say there aren’t other things I’d like to see – but these are more “features which were slightly misdesigned” rather than “features which I really want”.

I’m not trying to take anything away from the language designers – C# is still easily my favourite language in terms of its design, particularly in C# 3, but nobody’s perfect :)

Next time I’ll start giving my opinions of features that other people are calling for.