All posts by jonskeet

Mad props to @arcaderage for the "Princess Rescue" image - see https://toggl.com/programming-princess for the full original

C# 4: Immutable type initialization

(I’m giving up with the numbering now, unless anyone particularly wants me to keep it up. What was originally going to be a limited series appears to be growing without end…)

As Chris Nahr pointed out in my previous post, my earlier idea about staged initialization was very half-baked. As he’s prompted me to think further about it, I’ve come up with another idea. It’s slightly more baked, although there are lots of different possibilities and open questions.

Let’s take a step back, and look at my motivation: I like immutable types. They’re handy when it comes to thread safety, and they make it a lot easier to reason about the world when you know that nothing can change a certain value after it’s been created. Now, the issues are:

  • We really want to be able to fully construct the object in the constructor. That means we can mark all fields as initonly in the generated IL, potentially giving the CLR more scope for optimisation.
  • When setting more than two or three values (while allowing some to be optional) constructor overloading ends up being a pain.
  • Object initializers in C# 3 only apply to properties and fields, not method/constructor arguments – so we can’t get the clarity of naming.
  • Ideally we want to support validation (or possibly other code) and automatic properties.
  • The CLR won’t allow initonly fields being set anywhere other than in the constructor – so even if we made sure we didn’t call any setters other than in the constructor, we still couldn’t use them to set the fields.
  • We want to allow simple construction of immutable types from code other than C#. In particular, I care about being able to use projects like Spring.NET and Castle/Windsor (potentially after changes to those projects) to easily create instances of immutable types without resorting to looking up the order of constructor parameters.

The core of the proposal is to be able to mark properties as initonly, and get the compiler to create an extra type which is thoroughly mutable, and contains those properties – as well as a constructor which accepts an instance of the extra type and uses it to populate the immutable instance of the main type before returning.

Extra syntax could then be used to call this constructor – or indeed, given that the properties are actually readonly, thus avoiding any ambiguity, normal object initializers could be used to create instances.

Just as an example, imagine this code:

public class Address
{
    public string Line1 { get; initonly set; }
    public string Line2 { get; initonly set; }
    public string Line3 { get; initonly set; }
    public string County { get; initonly set; }
    public string State { get; initonly set; }
    public string Country { get; initonly set; }
    public string ZipCode { get; initonly set; }
   
    // Business methods as normal
}

// In another class
Address addr = new Address
{
    Line1=“10 Fairview Avenue”,
    Line3=“Makebelieve Town”,
    County=“Mono County”,
    State=“California”,
    Country=“US”
};

That could be transformed into code a bit like this:

// Immutable class

// Let tools (e.g. the compiler!) know how we
// expect to be initialized. Could be specified
// manually to avoid using the default class name
[InitializedWith(typeof(Address.Init))]
public class Address
{
    // Nested mutable class used for initialization
    [CompilerGenerated]
    public class Init
    {
        public string Line1 { get; set; }
        public string Line2 { get; set; }
        public string Line3 { get; set; }
        public string County { get; set; }
        public string State { get; set; }
        public string Country { get; set; }
        public string ZipCode { get; initonly set; }
    }

    // Read-only “real” properties, automatically
    // implemented and backed with initonly fields
    public string Line1 { get; }
    public string Line2 { get; }
    public string Line3 { get; }
    public string County { get; }
    public string State { get; }
    public string Country { get; }
    public string ZipCode { get; }
   
    // Automatically generated constructor, using
    // backing fields directly
    public Address(Address.Init init)
    {
        <>_line1 = init.Line1;
        <>_line2 = init.Line2;
        <>_line3 = init.Line3;
        <>_county = init.County;
        <>_state = init.State;
        <>_country = init.Country;
        <>_zipCode = init.ZipCode;
    }

    // Business methods as normal
}

// In another class
Address addr = new Address(new Address.Init
{
    Line1=“10 Fairview Avenue”,
    Line3=“Makebelieve Town”,
    County=“Mono County”,
    State=“California”,
    Country=“US”
});

That’s the simple case, of course. Issues:

  • Unlike other compiler-generated types (anonymous types, types for iterator blocks, types for anonymous functions) we do want this to be public, and have a name which can be used elsewhere. We need to find some way of making sure it doesn’t clash with other names. In the example above, I’ve used an attribute to indicate which type is used for initialization – I could imagine some way of doing this in the “pre-transform” code to say what the auto-generated type should be called.
  • What happens if you put code in the setter, instead of making it automatically implemented? I suspect that code should be moved into the setter of the initialization class – but at that point it won’t have access to the rest of the state of the class (beyond the other properties in the initialization class). It’s somewhat messy.
  • What if you want to add code to the generated constructor? (Possibly solution: allow constructors to be marked somehow in a way that means “add on the initialization class as a parameter at the end, and copy all the values as a first step.)
  • How can you indicate that some parameters are mandatory, and some are optional? (The mandatory parameters could just be marked as readonly properties rather than initonly, and then the initialization class specified as an extra parameter for a constructor which takes all the mandatory ones. Doesn’t feel elegant though, and leaves you with two different types of initialization code being mixed in the client – some named, some positional.)
  • How do you specify default values? (They probably end up being the default values of the automatically generated properties of the initialization class, but there needs to be some syntax to specify them.)

I suspect there are more issues too – but I think the benefits would be great. I know the C# team has been thinking about immutability, but I’ve no idea what kind of support they’re currently envisioning. Unlike my previous ideas, which were indeed unpalatable for various reasons, I think this one has real potential. Mind you, given that I’ve come up with it after only mulling this over in “spare” time, I highly doubt that it will be a new concept to the team…

Reviewing other C# books?

Just a quick question, really – I’d really like feedback to this one.

This morning I was reading Charlie Calvert’s blog, and saw a link to the preview of a C# 3 book by Bruce Eckel and Jamie King. I’ve downloaded it, and had a look – naturally interested in the competition (and with plenty of evidence that I’ve already finished my book and won’t be plagiarising!). At the same time, I’m also interested in some other C# books which are coming out or are already out – particularly Head First C#.

My question is – would my views on other C# books be interesting to you, dear readers? Obviously I’d have a somewhat different perspective on the matter to other people – but at the same time I think it’s clear that I’ll be somewhat biased. Any such reviews are bound to contain comparisons to my own way of approaching writing about C#. That could either be interesting, or it could be really annoying.

Thoughts welcome. Oh, and if by any chance any of the authors of other C# books are reading this post (unlikely, but hey…) – I’d love to hear your views on my book, whatever they happen to be.

Reading Geek Night 2 – March 28th

I’ve mentioned “Reading Geek Night” before – it’s basically a loosely connected bunch of folks talking about fun stuff. There’s a mix of agile/not, .NET/not, expert/not, egomaniac/not, sane/not etc.

The next meeting will be on March 28th, at my house (unless this post draws vast numbers of new folks!). We’ll start off by talking about C#, with me basically doing a demo of chapter 1 of my book – evolving code from C# 1 to C# 3 via C# 2. From there I expect it’s likely we’ll look at LINQ in more depth, maybe iterators, maybe think up funky new stuff. Basically a good time should be had by all.

If you’re around the Reading (UK) area and you’re interested in coming, please let me know. Anyone who has already mailed me about this but has managed to avoid being on the mailing list somehow, let me know that, too!

C# 4, part 5: Other bits and bobs which probably don’t merit inclusion

Okay, I know I said that part 4 would be the last part in this series… but since then I’ve not only thought about iterator block parameter checking, but a few other things. Some of these I simply forgot about before, and some I hadn’t thought of yet. I’m not sure any of these are actually worthy of inclusion, but they may provoke further thought.

Tuple returns

I’ve been reading Programming Erlang and I suspect that being able to return tuples (i.e. multiple values, strongly typed but without an overall predefined type) would be a good thing. For instance, in a tuple-returning world, int.TryParse could be redesigned to return both the true/false and the parsed value. It could have a signature like this:

public static (int, bool) TryParse(string text)

… and then be called like this:

int value;
bool parsed;

(value, parsed) = int.TryParse(“Foo”);

Now, a few things to work out:

How do we ignore values we’re not interested in?

Part of the problem with out parameters is that sometimes you don’t actually care about the value – but you still have to declare and pass in a parameter. Suppose we could use ? as a placeholder for “I don’t care”. (This is _ in Erlang pattern matching, IIRC. Same kind of business.)

What could you do with a tuple?

We could potentially make tuples first class citizens, so that you could declare variables of that type, a bit like anonymous types, but with anonymous property names as well, used just for matching later. Or we could force matching at the point of method call, which would restrict the use a bit further but leave less other rules to be worked out.

Either way, I’d hope to be able to set either fields or properties by parameter matching.

What’s the value of the overall expression?

This really depends on the answer to the previous question. If tuples are first class types, then the result of the expression would normally be the tuple itself. However, I wonder whether there’s more that can be done. For instance, thinking about our TryParse example, it’s useful to be able to write (currently):

if (int.TryParse(“Foo”out value))
{
   …
}

Suppose we were able to designate one of the matched elements of the tuple to be the expression result, e.g. using _ to be slightly Perl-like:

if ((value, _) = int.TryParse(“Foo”))
{
   …
}

Would that be worth doing?

More information required…

I suspect that people who know more about the use of tuples in other languages would be able to say more about this. Some overlap with anonymous types is clearly relevant too, and would need to be carefully considered. I’m not wedded to any of the syntax shown above, of course – I’m just interested in how/where it could be useful.

Named method/constructor arguments

One of the features I like about F# is that you can specify the names of arguments, without worrying about the order. This means that it becomes even more important to name methods appropriately, but it would make method calls with many parameters simpler to read. Currently it’s common practice to use one parameter per line and a comment to indicate the use, e.g.

foo.Complicated(10,        // Number of elements to return
                “bar”,     // Name of collection
                x => x+1,  // Step for element
                3.5        // Load factor
               );

In fact, this example is relatively simple because all the parameter types are different – look at the more complicated overloads of Enumerable.GroupBy for rather more hellish examples. It’s incredibly ugly, and the compiler isn’t able to check anything. Now suppose we could instead write:

foo.Complicated(maxElements = 10,
                collectionName = “bar”,
                step = x => x+1,
                load = 3.5);

Personally I think that’s clearer and less error-prone. The arguments could be reordered with few issues, and the compiler could check that we really were using the right parameter names. One potential issue is in terms of side-effects, where evaluating one argument had a side-effect which affected the evaluation of another argument. At that point reordering is a breaking change. I suspect the compiler would need to stick to the specified textual order, and then rework things on the stack as required to get the appropriate order for the method call. A bit nasty.

Event handler subscription in object initializers

I only thought of this one today, when coming up with an example for a screencast on object initializers. I suspect most uses of object initializers will be to with custom classes (although I recently used them for XmlWriterSettings to great effect) which would make the screencast harder to understand. I was wondering what common framework classes had lots of writable properties, and I hit on the idea of building a UI. It shouldn’t surprise me that this works quite nicely, but you can build up a hierarchical UI quite pleasantly. For example:

Form form = new Form
{
    Size = new Size(300, 300),           
    Controls =
    {
        new Button
        {
            Location = new Point(10, 10),
            Text = “Hello”,
        },
        new ListBox
        {
            Location = new Point(10, 50),
            Items =
            {
                “First”,
                “Second”,
                “Third”
            }
        }
    }
};
Application.Run(form);

This is somewhat reminiscent of Groovy builders (and no doubt many other things, of course). However, one thing you can’t currently do is attach an event handler in an object initializer. The obvious syntax would be something like:

new Button
{
    Location = new Point(10, 10),
    Text = “Hello”,
    Click += (sender, args) => Save()
}

where I happen to have used a lambda expression, but didn’t need to – a normal method group conversion or any other way of constructing a delegate would have done just as well.

I mailed the C# team about this, and although it’s been considered before it’s really not useful in many situations. However, the syntax has been left open – there’s no other use of += within object initializers, so it could always be revisited if someone comes up with a killer pattern.

Immutable object initialization

I’ve been thinking about this partly as a result of object initialization in general, and the previous point about named arguments. As has been noted before, C# doesn’t really help you to build immutable objects – either as from the point of view of building the type, or then instantiating it. Basically you’ve got the constructor call, and that’s it. A static method could set private properties and then return the object for popsicle immutability, but it still feels slightly grim.

Someone (possibly Marc Gravell – not sure) suggested to me that there ought to be some way of indicating when an object initializer had finished. At the time I think I rejected the idea, but now I like it. There’s already the ISupportInitialize interface, but that feels slightly too heavy to me – in particular, it has two methods rather than just one. What I think could be nice would be:

  • A new interface with a single CompleteInitialization method.
  • Readonly automatic properties which would either make the property only writable during a constructor call if the new interface weren’t implemented or would insert an execution-time check that CompleteInitialization hadn’t been called already.
  • I’d anticipate the C# compiler implementing the new interface itself automatically in some way which supported inheritance reasonably, unless specifically implemented by the developer.
  • Members other than constructors couldn’t set readonly automatic properties on this, to avoid accidents.
  • The CLR should have some interaction so it knew which fields it could treat as being readonly after initialization had been completed.
  • Object initializers would call CompleteInitialization automatically at the end of the block.

It’s a bit messy, and I’m sure I haven’t thought of everything – but I suspect something along these lines would be a good idea at some point. It’s reminiscent of an earlier wacky idea I had which went further, but this would be specifically to support immutability. Without it, complex immutable types end up with nightmarish constructor calls.

Conclusion

So there we have it – some relatively half-baked ideas which will hopefully provoke a bit more thought – both from readers and myself. It’s interesting to note that aside from event subscription, they all have a fair number of questions and complexity around them, which is off-putting to start with. I would feel more comfortable about event subscription being added than any of the others, because it’s relatively simple and independent. The others feel like more dangerous features – even if they’re more useful too.

The value of a language specification

Last Friday evening was the inaugural Reading Geek Night. More about that another time, but if you’re in or around Reading in the UK, and fancy meeting up with some smart people (and me) to discuss software in various shapes and forms, let me know.

After most people had gone home, a few of us including Stuart Caborn were talking about specs. Stuart remembers how I ended up writing some annotations in the C# Annotated Standard: we were debugging some code, and I noticed an unboxing conversion which was unboxing an enum as an int. It worked, but I was surprised. I consulted the spec, and found that according to the spec it really shouldn’t have worked. (Furthermore, the spec suggested a case which couldn’t possibly be valid. I can’t remember the details now, but I can dig them up if anyone was interested.) I’d had one or two conversations with Jon Jagger (the C# ECMA Task Group convenor at the time) before, so I mailed him. Jon invited me to join in the book project, and I took to it with gusto. I reading most of C# 2 ECMA spec over the course of a few weeks, writing annotations as I went along.

This is not what most people would consider normal behaviour. When I recently gave a talk about C# 3, I was delighted to hear someone else mention that they had checked the spec about some aspect of the language. Finally, I wasn’t alone! However, such people are clearly the exception rather than the rule.

I genuinely don’t think that matters too much. I really don’t expect many developers to read the spec – certainly not thoroughly. I think it’s important to know that there is a spec, and be able to consult it when in doubt. I want to be able to know what every line of code is doing, in terms of which variable it’s going to access, which method it’s going to call, the order of execution of a post-increment as a method argument, etc.

That’s not to say I actually learn all of the rules by rote – even for something as simple as operator precedence, I sometimes put brackets in when they’re not required, for example. I’d rather not rely on me or a maintenance engineer having to remember too many details. But if someone else has written some obscure line of code, I’m pretty confident that I’ll be able to understand it with the help of the spec, and refactor it into something more readable.

Now, Stuart challenged the value of the spec. If his code was misbehaving he wouldn’t consult the spec – he’d consult either books or (more likely) the unit tests. Realistically, the vast majority of C# is being compiled by the Microsoft compiler, so the idea of having a spec available for other implementations isn’t actually important to that many developers in terms of business. (It may be psychologically and politically important, and I’m not trying to knock the great work that the Mono project has done – but I’ve never used Mono professionally, and I suspect that’s the case for most people.) Either the code works or it doesn’t, and if it doesn’t work the tests should say so.

I counter that not having a spec is like not having documentation for a library – if you start relying on unspecified behaviour, you can come unstuck when that behaviour changes in a legitimate way. A good example of this is depending on a particular hash algorithm being used for GetHashCode; the algorithm for string.GetHashCode() changed between .NET 1.1 and 2.0, and I’ve seen a few people get burned, having stored the generated hash values in a database. Suddenly nothing matches any more… because they ignored what the documentation said.

Stuart’s response: if the tests still work, the changes haven’t broken anything. If the tests don’t work, we can go and fix the code so they start to work again. I’ll concede that it’s unlikely that implementation changes in the compiler will actually break any code (and it’s also very unlikely that specification changes will break code – the C# design team are pretty fanatical about not introducing breaking changes).

I can see Stuart’s point of view, but it just feels so very wrong. I suspect a lot of that is down to my personality type – how I really hate working without enough information (or what I consider to be enough information). Today I fixed a bug with an ASP.NET application which was producing incorrect JavaScript. It was working on some machines and not working on others. I thought I’d found out why (a different version of a library in the GAC) but that was ruled out after examining another machine which was working contrary to my hypothesis. I’m reasonably confident that my fix will work, but I really don’t like the fact that I don’t understand the issue in the first place. It’s very hard to piece together the necessary information – which is like working on a language that doesn’t have a spec.

Eventually, I came up with an answer which I think Stuart more or less accepted. I’m in one of the groups the spec is aimed at. I write about C#, hoping to explain it to other people. One of my aims with C# in Depth is to give enough information to make the spec even more irrelevant to most developers when it comes to the changes in C# 2 and 3. Without wishing to denigrate existing C# books too much, I’ve often found that the kind of details which I wanted to investigate further just weren’t covered in the books – to get the answer, I had to go to the spec. I really hope that if I’d had my own book, I’d have been able to consult that for most of those issues. However, I simply couldn’t have written the book without the spec.

I’ve had experience of writing about a language without a spec. When I was helping out with Groovy in Action, I often found myself frustrated by the fact that the Groovy spec is far from finished. This shouldn’t be surprising to anyone – Microsoft have a significant team of really smart people who are paid to immerse themselves thoroughly in C# and make sure the language is all it can be, in terms of design, documentation and implementation. Designing a language well is hard – I haven’t been part of designing any languages, but I can get some idea of the difficulty based on what I’ve seen of the languages I’ve used. The loving care required to make sure that all the behaviour that should be pinned down is indeed described, while leaving rigidly defined areas of doubt and uncertainty where that’s appropriate, must be phenomenal. I don’t doubt that the Groovy team is talented, but coming up with a good spec is probably too much of a resource drain, unfortunately.

I haven’t covered everything I feel about specs in this post, but I’m going to finish now before I officially begin to ramble. Apologies to Stuart if I’ve misrepresented his views – and I should point out that this was late at night after Stuart had a few beers, which may be relevant. In short then (and including points I haven’t gone into):

  • The existence of a specification is important, even if it’s not consulted by every developer. Even if I were never ill, I’d be glad that the National Health Service existed.
  • I’d be very worried if the language/compiler team itself didn’t have a good spec, and if they do, there’s no reason to hide it. As an example of how important this is, just read Martin Fowler writing about JRuby/IronRuby: “Soon-to-be-ThoughtWorker Ola Bini, a JRuby committer, reckons that it’s almost impossible to figure out how to implement a Ruby runtime without looking at source code of the MRI.” That screams to me of “the implementation is the documentation” which I regard as a very unhealthy state of affairs.
  • Specifications are vital for authors (whether of books or web articles) who need to present accurate information based on more than just the current behaviour.
  • Sometimes you can trust a specification more than tests – with a memory model spec, it’s possible to reason about whether or not my code is thread-safe. It could pass all tests but still not handle a bizarre race condition. (Of course, a better memory model spec for .NET would be welcome.)
  • Unit tests are never going to catch every flaw. They can give you a great deal of confidence, but not certainty. (Example: how many people explicitly check that their text handling code will work just as well when provided with non-interned strings, rather than strings which were originally specified as literals? If the string interning behaviour changes in a valid way, are you absolutely sure your code won’t fail?)
  • I’m on the fence about the value of having the ECMA spec as well as the Microsoft one. I can see how it could be important in certain business situations – but as a developer, I don’t care that much. I’ve had very few qualms about changing my standard reference from ECMA C# 2 to MS C# 3. It’s unclear to me (as someone completely outside the process) how much influence ECMA has at this stage on the design of the language itself. Were ECMA committee members explicitly consulted during the C# 3 design process? Clearly making a significant change to the language now would be likely to make all existing compilers “broken” – so what can the ECMA team do beyond reframing the existing rules? As I say, I’m an outside in this matter, so I can’t really judge – but I think it’s a valid question to ask.

Anyway, that’s about a sermon’s-worth of preaching about specifications – time for bed.

C# 4 idea: Iterator blocks and parameter checking

Iterator blocks have an interesting property: they defer execution. When the method (or property) is called, none of your code is executed – it only starts running when MoveNext() is first called.

Deferred execution is a great thing in many ways, but it’s a pain when it comes to parameter checking. If you check parameters within an iterator block, you’ve effectively left a timebomb – the error will be potentially reported a long way from the original source of the problem. (Side-note: this is why I prefer using a cast to as when I know what type something really should be – I’d rather get an InvalidCastException immediately than a NullReferenceException later on.)

One solution to this is to check the parameters in the public method and then call a private method which is implemented with an iterator block. That’s a bit ugly though. It would be nice to be able to specify the parameter checking in an initial block which could only occur at the start of the method – something like this simple implementation of Repeat:

public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
{
    yield do
    {
        if (count < 0)
        {
            throw new ArgumentException(count);
        }
    }
    for (int i=0; i < count; i++)
    {
        yield return element;
    }
}

I’ve chosen yield do for two reasons: it fits in with the rest of the yield pattern, and by using an existing keyword I suspect it reduces/eliminates the chance of this being a breaking change. It’s not ideal from the point of self-description, but do is the closest keyword I could find when I looked at the spec. In a language with begin as a keyword, that would probably be a better choice – or initial. fixed appeals in that the code is fixed in the method rather than being shunted off into the compiler-generated type, but the normal use of fixed is far too different to make this an appealing choice. Any other suggestions are very welcome :)

Is this an important enough change to make it worth including in C# 4? I’m not sure. The negative side is that I suspect relatively few people use iterator blocks much, so it may not have a very wide benefit. The positive side is that it only actually makes any difference to those who do use iterator blocks, and I believe almost all of them would find it a welcome addition. It’s simple to understand, and it makes iterator blocks much easier to use in a “correct” manner when any form of initial checking is involved.

Note, mostly to myself and Marc Gravell: I haven’t checked, but I suspect most of Push LINQ is broken in this respect.

Further note to self – I really need to fix my code formatter to treat yield as a contextual keyword.

Trivia: when is a no-op not a no-op?

Hopefully most readers are familiar with the yield break; statement. Usually, if it appears at the end of a method its a no-op which can be removed with no change in behaviour. For instance:

public IEnumerable<int> Range1 (int start, int count)
{
    for (int i=0; i < count; i++)
    {
         yield return start+i;
    }
    yield break;
}

public IEnumerable<int> Range2 (int start, int count)
{
    for (int i=0; i < count; i++)
    {
         yield return start+i;
    }
}

Can anyone think of a situation where a yield break; directly before the closing brace of a method cannot be removed without breaking the code? (I suggest that early answers are given in ROT-13 to avoid spoiling it for anyone else.)

Odd query expressions

Yesterday, I was proof reading chapter 11 of the book (and chapter 12, and chapter 13 – it was a long night). Reading my own text about how query expressions work led me to wonder just how far I could push the compiler in terms of understanding completely bizarre query expressions.

Background

Query expressions in C# 3 work by translating them into “normal” C# first, then applying normal compilation. So, for instance, a query expression of:

from x in words
where x.Length > 5
select x.ToUpper()

… is translated into this:

words.Where(x => x.Length > 5)
     .Select(x => x.ToUpper())

Now, usually the source expression (words in this case) is something like a variable, or the result of a method or property call. However, the spec doesn’t say it has to be… let’s see what else we could do. We’ll keep the query expression as simple as possible – just from x in source select x.

Types

What if the source expression is a type? The query expression would then compile to TypeName.Select(x => x) which of course works if there’s a static Select method taking an appropriate delegate or expression tree. And indeed it works:

using System;

public class SourceType
{
    public static int Select(Func<string,string> ignored)
    {
        Console.WriteLine(“Select called!”);
        return 10;
    }
}

class Test
{
    static void Main()
    {
        int query = from x in SourceType
                    select x;
    }
}

That compiles and runs, printing out Select called!. query will be 10 after the method has been called.

So, we can call a static method on a type. Anything else? Well, let’s go back to the normal idea of the expression being a value, but this time we’ll change what Select(x => x) actually means. There’s no reason it has to be a method call. It looks like a method call, but then so do delegate invocations…

Delegates via properties

Let’s create a “LINQ proxy” which has a property of a suitable delegate type. Again, we’ll only provide Select but it would be possible to do other types – for instance the Where property could be typed to return another proxy.

using System;
using System.Linq.Expressions;

public class LinqProxy
{
    public Func<Expression<Func<string,string>>,int> Select { get; set; }
}

class Test
{
    static void Main()
    {
        LinqProxy proxy = new LinqProxy();
        
        proxy.Select = exp => 
        { 
            Console.WriteLine (“Select({0})”, exp);
            return 15;
        };
        
        int query = from x in proxy
                    select x;
    }
}

Again, it all compiles and runs fine, with an output of “Select(x => x)”. If you wanted, you could combine the two above approaches – the SourceType could have a static delegate property instead of a method. Also this would still work if we used a public field instead of a property.

What else is available?

Looking through the list of potential expressions, there are plenty more we can try to abuse: namespaces, method groups, anonymous functions, indexers, and events. I haven’t found any nasty ways of using these yet – if we could have methods or properties directly in namespaces (instead of in types) then we could use from x in SomeNamespace but until that time, I think sanity is reasonably safe. Of course, if you can find any further examples, please let me know!

Why is this useful?

It’s not. It’s really, really not – at least not in any way I can think of. The “proxy with delegate properties” might just have some weird, twisted use, but I’d have to see it to believe it. Of course, these are useful examples to prove what the compiler’s actually doing, and the extent to which query expressions really are just syntactic sugar (but sooo sweet) – but that doesn’t mean there’s a good use for the technique in real code.

Again, if you can find a genuine use for this, please mail me. I’d love to hear about such oddities.

Implementing deferred execution, and a potential trap to avoid

When talking about LINQ recently, I doodled an implementation of OrderBy on a whiteboard. Now, I know the real OrderBy method has to support ThenBy which makes life slightly tougher, but let’s suppose for a moment that it didn’t. Let’s further suppose that we don’t mind O(n2) efficiency, but we do want to abide by the restriction that the sort should be stable. Here’s one implementation:

public static IEnumerable<TSource> OrderBy<TSource,TKey>
    (this IEnumerable<TSource> source,
     Func<TSource,TKey> keySelector)
{
    IComparer<TKey> comparer = Comparer<TKey>.Default;
    List<TSource> buffer = new List<TSource>();
    foreach (TSource item in source)
    {
        // Find out where to insert the element –
        // immediately before the first element that
        // compares as greater than the new one.           
        TKey key = keySelector(item);
           
        int index = buffer.FindIndex
            (x => comparer.Compare(keySelector(x), key) > 0);
           
        buffer.Insert(index==-1 ? buffer.Count : index, item);
    }
       
    return buffer;
}

What’s wrong with it? Well, it compiles, and seems to run okay – but it doesn’t defer execution. As soon as you call the method, it will suck all the data from the source and sort it. List<T> implements IEnumerable<T> with no problems, so we’re fine to just return buffer… but we can very easily make it defer execution. Look at the end of this code (the actual sorting part is the same):

public static IEnumerable<TSource> OrderBy<TSource,TKey>
    (this IEnumerable<TSource> source,
     Func<TSource,TKey> keySelector)
{
    IComparer<TKey> comparer = Comparer<TKey>.Default;
    List<TSource> buffer = new List<TSource>();
    foreach (TSource item in source)
    {
        // Find out where to insert the element –
        // immediately before the first element that
        // compares as greater than the new one.           
        TKey key = keySelector(item);
           
        int index = buffer.FindIndex
            (x => comparer.Compare(keySelector(x), key) > 0);
           
        buffer.Insert(index==-1 ? buffer.Count : index, item);
    }
       
    foreach (TSource item in buffer)
    {
        yield return item;
    }
}

It seems odd to be manually iterating over buffer instead of just returned it to be iterated over by the client – but suddenly we have deferred execution. None of the above code will run until we actually start asking for data.

The moral of the story? I suspect many developers would change the second form of code into the first, thinking they’re just refactoring – when in fact they’re changing a very significant piece of behaviour. Be careful when implementing iterators!

Data pipelines as a conceptual stepping stone to higher order functions

I was explaining data pipelines in LINQ to Objects to a colleague yesterday, partly as the next step after explaining iterator blocks, and partly because, well, I love talking about C# 3 and LINQ. (Really, it’s becoming a serious problem. Now that I’ve finished writing the main text of my book, my brain is properly processing the stuff I’ve written about. I hadn’t planned to be blogging as actively as I have been recently – it’s just a natural result of thinking about cool things to do with LINQ. It’s great fun, but in some ways I hope it stops soon, because I’m not getting as much sleep as I should.)

When I was describing how methods like Select take an iterator and return another iterator which, when called, will perform appropriate transformations, something clicked. There’s a conceptual similarity between data pipelines using deferred execution, and higher order functions. There’s a big difference though: I can get my head round data pipelines quite easily, whereas I can only understand higher order functions for minutes at a time, and only if I’ve had a fair amount of coffee. I know I’m not the only one who finds this stuff tricky.

So yes, there’s a disconnect in difficulty level – but I believe that the more comfortable one is with data pipelines, the more feasible it is to think of higher order functions. Writing half the implementation of Push LINQ (thanks go to Marc Gravell for the other half) helped my “gut feel” understanding of LINQ itself significantly, and I believe they helped me cope with currying and the like as well.

I have a sneaking suspicion that if everyone who wanted to use LINQ had to write their own implementation of LINQ to Objects (or at least a single overload of each significant method) there’d be a much greater depth of understanding. This isn’t a matter of knowing the fiddly bits – it’s a case of grokking just why OrderBy needs to buffer data, and what deferred execution really means.

(This is moving off the topic slightly, but I really don’t believe it’s inconceivable to suggest that “average” developers implement most of LINQ to Objects themselves. It sounds barmy, but the tricky bit with LINQ to Objects is the design, not the implementation. Suggesting that people implement LINQ to SQL would be a different matter, admittedly. One idea I’ve had for a talk, whether at an MVP open day or a Developer Developer Developer day is to see how much of LINQ to Objects I could implement in a single talk/lecture slot, explaining the code as I went along. I’d take unit tests but no pre-written implementation. I really think that with an experienced audience who didn’t need too much hand-holding around iterator blocks, extension methods etc to start with, we could do most of it. I’m not saying it would be efficient (thinking particularly of sorting in a stable fashion) but it would be feasible, and I think it would encourage people to think about it more themselves. Any thoughts from any of you? Oh, and yes, I did see the excellent video of Luke Hoban doing Select and Where. I’d try to cover grouping, joins etc as well.)

I’m hoping that the more experience I have with writing funky little sequence generators/processors, the more natural functional programming with higher order functions will feel. For those of you who are already comfortable with higher order functions, does this sound like a reasonable hope/expectation? Oh, and is the similarity between the two situations anything to do with monads? I suspect so, but I don’t have a firm enough grasp on monads to say for sure. It’s obviously to do with composition, either way.