Odd query expressions

Yesterday, I was proof reading chapter 11 of the book (and chapter 12, and chapter 13 – it was a long night). Reading my own text about how query expressions work led me to wonder just how far I could push the compiler in terms of understanding completely bizarre query expressions.

Background

Query expressions in C# 3 work by translating them into “normal” C# first, then applying normal compilation. So, for instance, a query expression of:

from x in words
where x.Length > 5
select x.ToUpper()

… is translated into this:

words.Where(x => x.Length > 5)
     .Select(x => x.ToUpper())

Now, usually the source expression (words in this case) is something like a variable, or the result of a method or property call. However, the spec doesn’t say it has to be… let’s see what else we could do. We’ll keep the query expression as simple as possible – just from x in source select x.

Types

What if the source expression is a type? The query expression would then compile to TypeName.Select(x => x) which of course works if there’s a static Select method taking an appropriate delegate or expression tree. And indeed it works:

using System;

public class SourceType
{
    public static int Select(Func<string,string> ignored)
    {
        Console.WriteLine(“Select called!”);
        return 10;
    }
}

class Test
{
    static void Main()
    {
        int query = from x in SourceType
                    select x;
    }
}

That compiles and runs, printing out Select called!. query will be 10 after the method has been called.

So, we can call a static method on a type. Anything else? Well, let’s go back to the normal idea of the expression being a value, but this time we’ll change what Select(x => x) actually means. There’s no reason it has to be a method call. It looks like a method call, but then so do delegate invocations…

Delegates via properties

Let’s create a “LINQ proxy” which has a property of a suitable delegate type. Again, we’ll only provide Select but it would be possible to do other types – for instance the Where property could be typed to return another proxy.

using System;
using System.Linq.Expressions;

public class LinqProxy
{
    public Func<Expression<Func<string,string>>,int> Select { get; set; }
}

class Test
{
    static void Main()
    {
        LinqProxy proxy = new LinqProxy();
        
        proxy.Select = exp => 
        { 
            Console.WriteLine (“Select({0})”, exp);
            return 15;
        };
        
        int query = from x in proxy
                    select x;
    }
}

Again, it all compiles and runs fine, with an output of “Select(x => x)”. If you wanted, you could combine the two above approaches – the SourceType could have a static delegate property instead of a method. Also this would still work if we used a public field instead of a property.

What else is available?

Looking through the list of potential expressions, there are plenty more we can try to abuse: namespaces, method groups, anonymous functions, indexers, and events. I haven’t found any nasty ways of using these yet – if we could have methods or properties directly in namespaces (instead of in types) then we could use from x in SomeNamespace but until that time, I think sanity is reasonably safe. Of course, if you can find any further examples, please let me know!

Why is this useful?

It’s not. It’s really, really not – at least not in any way I can think of. The “proxy with delegate properties” might just have some weird, twisted use, but I’d have to see it to believe it. Of course, these are useful examples to prove what the compiler’s actually doing, and the extent to which query expressions really are just syntactic sugar (but sooo sweet) – but that doesn’t mean there’s a good use for the technique in real code.

Again, if you can find a genuine use for this, please mail me. I’d love to hear about such oddities.

Implementing deferred execution, and a potential trap to avoid

When talking about LINQ recently, I doodled an implementation of OrderBy on a whiteboard. Now, I know the real OrderBy method has to support ThenBy which makes life slightly tougher, but let’s suppose for a moment that it didn’t. Let’s further suppose that we don’t mind O(n2) efficiency, but we do want to abide by the restriction that the sort should be stable. Here’s one implementation:

public static IEnumerable<TSource> OrderBy<TSource,TKey>
    (this IEnumerable<TSource> source,
     Func<TSource,TKey> keySelector)
{
    IComparer<TKey> comparer = Comparer<TKey>.Default;
    List<TSource> buffer = new List<TSource>();
    foreach (TSource item in source)
    {
        // Find out where to insert the element –
        // immediately before the first element that
        // compares as greater than the new one.           
        TKey key = keySelector(item);
           
        int index = buffer.FindIndex
            (x => comparer.Compare(keySelector(x), key) > 0);
           
        buffer.Insert(index==-1 ? buffer.Count : index, item);
    }
       
    return buffer;
}

What’s wrong with it? Well, it compiles, and seems to run okay – but it doesn’t defer execution. As soon as you call the method, it will suck all the data from the source and sort it. List<T> implements IEnumerable<T> with no problems, so we’re fine to just return buffer… but we can very easily make it defer execution. Look at the end of this code (the actual sorting part is the same):

public static IEnumerable<TSource> OrderBy<TSource,TKey>
    (this IEnumerable<TSource> source,
     Func<TSource,TKey> keySelector)
{
    IComparer<TKey> comparer = Comparer<TKey>.Default;
    List<TSource> buffer = new List<TSource>();
    foreach (TSource item in source)
    {
        // Find out where to insert the element –
        // immediately before the first element that
        // compares as greater than the new one.           
        TKey key = keySelector(item);
           
        int index = buffer.FindIndex
            (x => comparer.Compare(keySelector(x), key) > 0);
           
        buffer.Insert(index==-1 ? buffer.Count : index, item);
    }
       
    foreach (TSource item in buffer)
    {
        yield return item;
    }
}

It seems odd to be manually iterating over buffer instead of just returned it to be iterated over by the client – but suddenly we have deferred execution. None of the above code will run until we actually start asking for data.

The moral of the story? I suspect many developers would change the second form of code into the first, thinking they’re just refactoring – when in fact they’re changing a very significant piece of behaviour. Be careful when implementing iterators!

Data pipelines as a conceptual stepping stone to higher order functions

I was explaining data pipelines in LINQ to Objects to a colleague yesterday, partly as the next step after explaining iterator blocks, and partly because, well, I love talking about C# 3 and LINQ. (Really, it’s becoming a serious problem. Now that I’ve finished writing the main text of my book, my brain is properly processing the stuff I’ve written about. I hadn’t planned to be blogging as actively as I have been recently – it’s just a natural result of thinking about cool things to do with LINQ. It’s great fun, but in some ways I hope it stops soon, because I’m not getting as much sleep as I should.)

When I was describing how methods like Select take an iterator and return another iterator which, when called, will perform appropriate transformations, something clicked. There’s a conceptual similarity between data pipelines using deferred execution, and higher order functions. There’s a big difference though: I can get my head round data pipelines quite easily, whereas I can only understand higher order functions for minutes at a time, and only if I’ve had a fair amount of coffee. I know I’m not the only one who finds this stuff tricky.

So yes, there’s a disconnect in difficulty level – but I believe that the more comfortable one is with data pipelines, the more feasible it is to think of higher order functions. Writing half the implementation of Push LINQ (thanks go to Marc Gravell for the other half) helped my “gut feel” understanding of LINQ itself significantly, and I believe they helped me cope with currying and the like as well.

I have a sneaking suspicion that if everyone who wanted to use LINQ had to write their own implementation of LINQ to Objects (or at least a single overload of each significant method) there’d be a much greater depth of understanding. This isn’t a matter of knowing the fiddly bits – it’s a case of grokking just why OrderBy needs to buffer data, and what deferred execution really means.

(This is moving off the topic slightly, but I really don’t believe it’s inconceivable to suggest that “average” developers implement most of LINQ to Objects themselves. It sounds barmy, but the tricky bit with LINQ to Objects is the design, not the implementation. Suggesting that people implement LINQ to SQL would be a different matter, admittedly. One idea I’ve had for a talk, whether at an MVP open day or a Developer Developer Developer day is to see how much of LINQ to Objects I could implement in a single talk/lecture slot, explaining the code as I went along. I’d take unit tests but no pre-written implementation. I really think that with an experienced audience who didn’t need too much hand-holding around iterator blocks, extension methods etc to start with, we could do most of it. I’m not saying it would be efficient (thinking particularly of sorting in a stable fashion) but it would be feasible, and I think it would encourage people to think about it more themselves. Any thoughts from any of you? Oh, and yes, I did see the excellent video of Luke Hoban doing Select and Where. I’d try to cover grouping, joins etc as well.)

I’m hoping that the more experience I have with writing funky little sequence generators/processors, the more natural functional programming with higher order functions will feel. For those of you who are already comfortable with higher order functions, does this sound like a reasonable hope/expectation? Oh, and is the similarity between the two situations anything to do with monads? I suspect so, but I don’t have a firm enough grasp on monads to say for sure. It’s obviously to do with composition, either way.

Visualising the Mandelbrot set with LINQ – yet again

I’ve been thinking about ranges again, particularly after catching a book error just in time, and looking at Marc’s generic complex type. It struck me that my previous attempts were all very well, and demonstrated parallelisation quite neatly, but weren’t very LINQy. In particular, they certainly didn’t use LINQ for the tricky part in the same way that Luke Hoban’s ray tracer does.

The thing is, working out the “value” of a particular point in the Mandelbrot set visualisation is actually quite well suited to LINQ:

  1. Start with a complex value
  2. Apply a transformation to it (square the current value, add the original value from step 1).
  3. Does the result have a modulus > 2? If so, stop.
  4. Have we performed as many iterations as we’re willing to? If so, stop.
  5. Take our new value, and go back to 2.

We only need two “extra” bits of code to implement this: a Complex type, and a way of applying the same transformation repeatedly.

Well, here’s the Complex type – it contains nothing beyond what we need for this demo, but it’ll do. Obviously Marc’s generic implementation is rather more complete.

public struct Complex
{
    readonly double real;
    readonly double imaginary;

    public Complex(double real, double imaginary)
    {
        this.real = real;
        this.imaginary = imaginary;
    }

    public static Complex operator +(Complex c1, Complex c2)
    {
        return new Complex(c1.real + c2.real, c1.imaginary + c2.imaginary);
    }

    public static Complex operator *(Complex c1, Complex c2)
    {
        return new Complex(c1.real*c2.real – c1.imaginary*c2.imaginary,
                           c1.real*c2.imaginary + c2.real*c1.imaginary);
    }

    public double SquareLength
    {
        get { return real * real + imaginary * imaginary; }
    }
}

Simple stuff, assuming you know anything about complex numbers.

The other new piece of code is even simpler. It’s just a generator. It’s a static method which takes an initial value, and a delegate to apply to one value to get the next. It then lazily returns the generated sequece – forever.

public static IEnumerable<T> Generate<T>(T start, Func<T, T> step)
{
    T value = start;
    while (true)
    {
        yield return value;
        value = step(value);
    }
}

Just as an example of use, remember Enumerable.Range which starts at a particular integer, then adds one repeatedly until it’s yielded as many results as you’ve asked for? Well, here’s a possible implementation, given the Generate method:

public static IEnumerable<int> Range(int start, int count)
{
    return Generate(start, x => x + 1).Take(count);
}

These are all the building blocks we require to build our Mandelbrot visualisation. We want a query which will return a sequence of bytes, one per pixel, where each byte represents the colour to use. Anything which goes beyond our maximum number of iterations ends up black (colour 0) and other values will cycle round the colours. I won’t show the drawing code, but the query is now more self-contained:

var query = from row in Enumerable.Range(0, ImageHeight)
            from col in Enumerable.Range(0, ImageWidth)
            // Work out the initial complex value from the row and column
            let c = new Complex((col * SampleWidth) / ImageWidth + OffsetX,
                                (row * SampleHeight) / ImageHeight + OffsetY)
            // Work out the number of iterations
            select Generate(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
                                              .Take(MaxIterations)
                                              .Count() into count
            // Map that to an appropriate byte value
            select (byte)(count == MaxIterations ? 0 : (count % 255) + 1);

(The various constants used in the expression are defined elsewhere.) This works, and puts the Mandelbrot logic directly into the query. However, I have to admit that it’s much slower than my earlier versions. Heck, I’m still proud of it though.

As ever, full source code is available for download, should you so wish.

LambdaExpression – source code would be nice

Just taking a quick break from proof-reading to post a thought I had yesterday. Visual LINQ uses ToString() to convert an expression tree’s body into readable text. In some cases it works brilliantly, reproducing the original source code exactly – but in other cases it’s far from useful. For instance, from this expression tree representation:

(Convert(word.get_Chars(0)) = 113)

I suspect you wouldn’t have guessed that the original code was:

word[0] == ‘q’

Personally I don’t find it terrifically obvious, even though I can see how it works. Here’s a thought though – suppose the LambdaExpression class had a Source property which the compiler could optionally provide as a constructor argument, so that you could get at the original source code if you needed it. The use in Visual LINQ is obvious, but would it be useful elsewhere?

Well, suppose that LINQ to SQL couldn’t translate a particular query expression into SQL. Wouldn’t it be nice if it could report the part of the query expression it was having trouble with in the language it was originally written in? So a VB expression would give source in VB, a C# expression would give source in C# etc.

All of this would have to be optional, and I suspect some people would have intellectual property concerns about their source leaking out (most of which would be silly, due to the logic effectively being available with a call to ToString() anyway). I think it would be quite handy in a few situations though.

Visual LINQ: Watch query expressions as they happen!

Critical link (in case you can’t find it): Source Code Download

Update: Dmitry Lyalin has put together a screencast of Visual LINQ in action – it gives a much better idea of what it’s like than static pictures do. There’s music, but no speech – so you won’t be missing any important information if you mute it. 

I was going to save this until it was rather more polished, but I’ve just started reading the first proofs of C# in Depth, so it’s unlikely I’ll have much time for coding in the near future. I’m too excited about the results of Monday night to keep them until the book’s done :)

After my Human LINQ experiment, I was considering how it my be possible to watch queries occurring on a computer. I had the idea of turning LINQ to Objects into WPF animations… and it just about works. The initial version took about 3 hours on Monday night, and it’s had a few refinements since. It’s very definitely not finished, but I’ll go into that in a minute.

The basic idea is that you write a nearly-normal query expression like this:

 

VisualSource<string> words = new VisualSource<string>
    (new[] { “the”, “quick”, “brown”, “fox”, “jumped”,
             “over”, “the”, “lazy”, “dog” },
     canvas, “words”);

var query = from word in words
            where word.Length > 3
            select new { Word = word, Length = word.Length};

pipeline = query.End();

… and then watch it run. At the moment, the code is embedded in the constructor for MainWindow, but obviously it needs to be part of the UI at some point in the future. To explain the above code a little bit further, the VisualSource class displays a data source on screen, and calling End() on a query creates a data sink which will start fetching data as soon as you hit the relevant “Go!” button in the UI. Speaking of which, here’s what you see when you start it:

When you tell it to go, words descend from the source, and hit the “where” clause:

As you can see, “the” doesn’t meet the criterion of “word.Length > 3”, so the answer “False” is fading up. Fast forward a few seconds, and we’ll see the first passing result has reached the bottom, with the next word on its way down:

Results accumulate at the bottom so you can see what’s going on:

To be honest, it’s better seen “live” as an animation… but the important thing is that none of the text above is hand-specified (other than the data and the source name). If you change the query and rebuild/run, the diagram will change – so long as I’ve actually implemented the bits you use. So far, you can use:

  • select
  • where
  • group (with trivial or non-trivial elementSelector)
  • multiple “from” clauses (invoking SelectMany)

Select and Where have the most polish applied to them – they’ve got relatively fancy animations. Grouping works, but it appears to be just swallowing data until it spews it all out later – I want to build up what it’s going to return visually as it’s doing it. SelectMany could be a lot prettier than it is.

So, what’s wrong with it? Well:

  • Ordering isn’t implemented
  • Join isn’t implemented
  • GroupJoin isn’t implemented
  • The animation could do with tweaking (a lot!)
  • The code itself is fairly awful
  • The UI is unpolished (but functional) – my WPF skills are sadly lacking
  • It would be nice to lay the query out more sensibly (it gets messy with multiple sources for multiple “from” clauses)
  • Allow the user to enter a query (and their own data sources)

Despite all of that though, I’m really pleased with it. It uses expression trees to create a textual representation of the logic, then compiles them to delegates for the actual projection etc. A bit of jiggery pokery was needed to make anonymous types look nice, and I dare say there’ll be more of that to come.

Interested yet? The source code is available so you can play with it yourself. Let me know if you plan to make any significant improvements, and we could consider starting a SourceForge project for it.

My first screencast – automatic properties

I’ve recently been introduced to Dmitry Lyalin of Better Know a Framework. We’re getting together to do some screencasts, and the first one is now up. I’ll be embedding these on my C# in Depth site as well. The next one will be on object and collection initializers, then anonymous types. After that, we’ll see :)

Anyway, I’d be interested in comments. Things I know I want to do better:

  • Use a headset to get less hum on the narration (and fewer clicks etc)
  • Longer pauses between sections – they were recorded separately, but then put too close together

There’s not a lot I can do about sounding ridiculously posh, unfortunately – but don’t take that as a sign that I am posh.

I’d be particularly interested to know whether or not it’s helpful to hear the typing. I deliberately recorded the visual part with audio on (the narration was recorded before, then rerecorded after) so that you’d get the feeling this was “real” – but I don’t know whether it’s worth it or not.

Anyway, enjoy!