Implementing deferred execution, and a potential trap to avoid

When talking about LINQ recently, I doodled an implementation of OrderBy on a whiteboard. Now, I know the real OrderBy method has to support ThenBy which makes life slightly tougher, but let’s suppose for a moment that it didn’t. Let’s further suppose that we don’t mind O(n2) efficiency, but we do want to abide by the restriction that the sort should be stable. Here’s one implementation:

public static IEnumerable<TSource> OrderBy<TSource,TKey>
    (this IEnumerable<TSource> source,
     Func<TSource,TKey> keySelector)
{
    IComparer<TKey> comparer = Comparer<TKey>.Default;
    List<TSource> buffer = new List<TSource>();
    foreach (TSource item in source)
    {
        // Find out where to insert the element –
        // immediately before the first element that
        // compares as greater than the new one.           
        TKey key = keySelector(item);
           
        int index = buffer.FindIndex
            (x => comparer.Compare(keySelector(x), key) > 0);
           
        buffer.Insert(index==-1 ? buffer.Count : index, item);
    }
       
    return buffer;
}

What’s wrong with it? Well, it compiles, and seems to run okay – but it doesn’t defer execution. As soon as you call the method, it will suck all the data from the source and sort it. List<T> implements IEnumerable<T> with no problems, so we’re fine to just return buffer… but we can very easily make it defer execution. Look at the end of this code (the actual sorting part is the same):

public static IEnumerable<TSource> OrderBy<TSource,TKey>
    (this IEnumerable<TSource> source,
     Func<TSource,TKey> keySelector)
{
    IComparer<TKey> comparer = Comparer<TKey>.Default;
    List<TSource> buffer = new List<TSource>();
    foreach (TSource item in source)
    {
        // Find out where to insert the element –
        // immediately before the first element that
        // compares as greater than the new one.           
        TKey key = keySelector(item);
           
        int index = buffer.FindIndex
            (x => comparer.Compare(keySelector(x), key) > 0);
           
        buffer.Insert(index==-1 ? buffer.Count : index, item);
    }
       
    foreach (TSource item in buffer)
    {
        yield return item;
    }
}

It seems odd to be manually iterating over buffer instead of just returned it to be iterated over by the client – but suddenly we have deferred execution. None of the above code will run until we actually start asking for data.

The moral of the story? I suspect many developers would change the second form of code into the first, thinking they’re just refactoring – when in fact they’re changing a very significant piece of behaviour. Be careful when implementing iterators!

Leave a comment