Reimplementing LINQ to Objects: Part 26a – IOrderedEnumerable

Implementing OrderBy/OrderByDescending/ThenBy/ThenByDescending isn’t too bad, once you understand how the pattern works, but that’s a slight conceptual leap. I’ll take it one step at a time, first introducing some extra infrastructure (both public and internal), then implementing it simply and slowly, and finally optimizing it somewhat.

The sorting features of LINQ

As a query language, it’s natural that LINQ allows you to sort sequences. Of course, collections have had the ability to sort their contents since v1.0 but LINQ is different in various ways:

  • It expects you’ll usually want to sort via projections, rather than on a "natural" comparison of the whole value. For example, should I come before or after my wife in a sorted list of people? It depends on whether you’re ordering by first name, age, height, weight, or something else.
  • It allows you to easily specify whether you want the ordering to be from the minimum value to the maximum (ascending) or from maximum to minimum (descending).
  • It allows you to specify multiple ordering criteria – for example, order by last name and then first name.
  • It doesn’t sort in-place – instead, the sorting operators each return a new sequence which represents the results of the sort.
  • It allows you to order any sequence implementing IEnumerable<T> – rather than Array.Sort being separate from List<T>.Sort, for example.
  • The ordering operators in LINQ to Objects at least are stable – two equal items will be returned in the original order.

Other than stability (I’m at least unaware of any public, stable sort methods within the framework prior to .NET 3.5), all of this can be done using IComparer<T>, but it’s a bit of a pain. Just as a reminder, IComparer<T> is an interface with a single method:

int Compare(T x, T y)

Unlike IEqualityComparer<T>, an IComparer<T> can only be asked to compare two items to determine which should come before the other in an ordered sequence. There’s no notion of a hash code in IComparer<T>. All sorting in .NET is based on either IComparer<T> or Comparison<T>, which is the equivalent delegate. Like IEqualityComparer<T>, a helper class (Comparer<T>) exists mostly to provide a default comparison based on the IComparable<T> and IComparable interfaces. (Unlike EqualityComparer<T>.Default, there’s no "fall back" to a universal default comparison – if you try to compare two objects which have no comparison defined between them, an exception will be thrown.)

Composing comparers

Eventually, we’ll need our ordering implementation to be built on top of IComparer<T>, so I’ll introduce three helper classes to address the first three bullet points above. MiscUtil has more fully-developed versions of all of these classes – the code I’ve written for Edulinq is the bare minimum required for the project. Due to the way I’m testing (building the same tests against both Edulinq and the .NET framework version) I also don’t have any tests for these classes – but they’re simple enough that I’m pretty confident they’ll work. They don’t validate constructor arguments, as the public API is responsible for using them sensibly.

First, projection: we want to be able to specify a key selector and optionally a comparison between two keys – just as we already do for things like ToLookup and Join, but this time using IComparer<T> instead of IEqualityComparer<T>. Here’s a helper class to build a comparison for two values of a "source" element type based on their keys:

internal class ProjectionComparer<TElement, TKey> : IComparer<TElement>
{
    private readonly Func<TElement, TKey> keySelector;
    private readonly IComparer<TKey> comparer;

    internal ProjectionComparer(Func<TElement, TKey> keySelector,
        IComparer<TKey> comparer)
    {
        this.keySelector = keySelector;
        this.comparer = comparer ?? Comparer<TKey>.Default;
    }

    public int Compare(TElement x, TElement y)
    {
        TKey keyX = keySelector(x);
        TKey keyY = keySelector(y);
        return comparer.Compare(keyX, keyY);
    }
}

As an example of how we might use this, we might want to order a List<Person> by last name in a case-insensitive way. For that, we might use:

IComparer<Person> comparer = new ProjectionComparer<Person, string>
    (p => p.LastName, StringComparer.CurrentCultureIgnoreCase);

Again, in public code I’d provide simpler ways of constructing the comparer to take advantage of type inference – but it isn’t really worth it here.

As you can see, the comparer simply extracts the keys from both of the elements it’s trying to compare, and delegates to the key comparer.

Next up is reversal: by reversing the order of the arguments into a comparer, we can effectively reverse the direction of a sort. Note that due to the stability of LINQ orderings, this is not quite the same as just reversing the final order of the elements: even when sorting by a key in a descending direction, elements with equal keys will be preserved in the original order. This actually drops out naturally when we let the comparer handle the direction – the ordering code can preserve stability without really knowing whether it’s sorting in ascending or descending direction. Here’s the reverse comparer:

internal class ReverseComparer<T> : IComparer<T>
{
    private readonly IComparer<T> forwardComparer;

    internal ReverseComparer(IComparer<T> forwardComparer)
    {
        this.forwardComparer = forwardComparer;
    }

    public int Compare(T x, T y)
    {
        return forwardComparer.Compare(y, x);
    }
}

It’s incredibly simple, but there are two important points to note:

  • It actually violates the contract of IComparer<T>, assuming that the forward comparer doesn’t. The interface documentation states that null values should compare less than any non-null references, and obviously if we’re reversing the original comparison, that won’t be the case. I consider this to be a flaw in the documentation, to be honest: it makes sense to allow the comparer itself to decide how to handle nullity. The LINQ ordering documentation doesn’t make any specific reference to this – it’s unclear whether sorting by a key in a descending manner will leave elements with a null key at the start or the end. It turns out that the .NET implementation acts like my reverse comparer: it treats null like any other value, effectively.
  • A naive implementation would call forwardComparer(x, y) and simply negate the return value. This would hide a subtle bug: if the forward comparer returned Int32.MinValue, so would the reverse comparer. In signed integer types, there are more negative numbers available than positive ones (e.g. the range of SByte is -128 to 127 inclusive) and negating the minimum value is effectively a no-op. By reversing the order of the arguments instead, we avoid the problem.

Our final helper class tackles the problem of creating a compound comparer, composed of a primary comparison and a secondary one. The secondary comparison is only used if the primary comparison considers two items to be equal – so for example, if we were comparing people by last name and then first name, the primary comparison would be the last name comparer, and the secondary comparison would be the first name comparer. Comparing "John Smith" and "Fred Bloggs" doesn’t need to take the first names into account at all, but comparing "Dave Skeet" with "Jon Skeet" does.

If we want more than two comparers, we can simply treat one compound comparer as the primary comparsion for another compound comparer. For example, suppose we want to order by last name, then first name, then age. That’s effectively "compound(compound(last name, first name), age)" in the obvious notation. Again, the code implementing a compound comparer is very simple, but will prove invaluable:

internal class CompoundComparer<T> : IComparer<T>
{
    private readonly IComparer<T> primary;
    private readonly IComparer<T> secondary;

    internal CompoundComparer(IComparer<T> primary,
        IComparer<T> secondary)
    {
        this.primary = primary;
        this.secondary = secondary;
    }

    public int Compare(T x, T y)
    {
        int primaryResult = primary.Compare(x, y);
        if (primaryResult != 0)
        {
            return primaryResult;
        }
        return secondary.Compare(x, y);
    }
}

(I could have used a conditional expression in the Compare method here, but this time I felt it wasn’t worth it.)

With these classes in place, we can move on to look at IOrderedEnumerable – and implement some of it.

Implementing IOrderedEnumerable<TElement>

When we eventually get to OrderBy, OrderByDescending, ThenBy and ThenByDescending (in the next post) they will all return IOrderedEnumerable<TElement>. By focusing on that first, we can actually make the LINQ operators themselves fairly trivial. The interface itself looks simple enough:

public interface IOrderedEnumerable<TElement> : IEnumerable<TElement>, IEnumerable
{
    IOrderedEnumerable<TElement> CreateOrderedEnumerable<TKey>(
        Func<TElement, TKey> keySelector,
        IComparer<TKey> comparer,
        bool descending)
}

The important point to understand is that most developers will never need to call CreateOrderedEnumerable directly themselves – or even care about the interface. What most of us care about is that we can call ThenBy and ThenByDescending after we’ve called OrderBy or OrderByDescending – and that’s what the interface enables. The ThenBy/ThenByDescending operators are extension methods on IOrderedEnumerable<TElement> instead of IEnumerable<T>, precisely because they have to be called on sequences which already have a primary ordering.

The CreateOrderedEnumerable method is precisely designed to create a new IOrderedEnumerable which represents the existing ordered sequence but with an extra secondary ordering applied.

The tricky bit to get your head round is that this sort of composition doesn’t work the same way as our normal sequence chaining. Usually – for Where, Select etc – we fetch items from a "parent" sequence, perform some filtering or projection or whatever, and then yield the results. We can’t do that to provide a secondary ordering without information about the primary ordering – we’d effectively have to know where the boundaries between the "primary sort keys" occur, so that we only reordered within one key. Even if we could do that, it would be relatively painful from a performance perspective.

Instead, we need to create an IOrderedEnumerable implementation which remembers the original, unordered sequence – and then builds up a compound comparer as it goes. That way, we can then do all the ordering in one go, rather than each intermediate ordered sequence doing its own little bit.

This is possibly easier to show than to describe – my implementation of OrderedEnumerable is really fairly short. Here it is, just with the actual sorting code removed:

internal class OrderedEnumerable<TElement> : IOrderedEnumerable<TElement>
{
    private readonly IEnumerable<TElement> source;
    private readonly IComparer<TElement> currentComparer;

    internal OrderedEnumerable(IEnumerable<TElement> source,
        IComparer<TElement> comparer)
    {
        this.source = source;
        this.currentComparer = comparer;
    }

    public IOrderedEnumerable<TElement> CreateOrderedEnumerable<TKey>
        (Func<TElement, TKey> keySelector,
         IComparer<TKey> comparer,
         bool descending)
    {
        if (keySelector == null)
        {
            throw new ArgumentNullException("keySelector");
        }
        IComparer<TElement> secondaryComparer =
            new ProjectionComparer<TElement, TKey> (keySelector, comparer);
        if (descending)
        {
            secondaryComparer = new ReverseComparer<TElement>(secondaryComparer);
        }
        return new OrderedEnumerable<TElement>(source,
            new CompoundComparer<TElement>(currentComparer, secondaryComparer));
    }

    public IEnumerator<TElement> GetEnumerator()
    {
        // Perform the actual sorting here: basically load
        // "source" into a buffer, and then use a stable
        // sort algorithm with "currentComparer" to compare
        // any two elements
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Note how when we create one IOrderedEnumerable from another, we don’t store a reference to the original – it can be garbage-collected as far as we’re concerned. We just care about the original (unordered) source, and the new compound comparer.

You can already see how useful our comparer helper classes are already. They allow us to focus on a single abstraction – IComparer<T> – and build interesting comparisons based on it, just as LINQ to Objects focuses on IEnumerable<T> and gives us interesting ways to work with sequences. Each aspect of the comparer we build is distinct:

  • A projection comparer takes our key selector and key comparer, and builds a plain IComparer<TElement>
  • If the secondary ordering is descending, we create a reverse comparer to wrap the projection comparer
  • We then build a compound comparer from whatever comparison we were already using, and the newly created one

I suspect that this isn’t the most efficient way to perform the comparison, to be honest: there’s an awful lot of wrapping going on. However, I think it’s probably the clearest way to represent things – and that’s the goal of Edulinq.

Conclusion

We now have all the internal plumbing in place. There are two more tasks:

  • Implementing the extension methods themselves
  • Performing the sort

I’ll cover those in the next two posts (26b and 26c). In some ways, they’re likely to be less interesting than this post (unless I work out a particularly interesting sort algorithm to use). It’s the design of sorting in LINQ which I find particularly neat – the way that it’s tailored to what developers really need compared with the primitive Sort methods of old, and the elegant way in which those abilities are expressed. As ever, we’ve ended up with lots of small building blocks – but ones that can be combined to great effect.

10 thoughts on “Reimplementing LINQ to Objects: Part 26a – IOrderedEnumerable”

  1. I’m curious about the performance of your ProjectionComparer compared to the .NET equivalent. Unless posts 26b and 26c are especially clever, you’ll end up calling keyComparer twice for every pair of objects being sorted, which I believe in the worst case is 2*N^2 calls — quite unfortunate if computing the sort key is an expensive operation.

    Perl hackers handle this with the “Schwartzian transform”: turning the sequence of values into a sequence of key/value pairs, performing the sort, and finally extracting the sorted values.

    Like

  2. @Jesse: That’s a good point. I’ll do some investigation into how often the .NET version computes the keys (I’ll just keep a count in a projection). I’ll finish implementing it all this way, and then maybe add a 26d with an alternative implementation :)

    Like

  3. @Jesse: A very quick test suggests that the real LINQ to Objects does indeed map each value to a key only once. Thinking about it further, I suspect I can reasonably easily modify the code to perform the transform you’re talking about… definitely worth trying later on. It’s a shame really, as I love the simplicity of what I’ve got already :)

    Like

  4. Good to know it’s reasonably easy. It occured to me that this is an issue of correctness, not just performance: calling keySelector only once per item means it can safely be non-idempotent, such as mapping each item to Random.Next() to get a random shuffle.

    Like

  5. @Jesse: It could certainly change the behaviour of code. There’s no guarantee in the documentation that the keySelector will only be called once per key though – I personally wouldn’t want to rely on it. But making the implementation act more like LINQ to Objects is a good idea :)

    Like

  6. @Leyu: Do you mean in the declaration of the interface? It’s redundant, but it does no harm – I just copied it from MSDN :)

    Like

  7. “For example, should I come before or after my wife in a sorted list of people? It depends on whether you’re ordering by first name, age, height, weight, or something else.”

    That’s quite a dangerous comment! I hope your wife doesn’t read your blog. I know I’d get 20 questions, and no matter what the answers, I think things would end up with flowers and a fancy dinner. ;-)

    Like

  8. Maybe I am overthinking, but I am worried about the performance caused by the large number of indirections one could have with this approach. So you have a Comparer that calls CompoundComparer -> ReverseComparer -> CompoundComparer -> CompoundComparer -> … and so on. Maybe I am being paranoid?

    That said, I cannot think in any other.

    Like

Leave a comment