Optimisations in LINQ to Objects

(Edited on February 11th, 2010 to take account of a few mistakes and changes in the .NET 4.0 release candidate.)

I’ve just been fiddling with the first appendix of C# in Depth, which covers the standard query operators in LINQ, and describes a few details of the LINQ to Objects implementations. As well as specifying which operators stream and which buffer their results, and immediate vs deferred execution, I’ve where LINQ optimises for different collection types – or where it doesn’t, but could. I’m not talking about optimisations which require knowledge of the projection being applied or an overall picture of the query (e.g. seq.Reverse().Any() being optimised to seq.Any()) – this is just about optimisations which can be done on a pretty simple basis.

There are two main operations which can be easily optimised in LINQ: random access to an element by index, and the count of a collection. The tricky thing about optimisations like this is that we do have to make assumptions about implementation: I’m going to assume that any implementation of an interface with a Count property will be able to return the count very efficiently (almost certainly straight from a field) and likewise that random access via an indexer will be speedy. Given that both these operations already have their own LINQ operators (when I say "operator" in this blog post, I mean "LINQ query method" rather than an operator at the level of C# as a language) let’s look at those first.

Count

Count() should be pretty straightforward. Just to be clear, I’m only talking about the overload which doesn’t take a predicate: it’s pretty hard to see how you can do better than iterating through and counting matches when there is a predicate involved.

There are actually two common interfaces which declare a Count property: ICollection<T> and ICollection. While many implemenations of ICollection<T> will also implement ICollection (including List<T> and T[]), it’s not guaranteed: they’re independent interfaces, unrelated other than by the fact that they both extend IEnumerable.

The MSDN documentation for Enumerable.Count() states:

If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

This is accurate for .NET 3.5, but in .NET 4.0 it does optimise for ICollection as well. (In beta 2 it only optimised for ICollection, skipping the generic interface.)

ElementAt

The equivalent of an indexer in LINQ is the ElementAt operator. Note that it can’t really be an indexer as there’s no such thing as an "extension indexer" which is arguably a bit of a pity, but off-topic. Anyway, the obvious interface to look for here is IList<T>… and that’s exactly what ElementAt does. It ignores the possibility that you’ve only implemented the nongeneric IList – but I think that’s fairly reasonable. After all, the extension method extends IEnumerable<T>, so your collection has to be aware of generics – why would you implement IList but not IList<T>? Also, using the implementation of IList would involve a conversion from object to T, which would at the very least be ugly.

So ElementAt doesn’t actually do too badly. Now that we’ve got the core operations, what else could be optimised?

SequenceEqual

If you were going to write a method to compare two lists, you might end up with something like this (ignoring nullity concerns for the sake of brevity):

public static bool ListEqual<T>(List<T> first, List<T> second, IEqualityComparer<T> comparer)
{
    // Optimise for reflexive comparison
    if (first == second)
    {
        return true;
    }
    // If the counts are different we know the lists are different
    if (first.Count != second.Count)
    {
        return false;
    }
    // Compare each pair of elements in turn
    for (int i = 0; i < first.Count; i++)
    {
        if (!comparer.Equals(first[i], second[i]))
        {
            return false;
        }
    }
    return true;
}

Note the two separate optimisations. The first is always applicable, unless you really want to deal with sequences which will yield different results if you call GetEnumerator() on them twice. You could certainly argue that that would be a legitimate implementation, but I’d be interested to see a situation in which it made sense to try to compare such a sequence with itself and return false. SequenceEqual perform this optimisation.

The second optimisation – checking for different counts – is only really applicable in the case where we know that Count is optimised for both lists. In particular, I always make a point of only iterating through each source sequence once when I write a custom LINQ operator – you never know when you’ll be given a sequence which reads a huge log file from disk, yielding one line at a time. (Yes, that is my pet example, but it’s a good one and I’m sticking to it.) But we can certainly tell if both sequences implement ICollection or ICollection<T>, so it would make sense to have an "early negative" in that situation.

Last(predicate)

(All of this applies to LastOrDefault as well, by the way.) The implementation of Last which doesn’t take a predicate is already optimised for the IList<T> case: in that situation the method finds out the count, and returns list[count – 1] as you might expect. We certainly can’t do that when we’ve been given a predicate, as the last value might not match that predicate. However, we could walk backwards from the end of the list… if you have a list which contains a million items, and the last-but-one matches the predicate, you don’t really want to test the first 999998 items, do you? Again, this assumes that we can keep using random access on the list, but I think that’s reasonable for IList<T>.

Reverse

Reverse is an interesting case, because it uses deferred execution and streams data. In reality, it always takes a complete copy of the sequence (which in itself does optimise for the case where it implements ICollection<T>; in that situation you know the count to start with and can use source.CopyTo(destinationArray) to speed things up). You might consider an optimisation which uses random access if the source is an implementation of IList<T> – you could just lazily yield the elements in reverse order using random access. However, that would change behaviour. Admittedly the behaviour of Reverse may not be what people expect in the first place. What would you predict that this code does?

string[] array = { "a", "b", "c", "d" };

var query = array.Reverse();
array[0] = "a1";
        
var iterator = query.GetEnumerator();
array[0] = "a2";
        
// We’ll assume we know when this will stop :)
iterator.MoveNext();
Console.WriteLine(iterator.Current);
array[0] = "a3";
        
iterator.MoveNext();
Console.WriteLine(iterator.Current);
array[0] = "a4";

iterator.MoveNext();
Console.WriteLine(iterator.Current);
array[0] = "a5";
        
iterator.MoveNext();
Console.WriteLine(iterator.Current);

After careful thought, I accurately predicted the result (d, c, b, a2) – but you do need to take deferred execution *and* eager buffering into account. If nothing else, this should be a lesson in not changing the contents of query sources while you’re iterating over the query unless you’re really sure of what you’re doing.

With the candidate "optimisation" in place, we’d see (d, c, b, a5), but only when working on array directly. Working on array.Select(x => x) would have to give the original results, as it would have to iterate through all the initial values before finding the last one.

LongCount

This is an interesting one… LongCount really doesn’t make much sense unless you expect your sequence to have more than 2^31 elements, but there’s no optimisation present. The contract for IList<T> doesn’t state what Count should do if the list has more than Int32.MaxValue elements, so that can’t really be used – but potentially Array.LongValue could be used for large arrays.

A bigger question is when this would actually be useful. I haven’t tried timing Enumerable.Range(0, int.MaxValue) to see how long it would take to become relevant, but I suspect it would be a while. I can see how LongCount could be useful in LINQ to SQL – but does it even make sense in LINQ to Objects? Maybe it will be optimised in a future version with ILongList<T> for large lists…

EDIT: In fact, given comments, it sounds like the time taken to iterate over int.MaxValue items isn’t that high after all. That’ll teach me to make assumptions about running times without benchmarking… I still can’t say I’ve seen LongCount used in anger in LINQ to Objects, but it’s not quite as silly as I thought :)

Conclusion

The optimisations I’ve described here all have the potential to take a long-running operation down to almost instantaneous execution, in the "right" situation. There may well be other opportunities lurking – things I haven’t thought of. The good news is that missing optimisations could be applied in future releases without breaking any sane code. I do wonder whether supporting methods (e.g. TryFastCount and TryFastElementAt) would be useful to encourage other people writing their own LINQ operators to optimise appropriately – but that’s possibly a little too much of a niche case.

Blog post frequency may well change in either direction for the near future – I’m going to be very busy with last-minute changes, fixes, indexing etc for the book, which will give me even less time for blogging. On the other hand, it can be mind-numbingly tedious, so I may resort to blogging as a form of relief…

26 thoughts on “Optimisations in LINQ to Objects”

  1. I would add Any() to this list, which could return Count > 0 for sequences that provide that property (per your Count() discussion above). A micro-optimization for sure, but an optimization nonetheless.

    Cheers ~
    Keith

    Like

  2. @Keith: I’m not convinced about that. It *always* adds an execution-time type test, and the benefit in the situations when it’s a collection aren’t likely to be significant; it’s O(1) either way.

    I think there has to be a decent benefit before it’s worth slowing down the non-optimised case.

    Like

  3. @Oren: That must be a change post beta 2 then – in 4.0b2 it definitely only checks for the non-generic ICollection. When the RC bits finally get released to non-MSDN subscribers, I’ll check for myself – thanks for the heads-up!

    Like

  4. One very interesting omission I solved was the ‘enumerate over n enumerables in parrellel’. I had to implement the extension method and classes for each ‘n’, but it is pretty cool. I had an extra bitmasked field in the class that would indicate if one of the enumerables had terminated – so working over pure IEnumerables (and generic) was really fast (provided the (in)equality was early in the list or the lists didn’t have the same length) for equality operations (and easy to implement too). @jcdickinson or comment on my blog if you want the implementation.

    Like

  5. I thought was in beta 2, but I forget… I filed a bug a while ago on Connect about it and they agreed and made the change.

    The other case is with covariance — where T isn’t enough as a cast to ICollection would fail if you’re accessing something as IEnumerable. Another reason to implement ICollection :)

    Like

  6. @Jonathan: Do you mean the equivalent of Zip (in .NET 4.0) but working over more sequences?

    @Oren: I hadn’t thought of the covariance aspect. Yes, that’s interesting.

    Like

  7. @mq: Absolutely.

    This comment was posted by pete.d, but unfortunately I hit “delete” instead of “approve” so I’ve got to post it here myself. Doh.

    I always found it a bit odd that ICollection doesn’t require ICollection. Regarding your observation that Enumerable.Count() doesn’t use ICollection if present, only ICollection: that was my recollection too. But, I just looked at the .NET 3.5 implementation on my Windows 7 install, and it uses ICollection, not ICollection.

    IMHO, it ought to use _both_. But I think maybe the implementation was “fixed” to match the documentation at some point. What implementation are you looking at to determine that it uses only ICollection and not ICollection?

    Regarding the ElementAt() method: I wonder if there would be value in using reflection (caching the results, of course) to check for a true indexer in a class, and using that if IList isn’t implemented.

    Regarding the SequenceEqual() method: perhaps this is a bit contrived, but consider this class:

    class RandomSequence : IEnumerable

    {

    private readonly Random _rnd = new Random();

    public Count { get; private set; }

    public RandomSequence(int count)

    {

    Count = count;

    }

    public IEnumerator GetEnumerator()

    {

    int count = Count;

    while (count– > 0)

    {

    yield return _rnd.Next();

    }

    }

    private IEnumerator IEnumerable.GetEnumerator()

    {

    return GetEnumerator();

    }

    }

    The contrived context in which this is used would be a situation where you need some random sequence of integers for testing purposes, and a particular instance of this class is returned multiple times to code that is being tested (i.e. it uses a sequence of integers for something, the origin of which is normally different enumerations, but a test harness that is reusing the same object to produce results).

    I’ll grant that’s an “out there” enough scenario that perhaps we shouldn’t worry about it. But it does mess up the reference-equality test optimization. :(

    Finally, regarding the Reverse() method: the code you posted doesn’t produce the sequence you describe. I believe you meant to write “query.GetEnumerator()” instead of “array.GetEnumerator()”?

    Also, while I get your point, IMHO a more interesting example would be when the last element in the array is being modified, not the first. After all, when modifying the first element, the reversed output (what you describe) is identical to the non-reversed output (what the code produces), except for the order. Modifying the last element shows clearly the difference between the snapshot that the Reverse() method takes upon the first call to MoveNext(), and the “live” look that using the enumerator provides:

    string[] array = { “a”, “b”, “c”, “d” };

    var query = array.Reverse();

    array[3] = “d1”;

    var iterator = query.GetEnumerator();

    array[3] = “d2”;

    // We’ll assume we know when this will stop :)

    iterator.MoveNext();

    Console.WriteLine(iterator.Current);

    array[3] = “d3”;

    iterator.MoveNext();

    Console.WriteLine(iterator.Current);

    array[3] = “d4”;

    iterator.MoveNext();

    Console.WriteLine(iterator.Current);

    array[3] = “d5”;

    iterator.MoveNext();

    Console.WriteLine(iterator.Current);

    Of course, the whole thing does involve questionable practices of modifying the collection contents while enumerating the collection, and actually depending on the interaction of the Reverse() method with said modification. IMHO, these kinds of situations can be documented as “undefined results”, leaving the implementation free to make _sensible_ optimizations such as lazily retrieving items as you suggest.

    Like

  8. @pete.d: I’ve been looking at .NET 4.0b2. I’ve just confirmed that .NET 3.5 (SP1) does indeed use just the generic form.

    I’ve fixed the typo in the sample too :)

    Regarding the random sequence… yes, that would certainly confuse things. But part of my point was that it would be very strange to want to use SequenceEqual on that to start with… when would you want to use a sequence with strange behaviour like that but *not* know that you were doing so (and thus avoid things like this)?

    Like

  9. Well, I did say it was contrived. :)

    My main point is that it’s a potential “gotcha” in real code. A slightly less contrived example might be a LINQ query or other deferred-execution IEnumerable where there’s some parametric behavior (e.g. anonymous method with captured variable) that can change between enumerations. Depending on how much is cached in the generation of the enumeration itself and at what point the parameter affects the outcome, the enumeration could wind up exactly the same for both (as long as they are both evaluated in parallel by the SequenceEquals() method…a different, caching implementation of the sequence comparison would behave differently), or different, even though the instance of the enumerable is the same.

    By the way, I noticed in hindsight that my example also demonstrates an enumerable where the retrieval of elements from one sequence actually perturbs the results of retrieval from the other (interleaved use of the Random.Next() method).

    Anyway, I would agree that these are things that should not generally come up in practice, and the author of the client code should be cognizant of the pitfalls. _But_ I also think that there’s value in a library having a fairly conservative approach to optimization.

    In general, we expect compilers to make optimizations only when they KNOW FOR SURE the optimization won’t affect the outcome of the code (I know of exceptions to that rule, but they are infrequent and occur only when explicitly requested by the programmer). Likewise, it seems to me a library like .NET ought to be a bit more defensive and choose correctness over performance, even if the cases where correctness is in question would be rare and experience by someone who ought to “know better”.

    After all, if there’s a performance problem related to the lack of the special-case optimization, the client of the library can always detect that (through proper performance measurements) and deal with it explicitly. I know I don’t need to tell you about the hazards of premature optimizations! :)

    It seems to me that an optimization should produce correct results _even when the code being optimized is ridiculous_.

    Like

  10. There really needs to be IReadOnlyList and IReadOnlyCollection where IList : IReadOnlyList and ICollection : IReadOnlyCollection that have covariance.

    The getters would be on the read-only interface and the setters on the regular ones. Then things like ElementAt could work with covariance. I was really hoping that those interfaces would be added in .NET 4 as they wouldn’t break existing code but add worlds of flexibility.

    Like

  11. Running here (i7 920 with Hyperthreading, 64bit Win7, .NET4RC)

    Enumerable.Range(0, int.MaxValue).Count();

    takes ~11.5s (default console application, release build).

    With LongCount, the time is not significantly different.

    So it is certainly feasible to handle sequences of more than UInt32.MaxValue items.

    Like

  12. @Richard: Thanks for that. I guess I should have checked after all :) Will edit when I get a bit of time, to insert that info…

    Like

  13. @pete.d

    ICollection doesn’t require ICollection because it can’t without circumventing itself. Anders explains (via Brad Abrams) here: http://blogs.msdn.com/brada/archive/2005/01/18/355755.aspx.

    @Oren,

    I have also sorely wished for similar interfaces. The problem is that they *would* break existing code in any class where the existing code explicitly implements the interface methods. Interfaces truly are set in stone once they ship. There would have to be a design change to the CLR to make this possible without breaking existing code.

    Re: Count

    I noticed in the Connect bug that they said it had been fixed to check both interfaces, when it in fact had been changed to only check ICollection. My guess is that whoever was making the change forgot that ICollection doesn’t require ICollection :) The good news is that is has been fixed in the RC.

    Like

  14. “ICollection doesn’t require ICollection because it can’t without circumventing itself. Anders explains (via Brad Abrams) here: blogs.msdn.com/…/355755.aspx.”

    I’m not sure I understand that reasoning. ICollection itself doesn’t include an input methods. Only output methods. So while ICollection is invariant, I don’t see why that precludes ICollection inheriting the read-only ICollection interface.

    In hindsight, looking through the interface definitions more closely, I think it’s more likely they just decided that ICollection includes stuff that they didn’t want to require for ICollection implementers (in particular, the thread-safety properties). In other words, the ICollection interface itself didn’t meet the standards Microsoft wanted to apply to the newer generic types, and so it was left out of the inheritance hierarchy.

    It also doesn’t help that they added features to ICollection that don’t exist in ICollection, making the interfaces not nearly as closely related to each other as IEnumerable and IEnumerable are.

    Of course, that’s pure speculation. But I don’t see any fundamental barrier to having ICollection inherit ICollection.

    Like

  15. Enumerable.Range(0, int.MaxValue).AsParallel()
    .Count(x=>x>0)

    Took 3 minutes and 40 seconds to compute in linqpad.

    You asked for it.

    Like

  16. @pete.d

    You’re right, the argument for IList and IList doesn’t apply in the same way to ICollection and ICollection. I so rarely use ICollection anymore that I forgot what it looks like :) Interesting that neither Brad nor Anders addressed that.

    I seem to remember reading at some point that they had decided that ICollection was just poorly designed from the beginning. But I can’t find the source for that now.

    Like

  17. “it’s pretty hard to see how you can do better than iterating through and counting matches when there is a predicate involved.”

    Not so hard. If your data relay on a position and owns an index, you can use some kind of binary search to find the first occurence where a Predicate occurs. Say you look for FirstName=’Jon’, the algorithm will find the first match, even if there are many Jon. Change the binary search lightly (less dramatic that to Reverse the positionnal list), and you can, in theory, find the LAST occurence, and thus, the count is the position of the last occurence less the position of the first occurence, +1. So, Count on simple precidate (indlucing FirstName BETWEEN “Jon” AND “Pete”) can also be optimized, wihout explicit walk trough.

    Like

  18. @Vanderghast: That requires knowledge of the predicate, which means changing the signature (e.g. to Expression<Func>). That’s all very well, but it’s not the sort of optimisation I’m talking about in this blog post – see the first paragraph.

    Like

  19. You may be interested in the Nito.Linq library I’m working on:
    http://nitolinq.codeplex.com/

    It’s mainly an attempt to extend LINQ-like ideas to IList, but it includes a lot of the optimizations you’ve mentioned (SequenceEqual, Reverse, Last/LastOrDefault with predicate) and also a few others (Skip and Take, and Repeat and Zip from Rx). It was partially inspired by your MoreLinq project.

    Current state: http://nitolinq.codeplex.com/SourceControl/changeset/view/39784#724207

    Like

  20. I like all of these small optimizations but ultimately I still think some new kind of way of treating LINQ to Objects will have to come by. And a whole suite of immutable collection types. Maybe we’ll see this in the next .Net. The “yield foreach” would just be part of the optimization answer. Another a way to “globally” optimize LINQ queries so that L2O can replace most imperative coding styles where it would be possible to do so in highly optimized functional languages. Maybe in 10 years, who knows… LINQ is still very far from its full potential.

    Like

  21. Following a thread on the R# EAP forum, I have just been profiling Last() running over arrays.

    It appears that the IList cast of an array is expensive (obviously better than walking the whole collection, but still expensive).

    Interestingly enough, a customised version of Last() which first tries a cast to TSource[] before the cast to IList was *enormously* faster for both arrays AND, somewhat suprisingly to me, not significantly slower for Lists.

    Casting to the non generic IList and then casting the resulting element was also a fair bit faster than casting to IList.

    Does anyone know why casts to IList would be so slow?

    It does seem that situations where you’re calling Linq functions against arrays might be better served by a more carefuly optimised library.

    Like

Leave a comment