Reimplementing LINQ to Objects: Part 27 – Reverse

Time for a change of pace after the deep dive into sorting. Reversing is pretty simple… which is not to say there’s nothing to discuss, of course.

What is it?

Reverse only has a single, simple signature:

public static IEnumerable<TSource> Reverse<TSource>(
    this IEnumerable<TSource> source)

The behaviour is pretty simple to describe:

  • source cannot be null; this is validated eagerly.
  • The operator uses deferred execution: until you start reading from the result, it won’t read anything from the input sequence
  • As soon as you start reading from the result sequence, the input sequence is read in its entirety
  • The result sequence contains all the elements of the input sequence, in the opposite order. (So the first element of the result sequence is the last element of the input sequence.)

The third point is the most interesting. It sounds like an obvious requirement just to get it to work at all – until you think of possible optimizations. Imagine if you implemented Reverse with an optimization for arrays: we know the array won’t change size, and we can find out that size easily enough – so we could just use the indexer on each iteration, starting off with an index of "length – 1" and decrementing until we’d yielded every value.

LINQ to Objects doesn’t behave this way – and that’s observable because if you change the value of the array after you start iterating over the result sequence, you don’t see those changes. Deferred execution means that you will see changes made to the array after the call to Reverse but before you start iterating over the results, however.

Note that the buffering nature of this operator means that you can’t use it on infinite sequences – which makes sense when you think about it. What’s the last element of an infinite sequence?

What are we going to test?

Most of the tests are pretty obvious, but I have one test to demonstrate how the timing of changes to the contents of the input sequence affect the result sequence:

[Test]
public void ArraysAreBuffered()
{
    // A sneaky implementation may try to optimize for the case where the collection
    // implements IList or (even more "reliable") is an array: it mustn’t do this,
    // as otherwise the results can be tainted by side-effects within iteration
    int[] source = { 0, 1, 2, 3 };

    var query = source.Reverse();
    source[1] = 99; // This change *will* be seen due to deferred execution
    using (var iterator = query.GetEnumerator())
    {
        iterator.MoveNext();
        Assert.AreEqual(3, iterator.Current);

        source[2] = 100; // This change *won’t* be seen               
        iterator.MoveNext();
        Assert.AreEqual(2, iterator.Current);

        iterator.MoveNext();
        Assert.AreEqual(99, iterator.Current);

        iterator.MoveNext();
        Assert.AreEqual(0, iterator.Current);
    }
}

If you can think of any potentially-surprising tests, I’d be happy to implement them – there wasn’t much I could think of in terms of corner cases.

Let’s implement it!

Eager validation of source combined with deferred execution suggests the normal implementation of splitting the operator into two methods – I won’t bother showing the public part, as it only does exactly what you’d expect it to. However, to make up for the fact that Reverse is so simple, I’ll present three implementations of the "Impl" method.

First, let’s use a collection which performs the reversing for us automatically: a stack. The iterator returned by Stack<T> returns items in the order in which they would be seen by multiple calls to Pop – i.e. the reverse of the order in which they were added. This makes the implementation trivial:

private static IEnumerable<TSource> ReverseImpl<TSource>(IEnumerable<TSource> source)
{
    Stack<TSource> stack = new Stack<TSource>(source);
    foreach (TSource item in stack)
    {
        yield return item;
    }
}

Again, with "yield foreach" we could have done this in a single statement.

Next up, a linked list. In some ways, using a linked list is very natural – you never need to resize an array, or anything like that. On the other hand, we have an extra node object for every single element, which is a massive overhead. It’s not what I’d choose to use for production in this case, but it’s worth showing:

private static IEnumerable<TSource> ReverseImpl<TSource>(IEnumerable<TSource> source)
{
    LinkedList<TSource> list = new LinkedList<TSource>(source);
    LinkedListNode<TSource> node = list.Last; // Property, not method!
    while (node != null)
    {
        yield return node.Value;
        node = node.Previous;
    }
}

Finally, a more "close to the metal" approach using our existing ToBuffer method:

private static IEnumerable<TSource> ReverseImpl<TSource>(IEnumerable<TSource> source)
{
    int count;
    TSource[] array = source.ToBuffer(out count);
    for (int i = count – 1; i >= 0; i–)
    {
        yield return array[i];
    }
}

This is probably not significantly more efficient than the version using Stack<T> – I expect Stack<T> has a similar implementation to ToBuffer when it’s constructed with an input sequence. However, as it’s so easy to count down from the end of the array to the start, we don’t really need to take advantage of any of the features of Stack – so we might as well just use the array directly.

Note that this relies on the fact that ToBuffer will create a copy of whatever it’s given, including an array. That’s okay though – we’re relying on that all over the place :)

Conclusion

It’s hard to see how this could really be optimized any further, other than by improving ToBuffer based on usage data. Overall, a lovely simple operator.

It’s probably about time I tackled some of the arithmetic aggregation operators… so next time I’ll probably implement Sum.

17 thoughts on “Reimplementing LINQ to Objects: Part 27 – Reverse”

  1. Jon, you did implement ToArray in part 24, so why don’t you use the ToArray extension method and write it as follows?

    TSource[] array = source.ToArray();

    for (int i = array.Length – 1; i >= 0; i–)
    {
    yield return array[i];
    }

    Cheers

    Like

  2. @Steven: For a non-ICollection, ToArray will result in a copy of the array being made, to get it to exactly the right size. We don’t care about the array being oversized, so we can avoid the extra copy.

    Like

  3. @tobi: I’ll investigate your Pex tests tonight. The “chunks of chunks” idea has pros and cons, of course – it increases the level of indirection for *every* operation, for example. If storing chunks were always a net win, that’s how List would be implemented to start with :)

    A proper performance investigation would need lots of implementations and lots of situations, too.

    Like

  4. Yeah, I only thought about Reverse in case of a non-ICollection. You only pay any indirection once every 32 elements. You could even go crazy and have the first 16 elements in copy-n-pasted stack variables so that you do not have any allocations for small sequences. This is merely a theoretical thought…

    Like

  5. @tobi: I don’t see your point about only paying for indirection once every 32 elements… surely every comparison may be between elements of different chunks. Maybe I’m just not understanding your proposal properly.

    Like

  6. Hm this is usually a sign that I am overlooking something. Here is what I meant:

    Reverse(seq) {
    List buffers = seq.AsChunked(32).ToList(); //#1
    return buffers.Reverse().SelectMany(b => b.Reverse()); //#2
    }

    The linq expression in (#2) of course expanded into 2 nested for loops for performance reasons. I just wrote it with linq to make it shorter. When written as 2 for loops you would access the array b with a simple indexing expression (in the inner loop).

    Like

  7. I like the stack solution; it’s neat. I’d much prefer it if you popped each element before returning it though – a constant theme I’ve seen in Edulinq is keeping references to items that have already been yielded and I think that might be a problem for any serious streaming operation.

    Like

  8. @configurator: Well, it can’t be a *fully* streaming solution here anyway, given that we’ve got to buffer everything first.

    I did nearly add a line to set the array value to default(TSource) after each iteration…

    Like

  9. Jon,

    I guess I miss something… But why didn’t you consider a simple List for the buffer?

    list = new List(sequence);
    for (int i = list.Count – 1; i >= 0; –i)
    {
    yield return list[i];
    }

    PS: Your series is really superb!

    Like

  10. @Nico: I did consider it, but we already have the “raw” ToBuffer method, which does all we want. We don’t need to get the higher level abstraction of a List involved. It would work of course… but would be ever-so-slightly less efficient without improving readability.

    Glad you’re enjoying the posts :)

    Like

  11. If you’re willing to drop laziness, you can simplify it to

    public static IEnumerable Reverse(this IEnumerable source) {
    return new Stack(source);
    }

    (and get free null-checking too)

    Like

  12. In the comments for the test case you say:

    “A sneaky implementation may try to optimize for the case where the collection implements IList or (even more “reliable”) is an array: it mustn’t do this, as otherwise the results can be tainted by side-effects within iteration”

    Just a quick question, but why isn’t this allowed? If you ran the same test using Select(x => x) and tried to modified the array then you’d get the tainted values, so why can’t reverse do it?

    Like

  13. @Sam: I see your point, but leaving the array example aside for a minute, the iterator of a collection will usually detect modification. Arrays are sneaky in all kinds of ways – although at least with an array, you’d know the *size* wouldn’t change between calls…

    Like

Leave a comment