Reimplementing LINQ to Objects: Part 27 – Reverse

January 8, 2011 jonskeet 17 Comments

Time for a change of pace after the deep dive into sorting. Reversing is pretty simple… which is not to say there’s nothing to discuss, of course.

What is it?

Reverse only has a single, simple signature:

public static IEnumerable<TSource> Reverse<TSource>(
this IEnumerable<TSource> source)

The behaviour is pretty simple to describe:

source cannot be null; this is validated eagerly.
The operator uses deferred execution: until you start reading from the result, it won’t read anything from the input sequence
As soon as you start reading from the result sequence, the input sequence is read in its entirety
The result sequence contains all the elements of the input sequence, in the opposite order. (So the first element of the result sequence is the last element of the input sequence.)

The third point is the most interesting. It sounds like an obvious requirement just to get it to work at all – until you think of possible optimizations. Imagine if you implemented Reverse with an optimization for arrays: we know the array won’t change size, and we can find out that size easily enough – so we could just use the indexer on each iteration, starting off with an index of "length – 1" and decrementing until we’d yielded every value.

LINQ to Objects doesn’t behave this way – and that’s observable because if you change the value of the array after you start iterating over the result sequence, you don’t see those changes. Deferred execution means that you will see changes made to the array after the call to Reverse but before you start iterating over the results, however.

Note that the buffering nature of this operator means that you can’t use it on infinite sequences – which makes sense when you think about it. What’s the last element of an infinite sequence?

What are we going to test?

Most of the tests are pretty obvious, but I have one test to demonstrate how the timing of changes to the contents of the input sequence affect the result sequence:

[Test]
public void ArraysAreBuffered()
{
    // A sneaky implementation may try to optimize for the case where the collection
    // implements IList or (even more "reliable") is an array: it mustn’t do this,
    // as otherwise the results can be tainted by side-effects within iteration
    int[] source = { 0, 1, 2, 3 };

    var query = source.Reverse();
    source[1] = 99; // This change *will* be seen due to deferred execution
    using (var iterator = query.GetEnumerator())
    {
        iterator.MoveNext();
        Assert.AreEqual(3, iterator.Current);

        source[2] = 100; // This change *won’t* be seen
        iterator.MoveNext();
        Assert.AreEqual(2, iterator.Current);

iterator.MoveNext();
Assert.AreEqual(99, iterator.Current);

        iterator.MoveNext();
        Assert.AreEqual(0, iterator.Current);
    }
}

If you can think of any potentially-surprising tests, I’d be happy to implement them – there wasn’t much I could think of in terms of corner cases.

Let’s implement it!

Eager validation of source combined with deferred execution suggests the normal implementation of splitting the operator into two methods – I won’t bother showing the public part, as it only does exactly what you’d expect it to. However, to make up for the fact that Reverse is so simple, I’ll present three implementations of the "Impl" method.

First, let’s use a collection which performs the reversing for us automatically: a stack. The iterator returned by Stack<T> returns items in the order in which they would be seen by multiple calls to Pop – i.e. the reverse of the order in which they were added. This makes the implementation trivial:

private static IEnumerable<TSource> ReverseImpl<TSource>(IEnumerable<TSource> source)
{
    Stack<TSource> stack = new Stack<TSource>(source);
    foreach (TSource item in stack)
    {
        yield return item;
    }
}

Again, with "yield foreach" we could have done this in a single statement.

Next up, a linked list. In some ways, using a linked list is very natural – you never need to resize an array, or anything like that. On the other hand, we have an extra node object for every single element, which is a massive overhead. It’s not what I’d choose to use for production in this case, but it’s worth showing:

private static IEnumerable<TSource> ReverseImpl<TSource>(IEnumerable<TSource> source)
{
    LinkedList<TSource> list = new LinkedList<TSource>(source);
    LinkedListNode<TSource> node = list.Last; // Property, not method!
    while (node != null)
    {
        yield return node.Value;
        node = node.Previous;
    }
}

Finally, a more "close to the metal" approach using our existing ToBuffer method:

private static IEnumerable<TSource> ReverseImpl<TSource>(IEnumerable<TSource> source)
{
    int count;
    TSource[] array = source.ToBuffer(out count);
    for (int i = count – 1; i >= 0; i–)
    {
        yield return array[i];
    }
}

This is probably not significantly more efficient than the version using Stack<T> – I expect Stack<T> has a similar implementation to ToBuffer when it’s constructed with an input sequence. However, as it’s so easy to count down from the end of the array to the start, we don’t really need to take advantage of any of the features of Stack – so we might as well just use the array directly.

Note that this relies on the fact that ToBuffer will create a copy of whatever it’s given, including an array. That’s okay though – we’re relying on that all over the place :)

Conclusion

It’s hard to see how this could really be optimized any further, other than by improving ToBuffer based on usage data. Overall, a lovely simple operator.

It’s probably about time I tackled some of the arithmetic aggregation operators… so next time I’ll probably implement Sum.

17 thoughts on “Reimplementing LINQ to Objects: Part 27 – Reverse”

Steven says:

January 8, 2011 at 5:17 am

Jon, you did implement ToArray in part 24, so why don’t you use the ToArray extension method and write it as follows?

TSource[] array = source.ToArray();

for (int i = array.Length – 1; i >= 0; i–)
{
yield return array[i];
}

Cheers

LikeLike

Reply
skeet says:

January 8, 2011 at 5:23 am

@Steven: For a non-ICollection, ToArray will result in a copy of the array being made, to get it to exactly the right size. We don’t care about the array being oversized, so we can avoid the extra copy.

LikeLike

Reply
tobi says:

January 8, 2011 at 7:04 am

You could use a List and store chunks of 32 elements each. That way you eliminate up to 3x resizing overhead and many copy operations.

I have also created Pex tests for OrderBy/ThenBy: http://rapidshare.com/files/441447992/Edulinq.Tests.Pex.zip

MS Pex must be installed: http://research.microsoft.com/en-us/projects/pex/Downloads.aspx

LikeLike

Reply
skeet says:

January 8, 2011 at 7:18 am

@tobi: I’ll investigate your Pex tests tonight. The “chunks of chunks” idea has pros and cons, of course – it increases the level of indirection for *every* operation, for example. If storing chunks were always a net win, that’s how List would be implemented to start with :)

A proper performance investigation would need lots of implementations and lots of situations, too.

LikeLike

Reply
tobi says:

January 8, 2011 at 7:22 am

Yeah, I only thought about Reverse in case of a non-ICollection. You only pay any indirection once every 32 elements. You could even go crazy and have the first 16 elements in copy-n-pasted stack variables so that you do not have any allocations for small sequences. This is merely a theoretical thought…

LikeLike

Reply
skeet says:

January 8, 2011 at 8:08 am

@tobi: I don’t see your point about only paying for indirection once every 32 elements… surely every comparison may be between elements of different chunks. Maybe I’m just not understanding your proposal properly.

LikeLike

Reply
tobi says:

January 8, 2011 at 9:19 am

Hm this is usually a sign that I am overlooking something. Here is what I meant:

Reverse(seq) {
List buffers = seq.AsChunked(32).ToList(); //#1
return buffers.Reverse().SelectMany(b => b.Reverse()); //#2
}

The linq expression in (#2) of course expanded into 2 nested for loops for performance reasons. I just wrote it with linq to make it shorter. When written as 2 for loops you would access the array b with a simple indexing expression (in the inner loop).

LikeLike

Reply
skeet says:

January 8, 2011 at 9:34 am

@tobi: Ah – I was overlooking the fact that we were talking about Reverse, not OrderBy/ThenBy. Doh! :)

LikeLike

Reply
Alois Kraus says:

January 8, 2011 at 10:44 am

An interesting corner case would be a char array which has Unicode compound characters inside it. If you reverse a character array you will end up with invalid Unicode characters inside it.

One way to check it would be to use Char.IsLowSurrogate/IsHighSurrogate and IsSurrogatePair
http://msdn.microsoft.com/en-us/library/wwwsy4wb.aspx

Yours,
Alois Kraus

LikeLike

Reply
configurator says:

January 8, 2011 at 2:35 pm

I like the stack solution; it’s neat. I’d much prefer it if you popped each element before returning it though – a constant theme I’ve seen in Edulinq is keeping references to items that have already been yielded and I think that might be a problem for any serious streaming operation.

LikeLike

Reply
skeet says:

January 8, 2011 at 2:40 pm

@configurator: Well, it can’t be a *fully* streaming solution here anyway, given that we’ve got to buffer everything first.

I did nearly add a line to set the array value to default(TSource) after each iteration…

LikeLike

Reply
Nico says:

January 19, 2011 at 10:53 am

Jon,

I guess I miss something… But why didn’t you consider a simple List for the buffer?

list = new List(sequence);
for (int i = list.Count – 1; i >= 0; –i)
{
yield return list[i];
}

PS: Your series is really superb!

LikeLike

Reply
skeet says:

January 19, 2011 at 11:28 am

@Nico: I did consider it, but we already have the “raw” ToBuffer method, which does all we want. We don’t need to get the higher level abstraction of a List involved. It would work of course… but would be ever-so-slightly less efficient without improving readability.

Glad you’re enjoying the posts :)

LikeLike

Reply
SLaks says:

February 1, 2011 at 5:16 pm

If you’re willing to drop laziness, you can simplify it to

public static IEnumerable Reverse(this IEnumerable source) {
return new Stack(source);
}

(and get free null-checking too)

LikeLike

Reply
skeet says:

February 1, 2011 at 5:20 pm

@SLaks: If you’re willing to break documented behaviour, there are all kinds of things you can do :)

LikeLike

Reply
Sam says:

March 18, 2011 at 6:16 pm

In the comments for the test case you say:

“A sneaky implementation may try to optimize for the case where the collection implements IList or (even more “reliable”) is an array: it mustn’t do this, as otherwise the results can be tainted by side-effects within iteration”

Just a quick question, but why isn’t this allowed? If you ran the same test using Select(x => x) and tried to modified the array then you’d get the tainted values, so why can’t reverse do it?

LikeLike

Reply
skeet says:

March 19, 2011 at 11:34 am

@Sam: I see your point, but leaving the array example aside for a minute, the iterator of a collection will usually detect modification. Arrays are sneaky in all kinds of ways – although at least with an array, you’d know the *size* wouldn’t change between calls…

LikeLike

Reply

Jon Skeet's coding blog

Reimplementing LINQ to Objects: Part 27 – Reverse

What is it?

What are we going to test?

Let’s implement it!

Conclusion

17 thoughts on “Reimplementing LINQ to Objects: Part 27 – Reverse”

Leave a comment Cancel reply

What is it?

What are we going to test?

Let’s implement it!

Conclusion

Share this:

Related

17 thoughts on “Reimplementing LINQ to Objects: Part 27 – Reverse”

Leave a comment Cancel reply