Reimplementing LINQ to Objects: Part 20 – ToList

This morning I started writing the tests for GroupBy, prior to implementing it. That turned out to be a pain – really I wanted an easy way of getting at each element of the result (i.e. each result group). If only I had the ability to convert an arbitrary sequence into a query… I needed ToList. So, we enter a fairly brief diversion.

What is it?

ToList has a single, simple overload:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)

Fairly obviously, ToList converts the source sequence into a list. Some points to note:

The signature specifies List<T>, not just IList<T>. Of course it could return a subclass of List<T>, but there seems little point.
It uses immediate execution – nothing is deferred here
The parameter (source) musn’t be null
It’s optimized for the case when source implements ICollection<T>
It always creates a new, independent list.

The last two points are worth a bit more discussion. Firstly, the optimization for ICollection<T> isn’t documented, but it makes a lot of sense:

List<T> stores its data in an array internally
ICollection<T> exposes a Count property so the List<T> can create an array of exactly the right size to start with
ICollection<T> exposes a CopyTo method so that the List<T> can copy all the elements into the newly created array in bulk

ToList always creates a new list for consistency. If it just returned the source parameter directly if it was already a List<T>, that would mean that changes to the source after calling ToList would sometimes be visible in the returned list and sometimes not… making it harder to reason about any code which used ToList.

What are we going to test?

I have tests for the bullet points listed above, and one extra test just to prove that it can work with lazily evaluated sequences as well as simple collections like arrays. (The test uses a range followed by a projection.)

In order to test the optimization for ICollection<T>, I’ve implemented another collection like NonEnumerableList, but this time just NonEnumerableCollection. Again, this just delegates all ICollection<T> operations to a backing List<T>, but throws an exception if you try to call GetEnumerator(). The test then just calls ToList on a NonEnumerableCollection: as no exception is thrown, that proves that either the operation is optimized as we’d expect, or the exception is being swallowed. I think it’s reasonable to assume that exceptions aren’t swallowed in LINQ to Objects :)

Let’s implement it!

This will probably be the simplest implementation in the whole of Edulinq:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    return new List<TSource>(source);
}

You may well be wondering what happened to the optimization… well, it’s in the List<T> constructor. We just get it for free. Unfortunately that’s not documented either… so we end up with an implementation which implements one undocumented optimization if List<T> implements another undocumented optimization :)

We can’t actually do much better than that – we can’t use ICollection<T>.CopyTo ourselves, as we don’t have access to the underlying array of List<T>. We could perform some optimization by calling the List<T> constructor which specifies a capacity, and then call AddRange. That would at least prevent the list from having to resize itself, but it would still need to iterate over the whole collection instead of using the (potentially very fast) CopyTo method.

Conclusion

You may be wondering why we even need ToList, if we could just create a list by calling the constructor directly. The difference is that in order to call a constructor, you need to specify the element type as the type argument. When we use ToList, we can take advantage of type inference. In many cases this is just extremely convenient, but for anonymous types it’s actually required. How can you end up with a strongly typed list of an anonymous type? It’s easy with ToList, like this:

var query = Enumerable.Range(0, 10)
.Select(x => new { Value = x, Doubled = x * 2 });

var list = query.ToList();

Try doing that without an extension method or something similar. It’s worth noting at this point that although there are similar methods for arrays and Dictionary, there’s no equivalent for HashSet. It’s incredibly easy to write, of course, and an obvious extension to LINQ to Objects – but it’s not in the standard library. Maybe for .NET 5…

So, now that we’ve got ToList sorted, I can get back to GroupBy and its eight overloads – easy to implement, but hard to test simply and hard to describe clearly. Lucky me.

6 thoughts on “Reimplementing LINQ to Objects: Part 20 – ToList”

It’s not that a big deal that a change to the source affects the result — after all most of LINQ methods have this property too, strictly speaking.

The big deal is that a change to the result affects the source. That is a big no-no here.

LikeLike

@Mihailik: While I certainly agree that changing the result shouldn’t change the source either, I disagree with your claim that source change => result change isn’t a big deal. While it’s true that some other LINQ operators behave like this, that’s *only* the ones which support deferred execution. Everything which uses *immediate* execution returns a result which is then independent of the original.

LikeLike

Thank you for posting this. I had guessed that, in some cases, the IEnumerable extensions were not limited to the GetEnumerator() defined in IEnumerable. I had a problem with mocking a business object that implements IEnumerable and the execution of a LINQ query containing orderby against this business object was failing to copy any elements from the input to the output; each element ended up null. Well, this business object also implements ICollection, so I had to mock ICollection.CopyTo() as well.

Are you aware of any other LINQ extensions that that SHOULD be limited to IEnumerable that have similar optimizations that could be gotchas?

Thanks again!

LikeLike

jonskeet says:

July 6, 2017 at 10:02 pm

I would basically avoid mocking collections as far as possible.

LikeLike

Reply
1. Tim Hayes says:
  
  July 6, 2017 at 10:17 pm
  
  How could one simply avoid mocking collections? I presume you are suggest creating a real collection and use it in the unit test. The way our business objects are inherited and built (CSLA), I’m not sure I can do that, but I will look into it.
  
  Thanks for the response! Again, any other LINQ extensions that have undocumented optimization trickery?
  
  LikeLike
  
  Reply
  1. jonskeet says:
    
    July 7, 2017 at 5:34 am
    
    The ones documented thorough this series are all the ones I know about, but that’s not to say that’s all that exist.
    
    LikeLike
    
    Reply

Mihailik says:

January 1, 2011 at 9:41 am

It’s not that a big deal that a change to the source affects the result — after all most of LINQ methods have this property too, strictly speaking.

The big deal is that a change to the result affects the source. That is a big no-no here.

LikeLike

skeet says:

January 1, 2011 at 10:03 am

@Mihailik: While I certainly agree that changing the result shouldn’t change the source either, I disagree with your claim that source change => result change isn’t a big deal. While it’s true that some other LINQ operators behave like this, that’s *only* the ones which support deferred execution. Everything which uses *immediate* execution returns a result which is then independent of the original.

LikeLike

Tim Hayes says:

July 6, 2017 at 9:54 pm

Thank you for posting this. I had guessed that, in some cases, the IEnumerable extensions were not limited to the GetEnumerator() defined in IEnumerable. I had a problem with mocking a business object that implements IEnumerable and the execution of a LINQ query containing orderby against this business object was failing to copy any elements from the input to the output; each element ended up null. Well, this business object also implements ICollection, so I had to mock ICollection.CopyTo() as well.

Are you aware of any other LINQ extensions that that SHOULD be limited to IEnumerable that have similar optimizations that could be gotchas?

Thanks again!

LikeLike

1. jonskeet says:
  
  July 6, 2017 at 10:02 pm
  
  I would basically avoid mocking collections as far as possible.
  
  LikeLike
  
  1. Tim Hayes says:
    
    July 6, 2017 at 10:17 pm
    
    How could one simply avoid mocking collections? I presume you are suggest creating a real collection and use it in the unit test. The way our business objects are inherited and built (CSLA), I’m not sure I can do that, but I will look into it.
    
    Thanks for the response! Again, any other LINQ extensions that have undocumented optimization trickery?
    
    LikeLike
    
    1. jonskeet says:
      
      July 7, 2017 at 5:34 am
      
      The ones documented thorough this series are all the ones I know about, but that’s not to say that’s all that exist.
      
      LikeLike

Jon Skeet's coding blog

Reimplementing LINQ to Objects: Part 20 – ToList

What is it?

What are we going to test?

Let’s implement it!

Conclusion

6 thoughts on “Reimplementing LINQ to Objects: Part 20 – ToList”

Leave a reply to Tim Hayes Cancel reply

What is it?

What are we going to test?

Let’s implement it!

Conclusion

Share this:

Related

6 thoughts on “Reimplementing LINQ to Objects: Part 20 – ToList”

Leave a reply to Tim Hayes Cancel reply