Today I’ve implemented six operators, each with two overloads. At first I expected the implementations to be very similar, but they’ve all turned out slightly differently…
What are they?
We’ve got three axes of permutation here: {First, Last, Single}, {with/without OrDefault }, {with/without a predicate}. That gives us these twelve signatures:
public static TSource First<TSource>(
this IEnumerable<TSource> source)
public static TSource First<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
public static TSource FirstOrDefault<TSource>(
this IEnumerable<TSource> source)
public static TSource FirstOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
public static TSource Last<TSource>(
this IEnumerable<TSource> source)
public static TSource Last<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
public static TSource LastOrDefault<TSource>(
this IEnumerable<TSource> source)
public static TSource LastOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
public static TSource Single<TSource>(
this IEnumerable<TSource> source)
public static TSource Single<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
public static TSource SingleOrDefault<TSource>(
this IEnumerable<TSource> source)
public static TSource SingleOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
The shared behaviour is as follows:
- They’re all extension methods with a single generic type argument
- They’re all implemented with immediate execution
- They all validate that their parameters are non-null
- The overloads with a predicate are equivalent to calling source.Where(predicate).SameOperator() – in other words, they just add a filter before applying the operator.
With those rules applied, we simply need to consider three possibilities for each operator: what happens if the source sequence is empty, contains a single element, or contains multiple elements. (This is after applying the filter if a predicate is specified, of course.) We can draw the results up in a simple table:
| Operator |
Empty sequence |
Single element |
Multiple elements |
| First |
Throws exception |
Returns element |
Returns first element |
| FirstOrDefault |
Returns default(TSource) |
Returns element |
Returns first element |
| Last |
Throws exception |
Returns element |
Returns last element |
| LastOrDefault |
Returns default(TSource) |
Returns element |
Returns last element |
| Single |
Throws exception |
Returns element |
Throws exception |
| SingleOrDefault |
Returns default(TSource) |
Returns element |
Throws exception |
As you can see, for an input sequence with a single element, the results are remarkably uniform :) Likewise, for an empty input sequence, any operator without "OrDefault" throws an exception (InvalidOperationException, in fact) and any operator with "OrDefault" returns the default value for the element type (null for reference types, 0 for int etc). The operators really differ if the (potentially filtered) input sequence contains multiple elements – First and Last do the obvious thing, and Single throws an exception. It’s worth noting that SingleOrDefault also throws an exception – it’s not like it’s saying, "If the sequence is a single element, return it – otherwise return the default value." If you want an operator which handles multiple elements, you should be using First or Last, with the "OrDefault" version if the sequence can legitimately have no elements. Note that if you do use an "OrDefault" operator, the result is exactly the same for an empty input sequence as for an input sequence containing exactly one element which is the default value. (I’ll be looking at the DefaultIfEmpty operator next.)
Now we know what the operators do, let’s test them.
What are we going to test?
This morning I tweeted that I had written 72 tests before writing any implementation. In fact I ended up with 80, for reasons we’ll come to in a minute. For each operator, I tested the following 12 cases:
- Null source (predicate-less overload)
- Null source (predicated overload)
- Null predicate
- Empty sequence, no predicate
- Empty sequence, with a predicate
- Single element sequence, no predicate
- Single element sequence where the element matched the predicate
- Single element sequence where the element didn’t match the predicate
- Multiple element sequence, no predicate
- Multiple element
- Multiple element sequence where one element matched the predicate
- Multiple element sequence where multiple elements matched the predicate
These were pretty much cut and paste jobs – I used the same data for each test against each operator, and just changed the expected results.
There are two extra tests for each of First and FirstOrDefault, and two for each of Last and LastOrDefault:
- First/FirstOrDefault should return as soon as they’ve seen the first element, when there’s no predicate; they shouldn’t iterate over the rest of the sequence
- First/FirstOrDefault should return as soon as they’ve seen the first matching element, when there is a predicate
- Last/LastOrDefault are optimized for the case where the source implements IList<T> and there’s no predicate: it uses Count and the indexer to access the final element
- Last/LastOrDefault is not optimized for the case where the source implements IList<T> but there is a predicate: it iterates through the entire sequence
The last two tests involved writing a new collection called NonEnumerableList which implements IList<T> by delegating everything to a backing List<T>, except for GetEnumerator() (both the generic and nongeneric forms) which simply throws a NotSupportedException. That should be a handy one for testing optimizations in the future. I’ll discuss the optimization for Last when we get there.
Let’s implement them!
These operators were more interesting to implement than I’d expect, so I’m actually going to show all twelve methods. It was rarely just a matter of cutting and pasting, other than for the argument validation.
Of course, if we chose to implement the predicated versions using "Where" and the non-predicated form, and the "OrDefault" versions by using "DefaultIfEmpty" followed by the non-defaulting version, we would only have had the three non-predicated, non-defaulting versions to deal with… but as I’ve said before, there are some virtues to implementing each operator separately.
For the sake of avoiding fluff, I’ve removed the argument validation from each method – but obviously it’s there in the real code. Let’s start with First:
public static TSource First<TSource>(
this IEnumerable<TSource> source)
{
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (iterator.MoveNext())
{
return iterator.Current;
}
throw new InvalidOperationException(
"Sequence was empty");
}
}
public static TSource First<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
foreach (TSource item in source)
{
if (predicate(item))
{
return item;
}
}
throw new InvalidOperationException("No items matched the predicate");
}
These look surprisingly different – and that was actually a deliberate decision. I could easily have implemented the predicate-less version with a foreach loop as well, just returning unconditionally from its body. However, I chose to emphasize the fact that we’re not looping in First: we simply move to the first element if we can, and return it or throw an exception. There’s no hint that we might ever call MoveNext again. In the predicated form, of course, we have to keep looping until we find a matching value – only throwing the exception when we’ve exhausted all possibilities.
Now let’s see how it looks when we return a default for empty sequences:
public static TSource FirstOrDefault<TSource>(
this IEnumerable<TSource> source)
{
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
return iterator.MoveNext() ? iterator.Current :
default(TSource);
}
}
public static TSource FirstOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
foreach (TSource item in source)
{
if (predicate(item))
{
return item;
}
}
return default(TSource);
}
Here the predicated form looks very similar to First, but the predicate-less one is slightly different: instead of using an if block (which would be a perfectly valid approach, of course) I’ve used the conditional operator. We’re going to return something, whether we manage to move to the first element or not. Arguably it would be nice if the conditional operator allowed the second or third operands to be "throw" expressions, taking the overall type of the expression from the other result operand… but it’s no great hardship.
Next up we’ll implement Single, which is actually closer to First than Last is, in some ways:
public static TSource Single<TSource>(
this IEnumerable<TSource> source)
{
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
throw new InvalidOperationException(
"Sequence was empty");
}
TSource ret = iterator.Current;
if (iterator.MoveNext())
{
throw new InvalidOperationException(
"Sequence contained multiple elements");
}
return ret;
}
}
public static TSource Single<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
TSource ret = default(TSource);
bool foundAny = false;
foreach (TSource item in source)
{
if (predicate(item))
{
if (foundAny)
{
throw new InvalidOperationException("Sequence contained multiple matching elements");
}
foundAny = true;
ret = item;
}
}
if (!foundAny)
{
throw new InvalidOperationException("No items matched the predicate");
}
return ret;
}
This is already significantly more complex than First. The predicate-less version starts off the same way, but if we manage to move to the first element, we have to remember that value (as we’re hoping to return it) and then try to move to the second element. This time, if the move succeeds, we have to throw an exception – otherwise we can return our saved value.
The predicated version is even hairier. We still need to remember the first matching value we find, but this time we’re looping – so we need to keep track of whether we’ve already seen a matching value or not. If we see a second match, we have to throw an exception… and we also have to throw an exception if we reach the end without finding any matches at all. Note that although we assign an initial value of default(TSource) to ret, we’ll never reach a return statement without assigning a value to it. However, the rules around definite assignment aren’t smart enough to cope with this, so we need to provide a "dummy" value to start with… and default(TSource) is really the only value available. There is an alternative approach without using a foreach statement, where you loop until you find the first match, assign it to a local variable which is only declared at that point, followed by a second loop ensuring that there aren’t any other matches. I personally think that’s a bit more complex, which is why I’ve just used the foreach here.
The difference when we implement SingleOrDefault isn’t quite as pronounced this time though:
public static TSource SingleOrDefault<TSource>(
this IEnumerable<TSource> source)
{
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
return default(TSource);
}
TSource ret = iterator.Current;
if (iterator.MoveNext())
{
throw new InvalidOperationException(
"Sequence contained multiple elements");
}
return ret;
}
}
public static TSource SingleOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
TSource ret = default(TSource);
bool foundAny = false;
foreach (TSource item in source)
{
if (predicate(item))
{
if (foundAny)
{
throw new InvalidOperationException("Sequence contained multiple matching elements");
}
foundAny = true;
ret = item;
}
}
return ret;
}
This time we’ve just replaced a "throw" statement in the predicate-less method with a "return" statement, and removed the test for no matches being found in the predicated method. Here our assignment of default(TSource) to ret really works in our favour – if we don’t end up assigning anything else to it, we’ve already got the right return value!
Next up is Last:
public static TSource Last<TSource>(
this IEnumerable<TSource> source)
{
IList<TSource> list = source
as IList<TSource>;
if (list !=
null)
{
if (list.Count == 0)
{
throw new InvalidOperationException(
"Sequence was empty");
}
return list[list.Count – 1];
}
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
throw new InvalidOperationException("Sequence was empty");
}
TSource last = iterator.Current;
while (iterator.MoveNext())
{
last = iterator.Current;
}
return last;
}
}
public static TSource Last<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
bool foundAny = false;
TSource last = default(TSource);
foreach (TSource item in source)
{
if (predicate(item))
{
foundAny = true;
last = item;
}
}
if (!foundAny)
{
throw new InvalidOperationException("No items matched the predicate");
}
return last;
}
Let’s start off with the optimization at the beginning of the predicate-less method. If we find out we’re dealing with a list, we can simply fetch the count, and then either throw an exception or return the element at the final index. As an extra bit of optimization I could store the count in a local variable – but I’m assuming that the count of an IList<T> is cheap to compute. If there are significant objections to that assumption, I’m happy to change it :) Note that this is another situation where I’m assuming that anything implementing IList<T> will only hold at most Int32.MaxValue items – otherwise the optimization will fail.
If we don’t follow the optimized path, we simply iterate over the sequence, updating a local variable with the last-known element on every single iteration. This time I’ve avoided the foreach loop for no particularly good reason – we could easily have had a foundAny variable which was just set to "true" on every iteration, and then tested at the end. In fact, that’s exactly the pattern the predicated method takes. Admittedly that decision is forced upon us to some extent – we can’t just move once and then take the first value as the "first last-known element", because it might not match the predicate.
There’s no optimization for the predicated form of Last. This follows LINQ to Objects, but I don’t honestly know the reason for it there. We could easily iterate backwards from the end of the sequence using the indexer on each iteration. One possible reason which makes a certain amount of sense is that when there’s a predicate, that predicate could throw an exception for some values – and if we just skipped to the end if the collection implements IList<T>, that would be an observable difference. I’d be interested to know whether or not that is the reason – if anyone has any inside information which they can share, I’ll update this post.
From here, we only have one more operator to implement – LastOrDefault:
public static TSource LastOrDefault<TSource>(
this IEnumerable<TSource> source)
{
IList<TSource> list = source
as IList<TSource>;
if (list !=
null)
{
return list.Count == 0 ?
default(TSource) : list[list.Count – 1];
}
TSource last = default(TSource);
foreach (TSource item in source)
{
last = item;
}
return last;
}
public static TSource LastOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
TSource last = default(TSource);
foreach (TSource item in source)
{
if (predicate(item))
{
last = item;
}
}
return last;
}
This time, aside from the optimization, the predicated and non-predicated forms look very similar… more so than for any of the other operators. In each case, we start with a return value of default(TSource), and iterate over the whole sequence, updating it – only doing so when it matches the predicate, if we’ve got one.
Conclusion
This was a longer post than I anticipated when I got up this morning, but I hope the slight differences between implementations – and the mystery of the unoptimized predicated "Last/LastOrDefault" operators have made it worth slogging through.
As a contrast – and because I’ve already mentioned it in this post – I’ll implement DefaultIfEmpty next. I reckon I can still do that this evening, if I hurry…
Addendum
It turns out I was missing some tests for Single and SingleOrDefault: what should they do if evaluating the sequence fully throws an exception? It turns out that in LINQ to Objects, the overloads without a predicate throw InvalidOperationException as soon as they see a second element, but the overloads with a predicate keep iterating even when they’ve seen a second element matching a predicate. This seems ludicrously inconsistent to me – I’ve opened a Connect issue about it; we’ll see what happens.