Reimplementing LINQ to Objects: Part 12 – DefaultIfEmpty

After the masses of code required for all the permutations of First/Last/etc, DefaultIfEmpty is a bit of a relief.

What is it?

Even this simple operator has two overloads:

public static IEnumerable<TSource> DefaultIfEmpty<TSource>(
    this IEnumerable<TSource> source)

public static IEnumerable<TSource> DefaultIfEmpty<TSource>(
    this IEnumerable<TSource> source,
    TSource defaultValue)

The behaviour is very simple to describe:

  • If the input sequence is empty, the result sequence has a single element in it, the default value. This is default(TSource) for the overload without an extra parameter, or the specified value otherwise.
  • If the input sequence isn’t empty, the result sequence is the same as the input sequence
  • The source argument must not be null, and this is validated eagerly
  • The result sequence itself uses deferred execution – the input sequence isn’t read at all until the result sequence is read
  • The input sequence is streamed; any values read are yielded immediately; no buffering is used

Dead easy.

What are we going to test?

Despite being relatively late in the day, I decided to test for argument validation – and a good job too, as my first attempt failed to split the implementation into an argument validation method and an iterator block method for the real work! It just shows how easy it is to slip up.

Other than that, I can really only see four cases worth testing:

  • No default value specified, empty input sequence
  • Default value specified, empty input sequence
  • No default value specified, non-empty input sequence
  • Default value specified, non-empty input sequence

So I have tests for all of those, and that’s it. I don’t have anything testing for streaming, lazy evaluation etc.

Let’s implement it!

Despite my reluctance to implement one operator in terms of another elsewhere, this felt like such an obvious case, I figured it would make sense just this once. I even applied DRY to the argument validation aspect. Here’s the implementation in all its brief glory:

public static IEnumerable<TSource> DefaultIfEmpty<TSource>(
    this IEnumerable<TSource> source)
{
    // This will perform an appropriate test for source being null first.
    return source.DefaultIfEmpty(default(TSource));
}

public static IEnumerable<TSource> DefaultIfEmpty<TSource>(
    this IEnumerable<TSource> source,
    TSource defaultValue)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    return DefaultIfEmptyImpl(source, defaultValue);
}

private static IEnumerable<TSource> DefaultIfEmptyImpl<TSource>(
    IEnumerable<TSource> source,
    TSource defaultValue)
{
    bool foundAny = false;
    foreach (TSource item in source)
    {
        yield return item;
        foundAny = true;
    }
    if (!foundAny)
    {
        yield return defaultValue;
    }
}

Of course, now that I’ve said how easy it was, someone will find a bug…

Aside from the use of default(TSource) to call the more complex overload from the simpler one, the only aspect of any interest is the bottom method. It irks me slightly that we’re assigning "true" to "foundAny" on every iteration… but the alternative is fairly unpleasant:

private static IEnumerable<TSource> DefaultIfEmptyImpl<TSource>(
    IEnumerable<TSource> source,
    TSource defaultValue)
{
    using (IEnumerator<TSource> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            yield return defaultValue;
            yield break; // Like a "return"
        }
        yield return iterator.Current;
        while (iterator.MoveNext())
        {
            yield return iterator.Current;
        }
    }
}

This may be slightly more efficient, but it feels a little clumsier. We could get rid of the "yield break" by putting the remainder of the method in an "else" block, but I’m not dead keen on that, either. We could use a do/while loop instead of a simple while loop – that would at least remove the repetition of "yield return iterator.Current" but I’m not really a fan of do/while loops. I use them sufficiently rarely that they cause me more mental effort to read than I really like.

If any readers have suggestions which are significantly nicer than either of the above implementations, I’d be interested to hear them. This feels a little inelegant at the moment. It’s far from a readability disaster – it’s just not quite neat.

Conclusion

Aside from the slight annoyance at the final lack of elegance, there’s not much of interest here. However, we could now implement FirstOrDefault/LastOrDefault/SingleOrDefault using DefaultIfEmpty. For example, here’s an implementation of FirstOrDefault:

public static TSource FirstOrDefault<TSource>(
    this IEnumerable<TSource> source)
{
    return source.DefaultIfEmpty().First();
}

public static TSource FirstOrDefault<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate)
{
    // Can’t just use source.DefaultIfEmpty().First(predicate)
    return source.Where(predicate).DefaultIfEmpty().First();
}

Note the comment in the predicated version – the defaulting has to be the very last step after we’ve applied the predicate… otherwise if we pass in an empty sequence and a predicate which doesn’t match with default(TSource), we’ll get an exception instead of the default value. The other two …OrDefault operators could be implemented in the same way, of course. (I haven’t done so yet, but the above code is in source control.)

I’m currently unsure what I’ll implement next. I’ll see whether I get inspired by any particular method in the morning.

9 thoughts on “Reimplementing LINQ to Objects: Part 12 – DefaultIfEmpty”

  1. When you said “Despite my reluctance to implement one operator in terms of another elsewhere, this felt like such an obvious case, I figured it would make sense just this once.” I thought you were going to do something like this (which I think is a little cleaner):

    private static IEnumerable DefaultIfEmptyImpl(
    IEnumerable source,
    TSource defaultValue)
    {
    if (source.Any())
    {
    foreach (TSource item in source)
    {
    yield return item;
    }
    }
    else
    {
    yield return defaultValue;
    }
    }

    Like

  2. C# really needs an operator like F#’s yield! (“yield bang”), which could be used on an IEnumerable or IEnumerator. The keyword could be something like “yield all”. It would make the implementation much cleaner…

    using (IEnumerator iterator = source.GetEnumerator())
    {
    if (iterator.MoveNext())
    {
    yield return iterator.Current;
    yield all iterator;
    }
    else
    {
    yield return defaultValue;
    }
    }

    Like

  3. @Michael: That ends up fetching the iterator for source twice, if it’s non-empty. I think that’s a bad idea – I treat it as a general principle to only read from a sequence once.

    Like

  4. How about:

    using (IEnumerator iterator = source.GetEnumerator())
    {
    if (iterator.MoveNext())
    {
    do
    {
    yield return iterator.Current;
    }
    while (iterator.MoveNext());
    }
    else
    {
    yield return defaultValue;
    }
    }

    Seems pretty clean to me…

    Like

  5. Jon, you could attempt to fetch the first item in the sequence, then compare and determine, whether to continue to yield or return the default value (I’m off to work right now, but I can definitely see an issue with struct vs. class types – will check this later, but it’s just an idea).

    Personally, I don’t find the second alternative clumsy. Reassigning for each iteration feels clumsy imo.

    Btw. would be really interesting if you could continue the LINQ series with additional helper extensions (I’ll come up with a list later), but there are daily requirements which the default BCL LINQ extensions doesn’t support in a single method. Again, more later.

    Like

  6. @John: I mention something similar in the post… I’m not keen on do/while in general. Possibly just a personal habit.

    @Anders: If MoveNext() returns false, it would be a bug to call Current *at all*. As for continuing the series afterwards – maybe. I’ve got some extensions at morelinq.googlecode.com already. I suspect everyone will be sick of LINQ by the time I’m finished :)

    Like

  7. How about spilting into to methods on that takes care of a potential empty sequence and one that “converts
    ” an IEnumerator to IEnumerable. The latter could be an extension method in it’s own rights since it’s applicable in more cases than this (However in the below implementation the contract is not validated but simply relied upon. Ie. that there has to be a valid .Current when entering the method):

    private static IEnumerable DefaultIfEmptyImpl(
    IEnumerable source,
    TSource defaultValue)
    {
    using (IEnumerator iterator = source.GetEnumerator())
    {
    if (!iterator.MoveNext())
    {
    return new [] {defaultValue};
    }
    return iterator.Rest();
    }
    }

    public static IEnumerable Rest(this IEnumerator iterator)
    {
    yield return iterator.Current;
    while (iterator.MoveNext())
    {
    yield return iterator.Current;
    }
    }

    Like

  8. @runefs: Nope, that wouldn’t work – because then it’s calling GetEnumerator and MoveNext eagerly… you’re breaking deferred execution.

    Like

Leave a comment