Reimplementing LINQ to Objects: Part 5 – Empty

Continuing with the non-extension methods, it’s time for possibly the simplest LINQ operator around: "Empty".

What is it?

"Empty" is a generic, static method with just a single signature and no parameters:

public static IEnumerable<TResult> Empty<TResult>()

It returns an empty sequence of the appropriate type. That’s all it does.

There’s only one bit of interesting behaviour: Empty is documented to cache an empty sequence. In other words, it returns a reference to the same empty sequence every time you call it (for the same type argument, of course).

What are we going to test?

There are really only two things we can test here:

  • The returned sequence is empty
  • The returned sequence is cached on a per type argument basis

I’m using the same approach as for Range to call the static method, but this time with an alias of EmptyClass. Here are the tests:

[Test]
public void EmptyContainsNoElements()
{
    using (var empty = EmptyClass.Empty<int>().GetEnumerator())
    {
        Assert.IsFalse(empty.MoveNext());
    }
}

[Test]
public void EmptyIsASingletonPerElementType()
{
    Assert.AreSame(EmptyClass.Empty<int>(), EmptyClass.Empty<int>());
    Assert.AreSame(EmptyClass.Empty<long>(), EmptyClass.Empty<long>());
    Assert.AreSame(EmptyClass.Empty<string>(), EmptyClass.Empty<string>());
    Assert.AreSame(EmptyClass.Empty<object>(), EmptyClass.Empty<object>());

    Assert.AreNotSame(EmptyClass.Empty<long>(), EmptyClass.Empty<int>());
    Assert.AreNotSame(EmptyClass.Empty<string>(), EmptyClass.Empty<object>());
}

Of course, that doesn’t verify that the cache isn’t per-thread, or something like that… but it’ll do.

Let’s implement it!

The implementation is actually slightly more interesting than the description so far may suggest. If it weren’t for the caching aspect, we could just implement it like this:

// Doesn’t cache the empty sequence
public static IEnumerable<TResult> Empty<TResult>()
{
    yield break;
}

… but we want to obey the (somewhat vaguely) documented caching aspect too. It’s not really hard, in the end. There’s a very handy fact that we can use: empty arrays are immutable. Arrays always have a fixed size, but normally there’s no way of making an array read-only… you can always change the value of any element. But an empty array doesn’t have any elements, so there’s nothing to change. So, we can reuse the same array over and over again, returning it directly to the caller… but only if we have an empty array of the right type.

At this point you may be expecting a Dictionary<Type, Array> or something similar… but there’s another useful trick we can take advantage of. If you need a per-type cache and the type is being specific as a type argument, you can use static variables in a generic class, because each constructed type will have a distinct set of static variables.

Unfortunately, Empty is a generic method rather than a non-generic method in a generic type… so we’ve got to create a separate generic type to act as our cache for the empty array. That’s easy to do though, and the CLR takes care of initializing the type in a thread-safe way, too. So our final implementation looks like this:

public static IEnumerable<TResult> Empty<TResult>()
{
    return EmptyHolder<TResult>.Array;
}
        
private static class EmptyHolder<T>
{
    internal static readonly T[] Array = new T[0];       
}

That obeys all the caching we need, and is really simple in terms of lines of code… but it does mean you need to understand how generics work in .NET reasonably well. In some ways this is the opposite of the situation in the previous post – this is a sneaky implementation instead of the slower but arguably simpler dictionary-based one. In this case, I’m happy with the trade-off, because once you do understand how generic types and static variables work, this is simple code. It’s a case where simplicity is in the eye of the beholder.

Conclusion

So, that’s Empty. The next operator – Repeat – is likely to be even simpler, although it’ll have to be another split implementation…

Addendum

Due to the minor revolt over returning an array (which I still think is fine), here’s an alternative implementation:

public static IEnumerable<TResult> Empty<TResult>()
{
    return EmptyEnumerable<TResult>.Instance;
}

#if AVOID_RETURNING_ARRAYS
private class EmptyEnumerable<T> : IEnumerable<T>, IEnumerator<T>
{
    internal static IEnumerable<T> Instance = new EmptyEnumerable<T>();

    // Prevent construction elsewhere
    private EmptyEnumerable()
    {
    }

    public IEnumerator<T> GetEnumerator()
    {
        return this;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this;
    }

    public T Current
    {
        get { throw new InvalidOperationException(); }
    }

    object IEnumerator.Current
    {
        get { throw new InvalidOperationException(); }
    }

    public void Dispose()
    {
        // No-op
    }

    public bool MoveNext()
    {
        return false// There’s never a next entry
    }

    public void Reset()
    {
        // No-op
    }
}

#else
private static class EmptyEnumerable<T>
{
    internal static readonly T[] Instance = new T[0];       
}
#endif

Hopefully now everyone can build a version they’re happy with :)

14 thoughts on “Reimplementing LINQ to Objects: Part 5 – Empty”

  1. The array method is not so great: People will incorrectly depend on the return value being an array although this is not documented. That is why linq queries of the form “from x in y select x” will still do a .Select(x=>x) in order to hide the original sequence. Handing out the original sequence would make it easy to take a false dependency.

    Like

  2. @tobi: I find it hard to imagine where people would really start depending on that. What could they actually *do* with an empty array which would depend on it being an array?

    Note that the “real” .NET implementation uses an empty array too.

    Like

  3. What testing framework do you use? EmptyIsASingletonPerElementType looks like a good candidate for a generic [RowTest].

    Like

  4. There could be a method

    T[] AsArray(this IList list)
    {
    return list is T[] ? list as T[] : list.ToArray();
    }

    and the method could have a bug:
    T[] AsArray(this IList list)
    {
    return list is T[] ? Bug() : list.ToArray();
    }

    and what if it is in a third-party lib? I am sure that _someone_ in the huge .net world has written this exact code. All of windows’ compatibility problems result from such trivial mistakes.

    Like

  5. @Tobi: So you’re saying that if a method has a bug, there’s a bug? Yes, that’s absolutely right – but I don’t see what this has to do with Enumerable.Empty.

    Returning an array is a perfectly valid implementation, IMO. (And as I say, it’s what’s in .NET itself.)

    Like

  6. The point is that people will start depending on undocumented properties and when your implementation changes they will break. This is not only their problem but yours too, because they will think that it is your mistake.
    If your point was true then why is microsoft doing all that compatibility work? They do that because people will think microsoft is to blame when a buggy app breaks. The same happens when users install a new version of your lib and suddenly their app stops working.

    Like

  7. @tobi: So whenever a method is declared to return IEnumerable[T], do you believe it should *never* return an instance of a concrete class? How about IList[T]? Yes, people shouldn’t take dependencies on the implementation – but that doesn’t mean it’s wrong to return *some* public class which implements an interface.

    Out of interest, have you logged a Connect request suggesting that MS shouldn’t return an array from Empty in their implementation?

    Like

  8. Who can say “never” and really mean it? In my own projects I do that all the time because I am willing to risk it (it is after all a very small risk). However if you are microsoft and have 3M devs use your product, you do not want to risk it. I believe that this is what their internal guidelines actually say because you find this pattern all over the BCL (there is even a TrueReadOnlyCollection which does the same thing but is inaccessible). I also believe the current Empty() impl. to be a mistake but propably the pattern was too elegant _not_ to implement it ;-)
    This is also the reason why many old BCL methods take only Arrays and not lists or enumerables because taking interfaces means that the BCL method will execute your code in the middle of it (reentrancy problems). There is a whole science to avoiding compatibility problems (read raymond chens blog to see on what absurd properties people depend in real-life).
    No, I did not log a Connect request.

    Like

  9. @tobi: I think we’ll have to agree to differ. Don’t forget that an empty array *is* truly immutable to start with – so it’s not like anything can interfere with the state of it.

    I’ll add an extra note to the post with an EmptyEnumerable implementation though…

    Like

  10. Hi there, would you be able to elaborate on the following line:

    “Empty is a generic method rather than a non-generic method in a generic type”

    chrs

    Like

    1. Well, you have to call it as Enumerable.Empty<int>() for example, whereas if it had been a non-generic method in a generic type, you might have called Collections<int>.Empty() for example.

      Like

Leave a comment