Reimplementing LINQ to Objects: Part 3 – “Select” (and a rename…)

It’s been a long time since I wrote part 1 and part 2 of this blog series, but hopefully things will move a bit more quickly now.

The main step forward is that the project now has a source repository on Google Code instead of just being a zip file on each blog post. I had to give the project a title at that point, and I’ve chosen Edulinq, hopefully for obvious reasons. I’ve changed the namespaces etc in the code, and the blog tag for the series is now Edulinq too. Anyway, enough of the preamble… let’s get on with reimplementing LINQ, this time with the Select operator.

What is it?

Like Where, Select has two overloads:

public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector)

public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, int, TResult> selector)

Again, they both operate the same way – but the second overload allows the index into the sequence to be used as part of the projection.

Simple stuff first: the method projects one sequence to another by applying the "selector" delegate to each input element in turn, yielding one output element per input element. The behaviour notes are exactly the same as for Where (to the extent that I copied and pasted them from the previous blog post, and just tweaked them):

  • The input sequence is not modified in any way.
  • The method uses deferred execution – until you start trying to fetch items from the output sequence, it won’t start fetching items from the input sequence.
  • Despite deferred execution, it will validate that the parameters aren’t null immediately.
  • It streams its results: it only ever needs to look at one result at a time.
  • It will iterate over the input sequence exactly once each time you iterate over the output sequence.
  • The "selector" function is called exactly once per yielded value.
  • Disposing of an iterator over the output sequence will dispose of the corresponding iterator over the input sequence.
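A few of the behaviour notes above can be observed directly. The following is only a minimal sketch, run against the framework's own System.Linq implementation rather than Edulinq; the class name SelectBehaviourDemo and the Run helper are made up for this illustration. It records when the projection is called relative to when the query is created and consumed, and checks that a null selector is rejected immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectBehaviourDemo
{
    // Hypothetical helper for this sketch: records what happens, in order.
    public static List<string> Run()
    {
        var log = new List<string>();
        int[] source = { 1, 2, 3 };

        // Deferred execution: calling Select only builds the query.
        IEnumerable<int> query = source.Select(x =>
        {
            log.Add("project " + x);
            return x * 10;
        });
        log.Add("query created");

        // Streaming: each element is projected only when it is requested.
        foreach (int value in query)
        {
            log.Add("got " + value);
        }

        // Eager validation: the null check happens on the call itself,
        // before anything iterates over the result.
        try
        {
            Func<int, int> nullSelector = null;
            source.Select(nullSelector);
        }
        catch (ArgumentNullException e)
        {
            log.Add("threw for " + e.ParamName);
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

In the log, "query created" should appear before any "project" entry, and each "project" entry should be followed immediately by the matching "got" entry.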

What are we going to test?

The tests are very much like those for Where – except that in cases where we tested the filtering aspect of Where, we’re now testing the projection aspect of Select.

There are a few tests of some interest. Firstly, you can tell that the method is generic with 2 type parameters instead of 1 – it has type parameters of TSource and TResult. They’re fairly self-explanatory, but it means it’s worth having a test for the case where the type arguments are different – such as converting an int to a string:

[Test]
public void SimpleProjectionToDifferentType()
{
    int[] source = { 1, 5, 2 };
    var result = source.Select(x => x.ToString());
    result.AssertSequenceEqual("1", "5", "2");
}

Secondly, I have a test that shows what sort of bizarre situations you can get into if you include side effects in your query. We could have done this with Where as well of course, but it’s clearer with Select:

[Test]
public void SideEffectsInProjection()
{
    int[] source = new int[3]; // Actual values won’t be relevant
    int count = 0;
    var query = source.Select(x => count++);
    query.AssertSequenceEqual(0, 1, 2);
    query.AssertSequenceEqual(3, 4, 5);
    count = 10;
    query.AssertSequenceEqual(10, 11, 12);
}

Notice how we’re only calling Select once, but the results of iterating over the query change each time – because the "count" variable has been captured, and is being modified within the projection. Please don’t do things like this.

Thirdly, we can now write query expressions which include both "select" and "where" clauses:

[Test]
public void WhereAndSelect()
{
    int[] source = { 1, 3, 4, 2, 8, 1 };
    var result = from x in source
                 where x < 4
                 select x * 2;
    result.AssertSequenceEqual(2, 6, 4, 2);
}
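For readers less familiar with query expressions: the compiler rewrites the query above into ordinary method calls, and then binds Where and Select to whatever suitable methods are in scope. A minimal sketch of the equivalence follows, using System.Linq here purely so that it runs on its own; the class and method names are made up for the illustration.

```csharp
using System;
using System.Linq;

public static class QueryTranslationDemo
{
    // The query expression form, as in the test above.
    public static int[] WithQueryExpression(int[] source)
    {
        var result = from x in source
                     where x < 4
                     select x * 2;
        return result.ToArray();
    }

    // The method-call form that the query expression is translated into.
    public static int[] WithMethodCalls(int[] source)
    {
        return source.Where(x => x < 4)
                     .Select(x => x * 2)
                     .ToArray();
    }

    public static void Main()
    {
        int[] source = { 1, 3, 4, 2, 8, 1 };
        Console.WriteLine(string.Join(", ", WithQueryExpression(source))); // 2, 6, 4, 2
        Console.WriteLine(string.Join(", ", WithMethodCalls(source)));     // 2, 6, 4, 2
    }
}
```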

There’s nothing mind-blowing about any of this, of course – hopefully if you’ve used LINQ to Objects at all, this should all feel very comfortable and familiar.

Let’s implement it!

Surprise surprise, we go about implementing Select in much the same way as Where. Again, I simply copied the implementation file and tweaked it a little – the two methods really are that similar. In particular:

  • We’re using iterator blocks to make it easy to return sequences
  • The semantics of iterator blocks mean that we have to separate the argument validation from the real work. (Since I wrote the previous post, I’ve learned that VB11 will have anonymous iterators, which will avoid this problem. Sigh. It just feels wrong to envy VB users, but I’ll learn to live with it.)
  • We’re using foreach within the iterator blocks to make sure that we dispose of the input sequence iterator appropriately – so long as our output sequence iterator is disposed or we run out of input elements, of course.
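To make the second point concrete, here is a minimal sketch contrasting a single iterator block with the split approach. NaiveSelect, EagerSelect and ThrowsOnCall are hypothetical names used only for this illustration, and the methods are deliberately not extension methods so they cannot clash with anything else in scope.

```csharp
using System;
using System.Collections.Generic;

public static class ValidationTimingDemo
{
    // Naive version: the null checks are part of the iterator block,
    // so they don't run until the first call to MoveNext().
    public static IEnumerable<TResult> NaiveSelect<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (selector == null)
        {
            throw new ArgumentNullException("selector");
        }
        foreach (TSource item in source)
        {
            yield return selector(item);
        }
    }

    // Split version: a normal method validates eagerly, then hands over
    // to the iterator block for the deferred work.
    public static IEnumerable<TResult> EagerSelect<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (selector == null)
        {
            throw new ArgumentNullException("selector");
        }
        return EagerSelectImpl(source, selector);
    }

    private static IEnumerable<TResult> EagerSelectImpl<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        foreach (TSource item in source)
        {
            yield return selector(item);
        }
    }

    // Reports whether merely calling the method (without iterating) throws.
    public static bool ThrowsOnCall(Func<IEnumerable<int>> call)
    {
        try
        {
            call();
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int[] source = { 1, 2, 3 };
        Console.WriteLine(ThrowsOnCall(() => NaiveSelect<int, int>(source, null))); // False
        Console.WriteLine(ThrowsOnCall(() => EagerSelect<int, int>(source, null))); // True
    }
}
```

With the naive version, the exception would only surface once something starts iterating over the result, which could be a long way from the call that passed the bad argument.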

I’ll skip straight to the code, as it’s all so similar to Where. It’s also not worth showing you the version with an index – because it’s such a trivial difference.

public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (selector == null)
    {
        throw new ArgumentNullException("selector");
    }
    return SelectImpl(source, selector);
}

private static IEnumerable<TResult> SelectImpl<TSource, TResult>(
    IEnumerable<TSource> source,
    Func<TSource, TResult> selector)
{
    foreach (TSource item in source)
    {
        yield return selector(item);
    }
}

Simple, isn’t it? Again, the real "work" method is even shorter than the argument validation.
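For completeness, the overload with an index might look something like the following. This is only a sketch of the "trivial difference", not necessarily the exact code in the repository; it is named SelectWithIndex here (a made-up name) so that it can be compiled alongside System.Linq without ambiguity, whereas the real overload is simply called Select.

```csharp
using System;
using System.Collections.Generic;

public static class IndexedSelectSketch
{
    // Hypothetical name: the real overload is called Select.
    public static IEnumerable<TResult> SelectWithIndex<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, int, TResult> selector)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (selector == null)
        {
            throw new ArgumentNullException("selector");
        }
        return SelectWithIndexImpl(source, selector);
    }

    private static IEnumerable<TResult> SelectWithIndexImpl<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, int, TResult> selector)
    {
        // The only difference from the simple overload: keep a running
        // index and pass it to the selector along with the element.
        int index = 0;
        foreach (TSource item in source)
        {
            yield return selector(item, index);
            index++;
        }
    }

    public static void Main()
    {
        string[] source = { "a", "b", "c" };
        foreach (string s in source.SelectWithIndex((x, i) => i + ":" + x))
        {
            Console.WriteLine(s); // 0:a, then 1:b, then 2:c
        }
    }
}
```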

Conclusion

While I don’t generally like boring my readers (which may come as a surprise to some of you), this was a pretty humdrum post, I’ll admit. I’ve very deliberately emphasized "just like Where" several times, to the point of tedium, because it makes it abundantly clear that there aren’t really as many tricky bits to understand as you might expect.

Something slightly different next time (which I hope will be in the next few days). I’m not quite sure what yet, but there are an awful lot of methods still to choose from…

9 thoughts on “Reimplementing LINQ to Objects: Part 3 – “Select” (and a rename…)”

  1. In your previous post on implementing Where, you mentioned that you removed the using directive for Linq, so how does the compiler know what to do when you use the query syntax? I don’t tend to use query syntax so I don’t fully understand how it works, but it seems interesting to me that the compiler ‘knows’ to use your implementation for select and where.

  2. @Mr_peeks If you try to write the query syntax without a using directive for Linq, you get a compile-time error stating that the various extension methods used (Select, Where, etc.) couldn’t be found on the type. So I imagine that as long as the type has the methods the compiler is looking for (regardless of where they come from), it will be able to wire them up without any issues.

  3. @Mr_peeks, the extension methods for Linq operators don’t have to be in System.Linq. The compiler doesn’t care where they are, as long as it finds a method with a suitable signature, either as an instance method or an extension method.

    @Jon, I felt exactly the same about VB11 anonymous iterators when I read about them… According to Eric Lippert, they don’t intend to do the same for C#. Too bad… Anyway, I think this is the only VB feature I’d like to see in C# (and perhaps indexed properties, too).

  4. You should probably mention in the behaviour notes:

    – The selector method is called exactly once for each yielded value.

    That is, after all, what you’re testing with SideEffectsInProjection.

  5. Pretty straightforward. However, there’s a thing which should be changed IMHO: the unit test SimpleProjectionToDifferentType could potentially fail depending on the current culture of the OS.

    This raises the question of what the best practice is here. I think that always using the invariant culture in unit tests (except for culture-specific tests, of course, where the culture would be specified anyway) should be the way to go.

    What are your thoughts on that? You must have dealt with this question before in the context of JodaTime…

  6. @Lucero: That’s a good point, although I can’t think of any culture where single-digit (or even double-digit) integers are going to be represented differently. If we were using floating point types or large numbers, it would certainly be a different matter.

    The business of testing cultures in unit tests is a nasty one – and one which is made worse by some of the default choices taken by the framework, IMO.

    I ran into a similar problem when trying to rebuild JSON.NET, in fact – it had a check that some method converts from local time to UTC… and the check was that when given a local time at the start of January one year, the converted time should be different. That doesn’t help those of us in Europe/London :)

    I think I’ll leave the code as-is, but it was a good issue to raise.

  7. @Jon, that’s not entirely true. How digits are rendered is controlled by NumberFormat.DigitSubstitution (see http://msdn.microsoft.com/en-us/library/system.globalization.digitshapes.aspx for details on the substitution).

    foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures)) {
        if (culture.NumberFormat.DigitSubstitution != DigitShapes.None) {
            Console.WriteLine("{0}: {1}", culture.EnglishName, culture.NumberFormat.DigitSubstitution);
        }
    }

    This will yield 15 specific cultures that use the native digits contextually (those shouldn’t be a problem, since they are converted to native only when they are in the context of local text), but there are 4 which are NativeNational:

    Dari (Afghanistan): NativeNational
    Pashto (Afghanistan): NativeNational
    Nepali (Nepal): NativeNational
    Khmer (Cambodia): NativeNational

    Those cultures should fail the unit test (didn’t try though). Not to mention someone creating a strange custom culture… ;-)

  8. @Lucero: Okay, I’m convinced. For the sake of brevity in tests I’ll implement an extra extension method “ToInvariantString”.
