Reimplementing LINQ to Objects: Part 4 – Range

This will be a short post, and there’ll probably be some more short ones coming up too. I think it makes sense to only cover multiple operators in a single post where they’re really similar. (Count and LongCount spring to mind.) I’m in your hands though – if you would prefer "chunkier" posts, please say so in the comments.

This post will deal with the Range generation operator.

What is it?

Range only has a single signature:

public static IEnumerable<int> Range(
    int start,
    int count)

Unlike most of LINQ, this isn’t an extension method – it’s a plain old static method. It returns an iterable object which will yield "count" integers, starting from "start" and incrementing each time – so a call to Enumerable.Range(6, 3) would yield 6, then 7, then 8.

As it doesn’t operate on an input sequence, there’s no sense in which it could stream or buffer its input, but:

  • The arguments need to be validated eagerly; the count can’t be negative, and it can’t be such that any element of the range could overflow Int32.
  • The values will be yielded lazily – Range should be cheap, rather than creating (say) an array of "count" elements and returning that.

How are we going to test it?

Testing a plain static method brings us a new challenge in terms of switching between the "normal" LINQ implementation and the Edulinq one. This is an artefact of the namespaces I’m using – the tests are in Edulinq.Tests, and the implementation is in Edulinq. "Parent" namespaces are always considered when the compiler tries to find a type, and they take priority over anything in using directives – even a using directive which tries to explicitly alias a type name.

The (slightly ugly) solution to this that I’ve chosen is to include a using directive to create an alias which couldn’t otherwise be resolved – in this case, RangeClass. The using directive will either alias RangeClass to System.Linq.Enumerable or Edulinq.Enumerable. The tests then all use RangeClass.Range. I’ve also changed how I’m switching between implementations – I now have two project configurations, one of which defines the NORMAL_LINQ preprocessor symbol, and the other of which doesn’t. The RangeTest class therefore contains:

#if NORMAL_LINQ
using RangeClass = System.Linq.Enumerable;
#else
using RangeClass = Edulinq.Enumerable;
#endif

There are alternatives to this approach, of course:

  • I could move the tests to a different namespace
  • I could make the project references depend on the configuration… so the "Normal LINQ" configuration wouldn’t reference the Edulinq implementation project, and the "Edulinq implementation" configuration wouldn’t reference System.Core. I could then just use Enumerable.Range with an appropriate using directive for System.Linq conditional on the NORMAL_LINQ preprocessor directive, as per the other tests.

I like the idea of the second approach, but it means manually tinkering with the test project file – Visual Studio doesn’t expose any way of conditionally including a reference. I may do this at a later date… thoughts welcome.

What are we going to test?

There isn’t much we can really test for ranges – I only have eight tests, none of which are particularly exciting:

  • A simple valid range should look right when tested with AssertSequenceEqual
  • The start value should be allowed to be negative
  • Range(Int32.MinValue, 0) is an empty range
  • Range(Int32.MaxValue, 1) yields just Int32.MaxValue
  • The count can’t be negative
  • The count can be zero
  • start+count-1 can’t exceed Int32.MaxValue (so Range(Int32.MaxValue, 2) isn’t valid)
  • start+count-1 can be Int32.MaxValue (so Range(Int32.MaxValue, 1) is valid)

The last two are tested with a few different examples each – a large start and a small count, a small start and a large count, and "fairly large" values for both start and count.

Note that I don’t have any tests for lazy evaluation – while I could test that the returned value doesn’t implement any of the other collection interfaces, it would be a little odd to do so. On the other hand, we do have tests which have an enormous count – such that anything which really tried to allocate a collection of that size would almost certainly fail…

Let’s implement it!

It will surely be no surprise by now that we’re going to use a split implementation, with a public method which performs argument validation eagerly and then uses a private method with an iterator block to perform the actual iteration.

Having validated the arguments, we know that we’ll never overflow the bounds of Int32, so we can be pretty casual in the main part of the implementation.

public static IEnumerable<int> Range(int start, int count)
{
    if (count < 0)
    {
        throw new ArgumentOutOfRangeException("count");
    }
    // Convert everything to long to avoid overflows. There are other ways of checking
    // for overflow, but this way make the code correct in the most obvious way.
    if ((long)start + (long)count – 1L > int.MaxValue)
    {
        throw new ArgumentOutOfRangeException("count");
    }
    return RangeImpl(start, count);
}

private static IEnumerable<int> RangeImpl(int start, int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return start + i;
    }
}

Just a few points to note here:

  • Arguably it’s the combination of "start" and "count" which is invalid in the second check, rather than just count. It would possibly be nice to allow ArgumentOutOfRangeException (or ArgumentException in general) to blame multiple arguments rather than just one. However, using "count" here matches the framework implementation.
  • There are other ways of performing the second check, and I certainly didn’t have to make all the operands in the expression longs. However, I think this is the simplest code which is clearly correct based on the documentation. I don’t need to think about all kinds of different situations and check that they all work. The arithmetic will clearly be valid when using the Int64 range of values, so I don’t need to worry about overflow, and I don’t need to consider whether to use a checked or unchecked context.
  • There are also other ways of looping in the private iterator block method, but I think this is the simplest. Another obvious and easy alternative is to keep two values, one for the count of yielded values and the other for the next value to yield, and increment them both on each iteration. A more complex approach would be to use just one loop variable – but you can’t use "value < start + count" in case the final value is exactly Int32.MaxValue, and you can’t use "value <= start + count – 1" in case the arguments are (int.MinValue, 0). Rather than consider all the border cases, I’ve gone for an obviously-correct solution. If you really, really cared about the performance of Range, you’d want to investigate various other options.

Prior to writing up this post, I didn’t have good tests for Range(Int32.MaxValue, 1) and Range(Int32.MinValue, 0)… but as they could easily go wrong as mentioned above, I’ve now included them. I find it interesting how considering alternative implementations suggests extra tests.

Conclusion

"Range" was a useful method to implement in order to test some other operators – "Count" in particular. Now that I’ve started on the non-extension methods though, I might as well do the other two (Empty and Repeat). I’ve already implemented "Empty", and will hopefully be able to write it up today. "Repeat" shouldn’t take much longer, and then we can move on to "Count" and "LongCount".

I think this code is a good example of situations where it’s worth writing "dumb" code which looks like the documentation, rather than trying to write possibly shorter, possibly slightly more efficient code which is harder to think about. No doubt there’ll be more of that in later posts…

12 thoughts on “Reimplementing LINQ to Objects: Part 4 – Range”

  1. Just want to mention that you’ve already calculated the value of (start + count – 1) for a precondition; I don’t see a reason not to keep it in a local and use it (casting to int after validating it’s <= int.MaxValue)

    Like

  2. @Mihailik

    Short-circuit by providing an extra condition check for a case that should be very unlikely? Especially when the idea is for the implementations to defer until required? I’d argue against doing so.

    Like

  3. You’re right, that does complicate things. Instead of just casting the end value to int and moving on you’d have to branch here.

    After validation (where you’ve got
    long end = (long)start + (long)count – 1L > int.MaxValue;
    ):
    if (count == 0) return Empty();
    else return RangeImpl(start, end);

    This is only if you want to optimize this method to be a simple
    for (int i = start; i <= end; i++) yield return i;
    which would probably run faster but be less readable. This kind of trade-off is kind of annoying but unavoidable; I think in our case your implementation is better (for readability) but if implementing a 'serious' linq I'd use the optimized version.

    Then again, if implementing a serious linq it would probably be better to unroll the enumerable magic and create an enumerator class ourselves, so I'm not sure if we'd even have this problem…

    Like

  4. I think the “one variable” loop you mentioned may be:

    int value = start – 1;
    while (value != end) yield return ++value;

    Like

  5. @Ben: That’s feasible… if you turn off overflow checking to start with. I think it still requires a little more thought to ensure that the corner cases work than is ideal – but it would probably be slightly more efficient than my current code.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s