Reimplementing LINQ to Objects: Part 1 – Introduction

About a year and a half ago, I gave a talk at a DDD day in Reading, attempting to reimplement as much of LINQ to Objects as possible in an hour. Based on the feedback from the session, I went far too fast… and I was still a long way from finishing. However, I still think it’s an interesting exercise, so I thought I’d do it again in a more leisurely way, blogging as I go. Everything will be under the "Edulinq" tag, so that’s the best way to get all the parts in order, without any of my other posts.

General approach

The plan is to reimplement the whole of LINQ to Objects, explaining each method (or group of methods) in a blog post. I’m going to try to make the code itself production quality, but I’m not going to include any XML documentation – if I’m already writing up how things work, I don’t want to do everything twice. I’ll include optimizations where appropriate, hopefully doing better than LINQ to Objects itself.

The approach is going to be fairly simple: for each LINQ method, I’ll write some unit tests (most of which I won’t show in the blog posts) and make sure they run against the normal .NET implementation. I’ll then comment out the using directive for System.Linq, and introduce one for JonSkeet.Linq instead. The tests will fail, I’ll implement the methods, and gradually they’ll go green again. It’s not quite the normal TDD pattern, but it works pretty well.

I will write up a blog entry for each LINQ operator, probably including all of the production code but only "interesting" tests. I’ll highlight important patterns as they come up – that’s half the point of the exercise, of course.

At the end of each post, I’ll include a link to download the "code so far". For the sake of anyone looking at these posts in the future, I’m going to keep the downloads numbered separately, rather than updating a single download over time. I’m hoping that the code will simply grow, but I dare say there’ll be some modifications along the way too.

The aim is not to end up with LINQBridge: I’m going to be targeting .NET 3.5 (mostly so I can use extension methods without creating my own attribute) and I’m certainly not going to start worrying about installers and the like. The purpose of all of this is purely educational: if you follow along with these blog posts, with any luck you’ll have a deeper understanding of LINQ in general and LINQ to Objects in particular. For example, topics like deferred execution are often misunderstood: looking at the implementation can clarify things pretty well.

Testing

The unit tests will be written using NUnit (just for the sake of my own familiarity). Fairly obviously, one of the things we’ll need to test quite a lot is whether two sequences are equal. We’ll do this using the TestExtensions class from MoreLINQ (which I’ve just copied into the project). The netbook on which I’ll probably write most of this code only has C# Express 2010 on it, so I’m going to use the external NUnit GUI. I’ve set this in the project file as the program to start… which you can’t do from C# Express directly, but editing the project file is easy, to include this:

<StartAction>Program</StartAction>
<StartProgram>C:Program FilesNUnit-2.5.7.10213binnet-2.0nunit-x86.exe</StartProgram>

It’s a bit of a grotty hack, but it works. The "additional command line parameters" are then set to just JonSkeet.Linq.Tests.dll – the current directory is the bin/debug directory by default, so everything’s fine. Obviously if you want to run the tests yourself and you have ReSharper or something similar, you can see the results integrated into Visual Studio.

Although I’m hoping to write reasonably production-level code, I doubt that there’ll be as many unit tests as I’d really write against production code. I fully expect the number of lines of test code to dwarf the number of lines of production code even so. There are simply vast numbers of potential corner cases… and quite a few overloads in several cases. Remember that the goal here is to examine interesting facets of LINQ.

Code layout

Just like the real LINQ to Objects, I’ll be creating an enormous static Enumerable class… but I’ll do so using partial classes, with one method name (but multiple overloads) per file. So Where will be implemented in Where.cs and tested in WhereTest.cs, for example.

First code drop

The first zip file is available: Linq-To-Objects-1.zip. It doesn’t contain any production code yet – just 4 tests for Where, so I could check that NUnit was working properly for me. Next stop… implementing Where.

27 thoughts on “Reimplementing LINQ to Objects: Part 1 – Introduction”

  1. John, just wondering that instead of handing out the code drops via zip files, that you might consider putting the code into source control somewhere?

    Like

  2. @Stu: Will certainly consider it. I think it’s nice to have the zip files as well, as easy-to-grab snapshots whatever source control clients you happen to have installed.

    Like

  3. @Jef: I’m aware there’s a CAPTCHA problem on Chrome. Unfortunately as I don’t run the blog server, I can’t do anything about it :(

    Like

  4. Do you plan on hosting the code on a repository somewhere?

    I plan on hosting a version of it, with XML documentation that I will add, I can easily host a mercurial repo alongside it with all your code-drops until you publish your own.

    Like

  5. Looking forward this, I learnt a lot about LINQ from looking through the code from your DDD talk and watching the video where you had attendees standing up the front with bits of paper!!

    Like

  6. “Fairly obviously, one of the things we’ll need to test quite a lot is whether two sequences are equal. We’ll do this using the TestExtensions class from MoreLINQ (which I’ve just copied into the project).”

    Perhaps I’m missing something, but why not simply use NUnit’s built-in Collection asserts? They work with IEnumerable sequences.

    http://nunit.com/index.php?p=collectionAssert&r=2.5.7

    Like

  7. @Chris: It’s possible that they didn’t exist when I first looked at MoreLINQ… it’s also very likely I just didn’t know about them :)

    Will look tonight and possibly rip out the TestExtensions bit…

    Like

  8. Great idea, Jon!

    Please consider hosting this project on Github or the like. I would love to be able to follow along without having to download, unzip, and merge.

    No pressure. ;)

    Like

  9. Jon, how are you going to ensure that performance characteristics of your implementation (once you get to the really interesting stuff such as grouping and joins) are on par with the stock implementation?

    Like

  10. @Pavel: Not sure, to be honest. Will have to think about that. I’ll certainly be talking about performance and reasoning about it… but testing could be tricky.

    Like

  11. uh … wow. Reimplementing link as a leisure activity. Wow! If it was anybody but you (Jon Skeet) .. I’d think it a little ‘nuts’ … yet it will be deeply interesting as a teaching and understanding exercise … I’m looking forward to it … seems like you have a little too much time on you hands .. there’s got to be another CSharp focused book you’d like to work on? What about a Windows Phone 7 book ;-) ???

    Like

  12. Jon .. my previous ‘time on your hands’ comment was too harsh … I’d be very interested in ANY programming book you might choose to write and I know I’ll learn a great deal from these upcoming blog posts.

    Like

  13. If anybody’s wondering what this ThrowingEnumerable class is, a google search came up with the below from morelinq google code: enjoy :)

    namespace Edulinq.TestSupport
    {
    ///
    /// Class to help for deferred execution tests: it throw an exception
    /// if GetEnumerator is called.
    ///
    public sealed class ThrowingEnumerable : IEnumerable
    {
    public IEnumerator GetEnumerator()
    {
    throw new InvalidOperationException();
    }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    
        /// <summary>
        /// Check that the given function uses deferred execution.
        /// A "spiked" source is given to the function: the function
        /// call itself shouldn't throw an exception. However, using
        /// the result (by calling GetEnumerator() then MoveNext() on it) *should*
        /// throw InvalidOperationException.
        /// </summary>
        public static void AssertDeferred<T>(
            Func<IEnumerable<int>, IEnumerable<T>> deferredFunction)
        {
            ThrowingEnumerable source = new ThrowingEnumerable();
            var result = deferredFunction(source);
            using (var iterator = result.GetEnumerator())
            {
                Assert.Throws<InvalidOperationException>(() => iterator.MoveNext());
            }
        }
    }
    

    Like

Leave a comment