Reimplementing LINQ to Objects: Part 1 – Introduction

September 3, 2010 jonskeet 27 Comments

About a year and a half ago, I gave a talk at a DDD day in Reading, attempting to reimplement as much of LINQ to Objects as possible in an hour. Based on the feedback from the session, I went far too fast… and I was still a long way from finishing. However, I still think it’s an interesting exercise, so I thought I’d do it again in a more leisurely way, blogging as I go. Everything will be under the "Edulinq" tag, so that’s the best way to get all the parts in order, without any of my other posts.

General approach

The plan is to reimplement the whole of LINQ to Objects, explaining each method (or group of methods) in a blog post. I’m going to try to make the code itself production quality, but I’m not going to include any XML documentation – if I’m already writing up how things work, I don’t want to do everything twice. I’ll include optimizations where appropriate, hopefully doing better than LINQ to Objects itself.

The approach is going to be fairly simple: for each LINQ method, I’ll write some unit tests (most of which I won’t show in the blog posts) and make sure they run against the normal .NET implementation. I’ll then comment out the using directive for System.Linq, and introduce one for JonSkeet.Linq instead. The tests will fail, I’ll implement the methods, and gradually they’ll go green again. It’s not quite the normal TDD pattern, but it works pretty well.

I will write up a blog entry for each LINQ operator, probably including all of the production code but only "interesting" tests. I’ll highlight important patterns as they come up – that’s half the point of the exercise, of course.

At the end of each post, I’ll include a link to download the "code so far". For the sake of anyone looking at these posts in the future, I’m going to keep the downloads numbered separately, rather than updating a single download over time. I’m hoping that the code will simply grow, but I dare say there’ll be some modifications along the way too.

The aim is not to end up with LINQBridge: I’m going to be targeting .NET 3.5 (mostly so I can use extension methods without creating my own attribute) and I’m certainly not going to start worrying about installers and the like. The purpose of all of this is purely educational: if you follow along with these blog posts, with any luck you’ll have a deeper understanding of LINQ in general and LINQ to Objects in particular. For example, topics like deferred execution are often misunderstood: looking at the implementation can clarify things pretty well.

Testing

The unit tests will be written using NUnit (just for the sake of my own familiarity). Fairly obviously, one of the things we’ll need to test quite a lot is whether two sequences are equal. We’ll do this using the TestExtensions class from MoreLINQ (which I’ve just copied into the project). The netbook on which I’ll probably write most of this code only has C# Express 2010 on it, so I’m going to use the external NUnit GUI. I’ve set this in the project file as the program to start… which you can’t do from C# Express directly, but editing the project file is easy, to include this:

<StartAction>Program</StartAction>
<StartProgram>C:Program FilesNUnit-2.5.7.10213binnet-2.0nunit-x86.exe</StartProgram>

It’s a bit of a grotty hack, but it works. The "additional command line parameters" are then set to just JonSkeet.Linq.Tests.dll – the current directory is the bin/debug directory by default, so everything’s fine. Obviously if you want to run the tests yourself and you have ReSharper or something similar, you can see the results integrated into Visual Studio.

Although I’m hoping to write reasonably production-level code, I doubt that there’ll be as many unit tests as I’d really write against production code. I fully expect the number of lines of test code to dwarf the number of lines of production code even so. There are simply vast numbers of potential corner cases… and quite a few overloads in several cases. Remember that the goal here is to examine interesting facets of LINQ.

Code layout

Just like the real LINQ to Objects, I’ll be creating an enormous static Enumerable class… but I’ll do so using partial classes, with one method name (but multiple overloads) per file. So Where will be implemented in Where.cs and tested in WhereTest.cs, for example.

First code drop

The first zip file is available: Linq-To-Objects-1.zip. It doesn’t contain any production code yet – just 4 tests for Where, so I could check that NUnit was working properly for me. Next stop… implementing Where.

27 thoughts on “Reimplementing LINQ to Objects: Part 1 – Introduction”

Stu says:

September 3, 2010 at 2:02 am

John, just wondering that instead of handing out the code drops via zip files, that you might consider putting the code into source control somewhere?

LikeLike

Reply
skeet says:

September 3, 2010 at 2:14 am

@Stu: Will certainly consider it. I think it’s nice to have the zip files as well, as easy-to-grab snapshots whatever source control clients you happen to have installed.

LikeLike

Reply
tim says:

September 3, 2010 at 2:53 am

holy crap, that’s a task :)

LikeLike

Reply
Thomas Levesque says:

September 3, 2010 at 2:56 am

Cool idea Jon, I will be following this series with great interest ! I look forward to seeing how you implement Join or OrderBy…

LikeLike

Reply
Jef Claes says:

September 3, 2010 at 2:56 am

This is going to be good stuff.

Btw your captcha doesn’t show.

LikeLike

Reply
skeet says:

September 3, 2010 at 3:06 am

@Jef: I’m aware there’s a CAPTCHA problem on Chrome. Unfortunately as I don’t run the blog server, I can’t do anything about it :(

LikeLike

Reply
Lasse Vågsæther Karlsen says:

September 3, 2010 at 4:11 am

Do you plan on hosting the code on a repository somewhere?

I plan on hosting a version of it, with XML documentation that I will add, I can easily host a mercurial repo alongside it with all your code-drops until you publish your own.

LikeLike

Reply
skeet says:

September 3, 2010 at 4:18 am

@Lasse: I’ll think about it – not doing it right now, but maybe in the future.

LikeLike

Reply
Matt Warren says:

September 3, 2010 at 4:53 am

Looking forward this, I learnt a lot about LINQ from looking through the code from your DDD talk and watching the video where you had attendees standing up the front with bits of paper!!

LikeLike

Reply
Chris Nahr says:

September 3, 2010 at 7:01 am

“Fairly obviously, one of the things we’ll need to test quite a lot is whether two sequences are equal. We’ll do this using the TestExtensions class from MoreLINQ (which I’ve just copied into the project).”

Perhaps I’m missing something, but why not simply use NUnit’s built-in Collection asserts? They work with IEnumerable sequences.

http://nunit.com/index.php?p=collectionAssert&r=2.5.7

LikeLike

Reply
skeet says:

September 3, 2010 at 7:23 am

@Chris: It’s possible that they didn’t exist when I first looked at MoreLINQ… it’s also very likely I just didn’t know about them :)

Will look tonight and possibly rip out the TestExtensions bit…

LikeLike

Reply
Winston says:

September 3, 2010 at 7:41 am

Jon – any chance could you host the .zip somewhere that isn’t blocked by corporate web filters?

LikeLike

Reply
skeet says:

September 3, 2010 at 7:50 am

@Winston: You could try http://yoda.arachsys.com/blogfiles/csharp/l2o/Linq-To-Objects-1.zip which is the same thing, just without redirection.

I’m looking into source control instead though…

LikeLike

Reply
nyi nyi thann says:

September 3, 2010 at 8:45 am

Here is my humble attempt at reimplementing Linq to Object to use in my company’s .Net 2.0 project.
http://tanker.codeplex.com/

LikeLike

Reply
craig says:

September 3, 2010 at 9:35 am

Um this sounds like you’re reinventing the wheel.

LikeLike

Reply
Chris Martin says:

September 3, 2010 at 12:09 pm

Great idea, Jon!

Please consider hosting this project on Github or the like. I would love to be able to follow along without having to download, unzip, and merge.

No pressure. ;)

LikeLike

Reply
Pavel Minaev says:

September 4, 2010 at 3:18 am

Jon, how are you going to ensure that performance characteristics of your implementation (once you get to the really interesting stuff such as grouping and joins) are on par with the stock implementation?

LikeLike

Reply
skeet says:

September 4, 2010 at 3:40 am

@Pavel: Not sure, to be honest. Will have to think about that. I’ll certainly be talking about performance and reasoning about it… but testing could be tricky.

LikeLike

Reply
Nilesh Gule says:

September 4, 2010 at 1:07 pm

Great idea Jon. Just wanted to check if you are the same Jon Skeet who is rocking the stackoverflow with you popularity?

LikeLike

Reply
skeet says:

September 5, 2010 at 2:02 am

@Nilesh: Yup, that’s me.

LikeLike

Reply
keith says:

September 6, 2010 at 5:56 pm

uh … wow. Reimplementing link as a leisure activity. Wow! If it was anybody but you (Jon Skeet) .. I’d think it a little ‘nuts’ … yet it will be deeply interesting as a teaching and understanding exercise … I’m looking forward to it … seems like you have a little too much time on you hands .. there’s got to be another CSharp focused book you’d like to work on? What about a Windows Phone 7 book ;-) ???

LikeLike

Reply
keith says:

September 6, 2010 at 6:00 pm

Jon .. my previous ‘time on your hands’ comment was too harsh … I’d be very interested in ANY programming book you might choose to write and I know I’ll learn a great deal from these upcoming blog posts.

LikeLike

Reply
Richard says:

September 8, 2010 at 1:09 pm

@nyi nyi thann:
LINQBridge already provides L2O for .NET 2:
http://code.google.com/p/linqbridge/

LikeLike

Reply
Pingback: LINQ: Notation, Syntax, and Snags | somewhat abstract

If anybody’s wondering what this ThrowingEnumerable class is, a google search came up with the below from morelinq google code: enjoy :)

namespace Edulinq.TestSupport
{
///
/// Class to help for deferred execution tests: it throw an exception
/// if GetEnumerator is called.
///
public sealed class ThrowingEnumerable : IEnumerable
{
public IEnumerator GetEnumerator()
{
throw new InvalidOperationException();
}

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    /// <summary>
    /// Check that the given function uses deferred execution.
    /// A "spiked" source is given to the function: the function
    /// call itself shouldn't throw an exception. However, using
    /// the result (by calling GetEnumerator() then MoveNext() on it) *should*
    /// throw InvalidOperationException.
    /// </summary>
    public static void AssertDeferred<T>(
        Func<IEnumerable<int>, IEnumerable<T>> deferredFunction)
    {
        ThrowingEnumerable source = new ThrowingEnumerable();
        var result = deferredFunction(source);
        using (var iterator = result.GetEnumerator())
        {
            Assert.Throws<InvalidOperationException>(() => iterator.MoveNext());
        }
    }
}

LikeLike

jonskeet says:

November 26, 2016 at 10:31 am

Or just look in github: https://github.com/jskeet/edulinq

LikeLike

Reply

Pingback: A great resource about Linq (C#) – Nino Burini

Jon Skeet's coding blog

Reimplementing LINQ to Objects: Part 1 – Introduction

General approach

Testing

Code layout

First code drop

27 thoughts on “Reimplementing LINQ to Objects: Part 1 – Introduction”

Leave a comment Cancel reply

General approach

Testing

Code layout

First code drop

Share this:

Related

27 thoughts on “Reimplementing LINQ to Objects: Part 1 – Introduction”

Leave a comment Cancel reply