I’ve been putting off writing about this for a while now, mostly because it’s such a huge topic. I’m not going to try to give more than a brief introduction to it here – don’t expect to be able to whip up your own LINQ to SQL implementation afterwards – but it’s worth at least having an idea of what happens when you use something like LINQ to SQL, NHibernate or the Entity Framework.
Just as LINQ to Objects is primarily interested in IEnumerable<T> and the static Enumerable class, so out-of-process LINQ is primarily interested in IQueryable<T> and the static Queryable class… but before we get to them, we need to talk about expression trees.
Expression Trees
To put it in a nutshell, expression trees encapsulate logic in data instead of code. While you can introspect .NET code via MethodBase.GetMethodBody and then MethodBody.GetILAsByteArray, that’s not really a practical approach. The types in the System.Linq.Expressions define expressions in an easier-to-process manner. When expression trees were introduced in .NET 3.5, they were strictly for expressions, but the Dynamic Language Runtime uses expression trees to represent operations, and the range of logic represented had to expand accordingly, to include things like blocks.
While you certainly can build expression trees yourself (usually via the factory methods on the nongeneric Expression class), and it’s fun to do so at times, the most common way of creating them is to use the C# compiler’s support for them via lambda expressions. So far we’ve always seen a lambda expression being converted to a delegate, but it can also convert lambdas to instances of Expression<TDelegate>, where TDelegate is a delegate type which is compatible with the lambda expression. A concrete example will help here. The statement:
Expression<Func<int, int>> addOne = x => x + 1;
will be compiled into code which is effectively something like this:
var parameter = Expression.Parameter(typeof(int), "x");
var one = Expression.Constant(1, typeof(int));
var addition = Expression.Add(parameter, one);
var addOne = Expression.Lambda<Func<int, int>>(addition, new ParameterExpression[] { parameter });
The compiler has some tricks up its sleeves which allow it to refer to methods, events and the like in a simpler way than we can from code, but largely you can regard the transformation as just a way of making life a lot simpler than if you had to build the expression trees yourself every time.
IQueryable, IQueryable<T> and IQueryProvider
Now that we’ve got the idea of being able to inspect logic relatively easily at execution time, let’s see how it applies to LINQ.
There are three interfaces to introduce, and it’s probably easiest to start with how they appear in a class diagram:

Most of the time, queries are represented using the generic IQueryable<T> interface, but this doesn’t actually add much over the nongeneric IQueryable interface it extends, other than also extending IEnumerable<T> – so you can iterate over the contents of an IQueryable<T> just as with any other sequence.
IQueryable contains the interesting bits, in the form of three properties: ElementType which indicates the type of the elements within the query (in other words, a dynamic form of the T from IQueryable<T>), Expression returns the expression tree for the query so far, and Provider returns the query provider which is responsible for creating new queries and executing the existing one. We won’t need to use the ElementType property ourselves, but we’ll need both the Provider and Expression properties.
The static Queryable class
We’re not going to implement any of the interfaces ourselves, but I’ve got a small sample program to demonstrate how they all work, imagining we were implementing most of Queryable ourselves. This static class contains extension methods for IQueryable<T> just as Enumerable does for IEnumerable<T>. Most of the query operators from LINQ to Objects appear in Queryable as well, but there are a few notable omissions, such as the To{Lookup, Array, List, Dictionary} methods. If you call one of those on an IQueryable<T>, the Enumerable implementations will be used instead. (IQueryable<T> extends IEnumerable<T>, so the extension methods in Enumerable are applicable to IQueryable<T> sequences as well.)
The big difference between the Queryable and Enumerable methods in terms of their declarations is in the parameters:
- The "source" parameter in Queryable is always of type IQueryable<TSource> instead of IEnumerable<TSource>. (Other sequence parameters such as the sequence to concatenate for Queryable.Concat are expressed as IEnumerable<T>, interestingly enough. This allows you to express a SQL query using "local" data as well; the query methods work out whether the sequence is actually an IQueryable<T> and act accordingly.)
- Any parameters which were delegates in Enumerable are expression trees in Queryable; so while the selector parameter in Enumerable.Select is of type Func<TSource, TResult>, the equivalent in Queryable.Select is of type Expression<Func<TSource, TResult>>
The big difference between the methods in terms of what they do is that whereas the Enumerable methods actually do the work (eventually – possibly after deferred execution of course), the Queryable methods themselves really don’t do any work: they just ask the query provider to build up a query indicating that they’ve been called.
Let’s have a look at Where for example. If we wanted to implement Queryable.Where, we would have to:
- Perform argument checking
- Get the "current" query’s Expression
- Build a new expression representing a call to Queryable.Where using the current expression as the source, and the predicate expression as the predicate
- Ask the current query’s provider to build a new IQueryable<T> based on that call expression, and return it.
It all sounds a bit recursive, I realize – the Where call needs to record that a Where call has happened… but that’s all. You may very well wonder where all the work is happening. We’ll come to that.
Now building a call expression is slightly tedious because you need to have the right MethodInfo – and as Where is overloaded, that means distinguishing between the two Where methods, which is easier said than done. I’ve actually used a LINQ query to find the right overload – the one where the predicate parameter Expression<Func<T, bool>> rather than Expression<Func<T, int, bool>>. In the .NET implementation, methods can use MethodBase.GetCurrentMethod() instead… although equally they could have created a bunch of static variables computed at class initialization time. We can’t use GetCurrentMethod() for experimentation purposes, because the query provider is likely to expect the exact correct method from System.Linq.Queryable in the System.Core assembly.
Here’s our sample implementation, broken up quite a lot to make it easier to understand:
public static IQueryable<TSource> Where<TSource>(
this IQueryable<TSource> source,
Expression<Func<TSource, bool>> predicate)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
Expression sourceExpression = source.Expression;
Expression quotedPredicate = Expression.Quote(predicate);
MethodInfo method = typeof(Queryable).GetMethods()
.Where(m => m.Name == "Where")
.Where(m => m.GetParameters()[1]
.ParameterType
.GetGenericArguments()[0]
.GetGenericArguments().Length == 2)
.First();
MethodInfo closedMethod = method.MakeGenericMethod(new Type[] { typeof(TSource) });
Expression methodCall = Expression.Call(closedMethod, sourceExpression, quotedPredicate);
return source.Provider.CreateQuery<TSource>(methodCall);
}
There’s only one part of this code that I don’t really understand the need for, and that’s the call to Expression.Quote on the predicate expression tree. I’m sure there’s a good reason for it, but this particular example would work without it, as far as I can see. The real implementation uses it though, so dare say it’s required in some way.
EDIT: Daniel’s comment has made this somewhat clearer to me. Each of the arguments to Expression.Call after the MethodInfo itself is meant to be an expression which represents the argument to the method call. In our example we need an expression which represents an argument of type Expression<Func<TSource, bool>>. We already have the value, but we need to provide the layer of wrapping… just as we did with Expression.Constant in the very first expression tree I showed at the top. To wrap the expression value we’ve got, we use Expression.Quote. It’s still not clear to me exactly why we can use Expression.Quote but not Expression.Constant, but at least it’s clearer why we need something…
EDIT: I’m gradually getting there. This Stack Overflow answer from Eric Lippert has much to say on the topic. I’m still trying to get my head round it, but I’m sure when I’ve read Eric’s answer several times, I’ll get there.
We can even test that this works, by using the Queryable.AsQueryable method from the real .NET implementation. This creates an IQuerable<T> from any IEnumerable<T> using a built-in query provider. Here’s the test program, where FakeQueryable is a static class containing the extension method above:
using System;
using System.Collections.Generic;
using System.Linq;
class Test
{
static void Main()
{
List<int> list = new List<int> { 3, 5, 1 };
IQueryable<int> source = list.AsQueryable();
IQueryable<int> query = FakeQueryable.Where(source, x => x > 2);
foreach (int value in query)
{
Console.WriteLine(value);
}
}
}
This works, printing just 3 and 5, filtering out the 1. Yay! (I’m explicitly calling FakeQueryable.Where rather than letting extension method resolution find it, just to make things clearer.)
Um, but what’s doing the actual work? We’ve implemented the Where clause without providing any filtering ourselves. It’s really the query provider which has built an appropriate IQueryable<T> implementation. When we call GetEnumerator() implicitly in the foreach loop, the query can examine everything that’s built up in the expression tree (which could contain multiple operators – it’s nesting queries within queries, essentially) and work out what to do. In the case of our IQueryable<T> built from a list, it just does the filtering in-process… but if we were using LINQ to SQL, that’s when the SQL would be generated. The provider recognizes the specific methods from Queryable, and applies filters, projections etc. That’s why it was important that our demo Where method pretended that the real Queryable.Where had been called – otherwise the query provider wouldn’t know what the call expression
Just to hammer the point home even further… Queryable itself neither knows nor cares what kind of data source you’re using. Its job is not to perform any query operations itself; its job is to record the requested query operations in a source-agnostic manner, and let the source provider handle them when it needs to.
Immediate execution with IQueryProvider.Execute
All the operators using deferred execution in Queryable are implemented in much the same way as our demo Where method. However, that doesn’t cover the situation where we need to execute the query now, because it has to return a value directly instead of another query.
This time I’m going to use ElementAt as the sample, simply because it’s only got one overload, which makes it very easy to grab the relevant MethodInfo. The general procedure is exactly the same as building a new query, except that this time we call the provider’s Execute method instead of CreateQuery.
public static TSource ElementAt<TSource>(this IQueryable<TSource> source, int index)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
Expression sourceExpression = source.Expression;
Expression indexExpression = Expression.Constant(index);
MethodInfo method = typeof(Queryable).GetMethod("ElementAt");
MethodInfo closedMethod = method.MakeGenericMethod(new Type[] { typeof(TSource) });
Expression methodCall = Expression.Call(closedMethod, sourceExpression, indexExpression);
return source.Provider.Execute<TSource>(methodCall);
}
The type argument we provide to Execute is the desired return type – so for Count, we’d call Execute<int> for example. Again, it’s up to the query provider to work out what the call actually means.
It’s worth mentioning that both CreateQuery and Execute have generic and non-generic overloads. I haven’t personally encountered a use for the non-generic ones, but I gather they’re useful for various situations in generated code, particularly if you really don’t know the element type – or at least only know it dynamically, and don’t want to have to use reflection to generate an appropriate generic method call.
Transparent support in source code
One of the aspects of LINQ which raises it to the "genius" status (and "slightly scary" at the same time) is that most of the time, most developers don’t need to make any changes to their source code in order to use Enumerable or Queryable. Take this query expression and its translation:
var query =
from person
in family
where person.LastName ==
"Skeet" select person.FirstName;
var query = family.Where(person => person.LastName == "Skeet")
.Select(person => person.FirstName);
Which set of query methods will that use? It entirely depends on the compile-time type of the "family" variable. If that’s a type which implements IQueryable<T>, it will use the extension methods in Queryable, the lambda expression will be converted into expression trees, and the type of "query" will be IQueryable<string>. Otherwise (and assuming the type implements IEnumerable<T> isn’t some other interesting type such as ParallelEnumerable) it will use the extension methods in Enumerable, the lambda expressions will be converted into delgeates, and the type of "query" will be IEnumerable<string>.
The query expression translation part of the specification has no need to care about this, because it’s simply translating into a form which uses lambda expressions – the rest of overload resolution and lambda expression conversion deals with the details.
Genius… although it does mean you need to be careful that really you know where your query evaluation is going to take place – you don’t want to accidentally end up performing your whole query in-process having shipped the entire contents of a database across a network connection…
Conclusion
This was really a whistlestop tour of the "other" side of LINQ – and without going into any of the details of the real providers such as LINQ to SQL. However, I hope it’s given you enough of a flavour for what’s going on to appreciate the general design. Highlights:
- Expression trees are used to capture logic in a data structure which can be examined relatively easily at execution time
- Lambda expressions can be converted into expression trees as well as delegates
- IQueryable<T> and IQueryable form a sort of parallel interface hierarchy to IEnumerable<T> and IEnumerable – although the queryable forms extend the enumerable forms
- IQueryProvider enables one query to be built based on another, or executed immediately where appropriate
- Queryable provides equivalent extension methods to most of the Enumerable LINQ operators, except that it uses IQueryable<T> sources and expression trees instead of delegates
- Queryable doesn’t handle the queries itself at all; it simply records what’s been called and delegates the real processing to the query provider
I think I’ve now covered most of the topics I wanted to mention after finishing the actual Edulinq implementation. Next up I’ll talk about some of the thorny design issues (most of which I’ve already mentioned, but which bear repeating) and then I’ll write a brief "series conclusion" post with a list of links to all the other parts.