Dreaming of multiple tasks

October 31, 2010 jonskeet 25 Comments

I apologise in advance if this post ends up slightly rambly. Unlike the previous entries, this only partly deals with existing features. Instead, it considers how we might like to handle some situations, and where the language and type system thwart us a little. Most of the code will be incomplete, as producing something realistic and complete is tricky.

Almost everything I’ve written about so far has dealt with the situation where we just want to execute one asynchronous operation at a time. That’s the case which the await keyword makes particularly easy to work with. That’s certainly a useful case, but it’s not the only one. I’m not going to write about that case in this post. (At least, not much.)

At the other extreme, we’ve got the situation where you have a large number of items to deal with, possibly dynamically generated – something like "fetch all the URLs in this list". We may well be able to launch some or even all of those operations in parallel, but we’re likely to use the results as a general collection. We’re doing the same thing with each of multiple inputs. This is data parallelism. I’m not going to write about that in this post either.

This post is about task parallelism. We want to execute multiple tasks in parallel (which may or may not mean using multiple threads – there could be multiple asynchronous web service calls, for example) and get all the results back before we proceed. Some of the tasks may be the same, but in general they’re not.

Describing the sample scenario

To help make everything sound vaguely realistic, I’m going to use a potentially genuine scenario, based on Stack Overflow. I’d like to make it clear that I have no idea how Stack Overflow really works, so please don’t make any assumptions. In particular, we’re only dealing with a very small portion of what’s required to render a single page. Nevertheless, it gives us something to focus on. (This is a situation I’ve found myself in several times at work, but obviously the internal services at Google are confidential, so I can’t start talking about a really real example.)

As part of rendering a Stack Overflow page for a logged in user, let’s suppose we need to:

Authenticate the user’s cookie (which gives us the user ID)
Find out the preferences for the user (so we know which tags to ignore etc)
Find out the user’s current reputation
Find out if there have been any recent comments or badges

All of these can be asynchronous operations. The first one needs to be executed before any of the others, but the final three can all be executed in parallel. We need all the results before we can make any more progress.

I’m going to assume an appropriate abstraction which contains the relevant asynchronous methods. Something like this:

public interface IStackService
{
    Task<int?> AuthenticateUserAsync(string cookie);
    Task<UserSettings> GetUserSettingsAsync(int userId);
    Task<int> GetReputationAsync(int userId);
    Task<RecentActivity> GetRecentActivityAsync(int userId);
}

The simple "single-threaded" implementation

I’ve put "single-threaded" in quotes here, because this may actually run across multiple threads, but only one operation will be executed at a time. This is really just here for reference – because it’s so easy, and it would be nice to get the same simplicity with more parallelism.

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    UserSettings settings = await service.GetUserSettingsAsync(userId);
    int reputation = await service.GetReputationAsync(userId);
    RecentActivity activity = await service.GetRecentActivityAsync(userId);

    // Now combine the results and render the page
}

Just to be clear, this is still better than the obvious synchronous equivalent. While those asynchronous calls are executing, we won’t be sat in a blocked thread, taking very little CPU but hogging a decent chunk of stack space. The scheduler will have less work to do. Life will be all flowers and sunshine… except for the latency.

If each of those requests takes about 100ms, it will take 400ms for the whole lot to complete – when it could take just 200ms. We can’t do better than 200ms with the operations we’ve got to work with: we’ve got to get the user ID before we can perform any of the other operations – but we can do all the other three in parallel. Let’s try doing that using the tools we’ve got available to us and no neat tricks, to start with.

Declaring tasks and waiting for them

First, let’s talk about TaskEx.WhenAll(). This is a method provided in the CTP library, and I wouldn’t be surprised to see it move around a bit over time. There are a bunch of overloads for this – some taking multiple Task<TResult> items, and some being more weakly typed. It simply lets you wait for multiple tasks to complete – and because it returns a task itself, we can "await" it in the usual asynchronous way. In this case we have to use the weakly typed version, because our tasks are of different types. That’s fine though, because we’re not going to use the result anyway, except for waiting. (And in fact we’ll let the compiler deal with that for us.)

The code for this isn’t too bad, but it’s a bit more long-winded:

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    var settingsTask = service.GetUserSettingsAsync(userId);
    var reputationTask = service.GetReputationAsync(userId);
    var activityTask = service.GetRecentActivityAsync(userId);

    // This overload of WhenAll just returns Task, so there’s no result
    // to wait for: we’ll get the various results from the tasks themselves
    await TaskEx.WhenAll(settingsTask, reputationTask, activityTask);

    // By now we know that the result of each task is available
    UserSettings settings = settingsTask.Result;
    int reputation = reputationTask.Result;
    RecentActivity activity = activityTask.Result;

    // Now combine the results and render the page
}

This is still nicer than the pre-C# 5 code to achieve the same results, but I’d like to think we can do better. Really we just want to express the tasks once, wait for them all to complete, and get the results into variables, just like we did in the one-at-a-time code. I’ve thought of two approaches for this: one using anonymous types, and one using tuples. Both require changes to be viable – although the tuple approach is probably more realistic. Let’s look at it.

EDIT: Just before we do, I’d like to include code from one of the comments. If we’re going to use all the results directly, we can just await them in turn rather than using WhenAll – it’s like joining one thread after another. That leads to code like this:

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    var settingsTask = service.GetUserSettingsAsync(userId);
    var reputationTask = service.GetReputationAsync(userId);
    var activityTask = service.GetRecentActivityAsync(userId);

    UserSettings settings = await settingsTask;
    int reputation = await reputationTask;
    RecentActivity activity = await activityTask;

    // Now combine the results and render the page
}

I definitely like that more. Not sure why I didn’t think of it before…
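To make the latency difference concrete, here’s a minimal sketch that compares the one-at-a-time version with the "start them all, then await each in turn" version. It isn’t the real service: FakeCallAsync is a hypothetical stand-in that just waits about 100ms, and it uses Task.Delay from the released framework rather than anything in the CTP.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LatencyDemo
{
    // Hypothetical stand-in for a service call taking roughly 100ms.
    private static async Task<int> FakeCallAsync(int value)
    {
        await Task.Delay(100);
        return value;
    }

    // Each call is only started once the previous one has completed.
    public static async Task<TimeSpan> OneAtATimeAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        int userId = await FakeCallAsync(1);
        await FakeCallAsync(userId);
        await FakeCallAsync(userId);
        await FakeCallAsync(userId);
        return stopwatch.Elapsed;
    }

    // Authenticate first, then start the other three calls before
    // awaiting any of them, so they are all in flight together.
    public static async Task<TimeSpan> StartAllThenAwaitAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        int userId = await FakeCallAsync(1);
        Task<int> settingsTask = FakeCallAsync(userId);
        Task<int> reputationTask = FakeCallAsync(userId);
        Task<int> activityTask = FakeCallAsync(userId);
        await settingsTask;
        await reputationTask;
        await activityTask;
        return stopwatch.Elapsed;
    }
}
```

I’d expect the first method to report somewhere around 400ms and the second somewhere around 200ms, although the exact numbers will vary with timer resolution and scheduling.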

Now back to the original post…

An ideal world of tuples

I’m assuming you’re aware of the family of System.Tuple types. They were introduced in .NET 4, and are immutable and strongly typed, both of which are nice features. The downsides are that even with type inference they’re still slightly awkward to create, and extracting the component values requires using properties such as Item1, Item2 etc. The C# compiler is completely unaware of tuples, which is slightly annoying. I would like two new features in C# 5:

Tuple literals: the ability to write something like var tuple = ("Foo", 10); to create a Tuple<string, int> – I’m not overly bothered with the exact syntax, so long as it’s concise.
Assignment to multiple variables from a single tuple. For example: var (ok, value) = int.TryParseToTuple("10");. Assuming a method with signature Tuple<bool, int> TryParseToTuple(string text) this would make ok a variable of type bool, and value a variable of type int.

Just to pre-empt others, I’m aware that F# helps on this front already. C# could do with catching up :)

Anyway, imagine we’ve got those language features. Then imagine a set of extension methods looking like this, but with another overload for 2-value tuples, another for 4-value tuples etc:

public static class TupleExtensions
{
    public static async Task<Tuple<T1, T2, T3>> WhenAll<T1, T2, T3>
        (this Tuple<Task<T1>, Task<T2>, Task<T3>> tasks)
    {
        await TaskEx.WhenAll(tasks.Item1, tasks.Item2, tasks.Item3);
        return Tuple.Create(tasks.Item1.Result, tasks.Item2.Result, tasks.Item3.Result);
    }
}

It can look a bit confusing because of all the type arguments and calls to Result and ItemX properties… but essentially it takes a tuple of tasks, and returns a task of a tuple returning the component values. How does this help us? Well, take a look at our Stack Overflow code now:

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    var (settings, reputation, activity) = await (service.GetUserSettingsAsync(userId),
                                                  service.GetReputationAsync(userId),
                                                  service.GetRecentActivityAsync(userId))
                                                 .WhenAll();

    // Now combine the results and render the page
}

If we knew we always wanted to wait for all the tasks, we could actually change our extension method to one called GetAwaiter which returned a TupleAwaiter or something like that – so we could get rid of the call to WhenAll completely. However, I’m not sure that would be a good thing. I like explicitly stating how we’re awaiting the completion of all of these tasks.
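For what it’s worth, here’s a rough sketch of how that GetAwaiter idea might look. It’s an assumption on my part rather than tested CTP code: it uses Task.WhenAll from the released framework instead of TaskEx.WhenAll, and instead of a custom TupleAwaiter it simply borrows the TaskAwaiter of the combined task. The class name is made up for the example.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class TupleAwaitExtensions
{
    // Same shape as the WhenAll extension method shown earlier:
    // a tuple of tasks in, a task of a tuple out.
    public static async Task<Tuple<T1, T2, T3>> WhenAll<T1, T2, T3>(
        this Tuple<Task<T1>, Task<T2>, Task<T3>> tasks)
    {
        await Task.WhenAll(tasks.Item1, tasks.Item2, tasks.Item3);
        return Tuple.Create(tasks.Item1.Result, tasks.Item2.Result, tasks.Item3.Result);
    }

    // Makes the tuple of tasks itself awaitable, by handing back the
    // awaiter of the combined task.
    public static TaskAwaiter<Tuple<T1, T2, T3>> GetAwaiter<T1, T2, T3>(
        this Tuple<Task<T1>, Task<T2>, Task<T3>> tasks)
    {
        return tasks.WhenAll().GetAwaiter();
    }
}
```

With that in scope, "await Tuple.Create(task1, task2, task3)" would compile inside an async method, with no visible call to WhenAll, which is exactly the implicitness I’m wary of.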

The real world of tuples

Back in the real world, we don’t have these language features on tuples. We can still use the extension method, but it’s not quite as nice:

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    var results = await Tuple.Create(service.GetUserSettingsAsync(userId),
                                     service.GetReputationAsync(userId),
                                     service.GetRecentActivityAsync(userId))
                             .WhenAll();

    var settings = results.Item1;
    var reputation = results.Item2;
    var activity = results.Item3;

    // Now combine the results and render the page
}

We’ve got an extra local variable we don’t need, and the ugliness of the ItemX properties is back. Oh well. Maybe tuples aren’t the best approach. Let’s look at a closely related cousin, anonymous types…

An ideal world of anonymous types

Extension methods on anonymous types are somewhat evil. They’re potentially powerful, but they definitely have drawbacks. Aside from anything else, you can’t add a generic constraint to require that a type is anonymous, and you certainly can’t add a generic constraint to say that each member of the anonymous type must be a task (which is what we want here). But the difficulties go further than that. I would like to be able to use something like this:

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    var results = await new { Settings = service.GetUserSettingsAsync(userId),
                              Reputation = service.GetReputationAsync(userId),
                              Activity = service.GetRecentActivityAsync(userId) }
                        .WhenAll();

    // Use results.Settings, results.Reputation and results.Activity to render
    // the page
}

Now in this magical world, we’d have an extension method on T where T : class which would check that all the properties were of type Task<TResult> (with a different TResult for each property, potentially) and return a task of a new anonymous type which had the same properties… but without the task part. Essentially, we’re trying to perform the same inversion that we did with tuples, moving where the task "decorator" bit comes. We can’t do that with anonymous types – there’s simply no way of expressing it in the language. We could potentially generate a new type at execution time, but there’s no way of getting compile-time safety.

These two problems suggest two different solutions though. Firstly – and more simply – if we’re happy to lose compile-time safety, we can use the dynamic typing from C# 4.

The real world of anonymous types and dynamic

We can fairly easily write an async extension method to create a Task<dynamic>. The code would involve reflection to extract the tasks from the instance, call TaskEx.WhenAll to wait for them to complete, and then populate an ExpandoObject. I haven’t included the extension method here because frankly reflection code is pretty boring, and the async part of it is what we’ve seen everywhere else. Here’s what the consuming code might look like though:

public async Task<Page> RenderPage(Request request)
{
    int? maybeUserId = await service.AuthenticateUserAsync(request.Cookie);
    if (maybeUserId == null)
    {
        return RenderUnauthenticatedPage();
    }
    int userId = maybeUserId.Value;

    dynamic results = await new { Settings = service.GetUserSettingsAsync(userId),
                                  Reputation = service.GetReputationAsync(userId),
                                  Activity = service.GetRecentActivityAsync(userId) }
                            .WhenAllDynamic();

    UserSettings settings = results.Settings;
    int reputation = results.Reputation;
    RecentActivity activity = results.Activity;

    // Use our statically typed variables in the rest of the code
}

The extra local variables are back, because I don’t like being dynamic for more time than I can help. Here, once we’ve copied the results into our local variables, we can ignore the dynamic results variable for the rest of the code.

This is pretty ugly, but it would work. I’m not sure that it’s significantly better than the "works but uses ItemX" tuple version though.
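For the curious, here’s a minimal sketch of what the WhenAllDynamic extension method could look like. Treat it as one possible shape rather than a definitive implementation: it uses Task.WhenAll from the released framework instead of TaskEx.WhenAll, the class name is made up, and the only validation is an execution-time check that every property is declared as a Task<TResult>.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

public static class DynamicTaskExtensions
{
    public static async Task<dynamic> WhenAllDynamic<T>(this T source) where T : class
    {
        PropertyInfo[] properties = typeof(T).GetProperties();

        // There's no compile-time way to require this, so check at
        // execution time that every property is a Task<TResult>.
        foreach (PropertyInfo property in properties)
        {
            Type type = property.PropertyType;
            if (!type.IsGenericType || type.GetGenericTypeDefinition() != typeof(Task<>))
            {
                throw new ArgumentException(
                    "Property " + property.Name + " is not a Task<TResult>");
            }
        }

        Task[] tasks = properties.Select(p => (Task) p.GetValue(source, null)).ToArray();
        await Task.WhenAll(tasks);

        // ExpandoObject exposes its members as a dictionary, which makes
        // it easy to populate by property name.
        IDictionary<string, object> results = new ExpandoObject();
        for (int i = 0; i < properties.Length; i++)
        {
            // Fetch Task<TResult>.Result reflectively from the declared type
            PropertyInfo resultProperty = properties[i].PropertyType.GetProperty("Result");
            results[properties[i].Name] = resultProperty.GetValue(tasks[i], null);
        }
        return results;
    }
}
```

If any of the tasks fails, the await on Task.WhenAll throws before the ExpandoObject is built, so the caller never sees a half-populated result.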

Now, what about the second thought, about the difficulty of translating (Task<X>, Task<Y>) into a Task<(X, Y)>?

Monads

I’m scared. Writing this post has made me start to think I might be starting to grok monads. This is almost certainly an incorrect belief, but I’m sure I’m at least making progress. If we think of "wrapping a task around a type" as a sort of type decoration, it starts sounding similar to the description of monads that I’ve read before. The fact that async workflows in F# are one example of its monad support encourages me too. I have a sneaking suspicion that the async/await support in C# 5 is partial support for this specific monad – in particular, you express the result of an async method via a non-task-related return statement, but the declared return type is the corresponding "wrapped" type.

Now, C#’s major monadic support comes in the form of LINQ, and particularly SelectMany. Therefore – and I’m writing as I think here – I would like to end up being able to write something like this:

    var results = await from settings in service.GetUserSettingsAsync(userId)
                        from reputation in service.GetReputationAsync(userId)
                        from activity in service.GetRecentActivityAsync(userId)
                        select new { settings, reputation, activity };

    // Use results.settings, results.reputation, results.activity for the
    // rest of the code

That feels like it should work, but as I write this I genuinely don’t know whether or not it will.

What I do know is that we only actually need to write a single method to get that to work: SelectMany. We don’t even need to implement a Select method, as if there’s only a select clause following an extra from clause, the compiler just uses SelectMany and puts a projection at the end. We want to be able to take an existing task and a way of creating a new task from it, and somehow combine them.

Just to make it crystal clear, the way we’re going to use LINQ is not for sequences at all. It’s for tasks. So we don’t want to see IEnumerable<T> anywhere in our final signatures. Let’s see what we can do.

(10 minutes later.) Okay, wow. I’d expected it to be at least somewhat difficult to get it to compile. I’m not quite there yet in terms of parallelization, but I’ve worked out a way round that. Just getting it to work at all is straightforward. I started off by looking at the LINQ to Objects signature used by the compiler:

public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TCollection>> collectionSelector,
    Func<TSource, TCollection, TResult> resultSelector
)

Now we want our tasks to end up being independent, but let’s start off simply, just changing IEnumerable to Task everywhere, and changing the type parameter names:

public static Task<TResult> SelectMany<T1, T2, TResult>(
    this Task<T1> source,
    Func<T1, Task<T2>> taskSelector,
    Func<T1, T2, TResult> resultSelector
)

There’s still that nagging doubt about the dependency of the second task on the first, but let’s at least try to implement it.

We know we want to return a Task<TResult>, and we know that given a T1 and a T2 we can get a TResult. We also know that by writing an async method, we can ask the compiler to go from a return statement involving a TResult to a method with a declared return type of Task<TResult>. Once we’ve got that hint, the rest is really straightforward:

public static async Task<TResult> SelectMany<T1, T2, TResult>
    (this Task<T1> source,
     Func<T1, Task<T2>> taskSelector,
     Func<T1, T2, TResult> resultSelector)
{
    T1 t1 = await source;
    T2 t2 = await taskSelector(t1);
    return resultSelector(t1, t2);
}

There it is. We asynchronously await the result of the first task, feed the result into taskSelector to get the second task, await that task to get a second value, and then combine the two values with the simple projection to give the result we want to return asynchronously.

In monadic terms as copied from Wikipedia, I believe that:

The type constructor for the async monad is simply that T goes to Task<T> for any T.
The unit function is essentially what the compiler does for us when we declare a method as async – it provides the "wrapping" to get from a return statement using T to a method with a return type of Task<T>.
The binding operation is what we’ve got above – which should be no surprise, as SelectMany is the binding function in "normal" LINQ.
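To pin those three points down, here’s a small sketch of unit and bind written out explicitly for Task<T>. The names Unit and Bind (and the TaskMonad class) are mine, purely for illustration, and Task.FromResult is an assumption in that it comes from the released framework rather than the CTP. Bind is just the SelectMany above without the final projection.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskMonad
{
    // Unit: wrap a plain value in an already-completed task.
    public static Task<T> Unit<T>(T value)
    {
        return Task.FromResult(value);
    }

    // Bind: wait for the first task, feed its result to the selector to
    // get a second task, then wait for that and return its result.
    public static async Task<T2> Bind<T1, T2>(
        this Task<T1> source,
        Func<T1, Task<T2>> selector)
    {
        T1 value = await source;
        return await selector(value);
    }
}
```

So something like TaskMonad.Unit(5).Bind(x => TaskMonad.Unit(x + 1)) gives a task whose result is 6. Note that the selector can’t even be called until the first task has completed, which is the dependency between the tasks that this post keeps running into.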

I’m breathless with the simultaneous simplicity, beauty and complexity of it all. It’s simple because once I’d worked out the method signature (which is essentially what the definition of the binding function requires) the method wrote itself. It’s beautiful because once I’d picked the right method to use, the compiler did everything else for me – despite it sounding really significantly different to LINQ. It’s complex because I’m still feeling my way through all of this.

It’s a shame that after all of this, we still haven’t actually got what we wanted. To do that, we have to fake it.

Improper monads

("Improper monads" isn’t a real term. It scores 0 hits on Google at the moment – by the time you read this, that count will probably be higher, but only because of this post.)

We wanted to execute the tasks in parallel. We’re not actually doing so. We’re executing one task, then another. Oops. The problem is that our monadic definition says that we’re going to rely on the result of one task to generate the other one. We don’t want to do that. We want to get both tasks, and execute them at the same time.

Unfortunately, I don’t think there’s anything in LINQ which represents that sort of operation. The closest I can think of is a join – but we’re not joining on anything. I’m pretty sure we could do this by implementing Join and just ignoring the key selectors, but if we’re going to cheat anyway, we might as well cheat with the signature we’ve got. In this cheating version of LINQ, we assume that the task selector (which produces the second task) doesn’t actually rely on the argument it’s given. So let’s just give it anything – the default value, for example. Then we’ve got two tasks which we can await together using WhenAll as before.

public static async Task<TResult> SelectMany<T1, T2, TResult>
    (this Task<T1> task1,
     Func<T1, Task<T2>> taskSelector,
     Func<T1, T2, TResult> resultSelector)
{
    Task<T2> task2 = taskSelector(default(T1));
    await TaskEx.WhenAll(task1, task2);
    return resultSelector(task1.Result, task2.Result);
}

Okay, that was easy. But it looks like it’s only going to wait for two tasks at a time. We’ve got three in our example. What’s going to happen? Well, we’ll start waiting for the first two tasks when SelectMany is first called… but then we’ll return back to the caller with the result as a task. We’ll then call SelectMany again with the third task. We’ll then wait for [tasks 1 and 2] and [task 3]… which means waiting for all of them. Bingo! Admittedly I’ve a sneaking suspicion that if any task fails it might mean more deeply nested exceptions than we’d want, but I haven’t investigated that yet.
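It may help to see roughly what the compiler does with a three-way query. The sketch below repeats the cheating SelectMany (using Task.WhenAll from the released framework in place of TaskEx.WhenAll) and then writes the same query twice: once as a query expression, and once as the chained method calls I believe it is translated into. After100ms is a hypothetical stand-in for a service call, and the class names are made up.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelTaskLinq
{
    // The "improper" SelectMany: the selector is handed a default value,
    // so the second task starts without waiting for the first.
    public static async Task<TResult> SelectMany<T1, T2, TResult>(
        this Task<T1> task1,
        Func<T1, Task<T2>> taskSelector,
        Func<T1, T2, TResult> resultSelector)
    {
        Task<T2> task2 = taskSelector(default(T1));
        await Task.WhenAll(task1, task2);
        return resultSelector(task1.Result, task2.Result);
    }
}

public static class QueryTranslationDemo
{
    // Hypothetical stand-in for an asynchronous service call.
    private static async Task<T> After100ms<T>(T value)
    {
        await Task.Delay(100);
        return value;
    }

    public static Task<string> QueryForm()
    {
        return from a in After100ms(1)
               from b in After100ms("two")
               from c in After100ms(3)
               select a + " " + b + " " + c;
    }

    // The first SelectMany call pairs up a and b; the second call treats
    // that pair as its "first task" and combines it with c.
    public static Task<string> MethodForm()
    {
        return After100ms(1)
            .SelectMany(a => After100ms("two"), (a, b) => new { a, b })
            .SelectMany(pair => After100ms(3),
                        (pair, c) => pair.a + " " + pair.b + " " + c);
    }
}
```

The outer SelectMany is called as soon as the inner one has returned its not-yet-completed task, so the third call is started straight away rather than 100ms later. Its selector receives null for the pair, which is harmless here only because the selector ignores its argument.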

I believe that this implementation lets us basically do what we want… but like everything else, it’s ugly in its own way. In this case it’s ugly because it allows us to express something (a dependency from one task to another) that we then don’t honour. I don’t like that. We could express the fact that getting a user’s reputation depends on authenticating the user first – but we’d end up finding the reputation of user 0, because that’s the result we’d pass in. That sucks.

EDIT: Along the same lines of the previous edit, we can make this code neater and avoid using WhenAll:
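Presumably that means something along these lines. This is a sketch assuming the same signature as before, with a made-up class name: we still start the second task with a default argument, but then simply await each task in turn.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitInTurnTaskLinq
{
    public static async Task<TResult> SelectMany<T1, T2, TResult>(
        this Task<T1> task1,
        Func<T1, Task<T2>> taskSelector,
        Func<T1, T2, TResult> resultSelector)
    {
        // Start the second task before awaiting the first, so that both
        // are in flight at the same time.
        Task<T2> task2 = taskSelector(default(T1));
        return resultSelector(await task1, await task2);
    }
}
```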

Back to the original post…

Ironically, someone on Twitter mentioned a new term to me today, which seems strikingly relevant: joinads. They pointed to a research paper written by Tomas Petricek and Don Syme – which on first glance is quite possibly exactly what I’ve been mostly-independently coming up with here. The reason LINQ query expressions don’t quite fit what we want is that they’re based on monads – if they’d been based on joinads, maybe it would all have worked well. I’ll read the paper and see if that gives me the answer. Then I’ll watch Bart de Smet’s PDC 2010 presentation which I gather is rather good.

Conclusion

I find myself almost disappointed. Those of you who already understand monads are quite possibly shaking your heads, saying to yourself that it was about time I started to "get" them (and that I’ve got a long way to go). Those of you who didn’t understand them before almost certainly don’t understand them any better now, given the way this post has been written.

So I’m not sure whether I’ll have any readers left by now… and I’ve failed to come up with a good solution to the original problem. In my view the nicest approach by far is the one using tuples, and that requires more language support. (I’m going to nag Mads about that very shortly.) And yet I’m simultaneously on a huge high. I’m very aware of my own limitations when it comes to computer science theory, but today it feels like I’ve at least grasped the edge of something beautiful.

And now, I must stop blogging before my family life falls apart or my head explodes. Goodnight all.

25 thoughts on “Dreaming of multiple tasks”

virtualblackfox says:

October 31, 2010 at 4:40 pm

Just a small typo

var tuple = (“Foo”, 10); to create a Tuple

instead of (i guess as the C# team don’t seem to want to introduce c++ style templates) :

var tuple = ("Foo", 10); to create a Tuple<string, int>

skeet says:

October 31, 2010 at 4:54 pm

@virtualblackfox: Thanks, fixed.

configurator says:

October 31, 2010 at 5:01 pm

public static async Task<TResult> SelectMany<T1, T2, TResult>
    (this Task<T1> task1,
     Func<Dummy, Task<T2>> taskSelector,
     Func<T1, T2, TResult> resultSelector)
{
    Task<T2> task2 = taskSelector(default(Dummy));
    await TaskEx.WhenAll(task1, task2);
    return resultSelector(task1.Result, task2.Result);
}

public struct Dummy { }

Now you can no longer express using the result of the first task. Problem solved?

configurator says:

October 31, 2010 at 5:04 pm

While tuples (with language support) would be a much better solution, failing that I have to say that your SelectMany approach is simply brilliant. This is the first time I’ve ever seen linq and thought “Wow. This looks *right*”

Strilanc says:

October 31, 2010 at 5:05 pm

There’s no need to use WhenAll if you’re going to be directly using all of the results.

'Start all tasks
Dim settingsTask = service.GetUserSettingsAsync(userId)
Dim reputationTask = service.GetReputationAsync(userId)
Dim activityTask = service.GetRecentActivityAsync(userId)
'Wait for all tasks
Dim settings = Await settingsTask
Dim reputation = Await reputationTask
Dim activity = Await activityTask

(You can cut a line but it breaks the nice symmetry)

Oren Novotny says:

October 31, 2010 at 5:19 pm

Is there something that you could do with the AsyncEnumerable that’s in the latest Rx release? It looks like they’ve implemented all of the standard Linq to Objects as tasks.

skeet says:

October 31, 2010 at 5:24 pm

@configurator: Interesting use of the Dummy type. Still feels like cheating, but interesting certainly. (Also wouldn’t catch 100% of “inappropriate” uses, but most.)

skeet says:

October 31, 2010 at 5:25 pm

@Strilanc: Good point. Not sure why I didn’t think of that!

@Oren: I haven’t looked at AsyncEnumerable, but I suspect it’s sequence based and therefore not directly relevant.

skeet says:

October 31, 2010 at 5:30 pm

@Strilanc: I’ve now edited your suggestions into the main body of the post. Hope that’s okay. I didn’t want them to get lost.

mihailik says:

October 31, 2010 at 5:32 pm

Can get rid of dummy too:

public static Task<TResult> SelectMany<T1, T2, TResult>
    (this Task<T1> source,
     Func<Task<T1>, Task<T2>> taskSelector,
     Func<T1, T2, TResult> resultSelector)

Richard Tallent says:

October 31, 2010 at 6:08 pm

My solution is simply an await block. In such a block, all assignments would start simultaneously and execute until all complete, in no particular order. Thus, none should refer to any variables set inside the block.

Example:

    UserSettings settings;
    int reputation;
    RecentActivity activity;

await {
    settings = service.GetUserSettingsAsync(userId);
    reputation = service.GetReputationAsync(userId);
    activity = service.GetRecentActivityAsync(userId);
}

I’m no async guru, but this seems a lot cleaner as a language feature than stretching anonymous methods and monads. This also fits the same basic block structure of the existing await keyword, and extends it the same way as other block keywords like if, foreach, while, etc. (though in this case the operations within the block are not synchronous to one another, only from the viewpoint of the external chide block).

For the example given of loading urls, you could add a similar keyword “awaiteach” that would act like foreach, but each returned value from the ienumerable would effectively start at the same time and complete in some unpredictable order.

duniho says:

October 31, 2010 at 7:20 pm

Couple more typos: “There it is. We asynchronously await the result of the first task, feed the result into collectionSelector to get the second task…”

I believe you want “taskSelector” instead of “collectionSelector”, based on the code example to which the text refers.

Also, the second bullet point in the list just after that paragraph appears to be truncated. Did you intend to finish that thought?

The LINQ version is an interesting exercise, and maybe if there were baked-in language support for it, it’d be more practical. But it seems to me that the earlier “get the Tasks started, await each one” is actually very readable and concise.

Is there a different example in which it would be significantly more awkward to deal with the concurrently running tasks, where your LINQ approach would actually be significantly simpler?

configurator says:

October 31, 2010 at 8:17 pm

Silly me. We don’t need the dummy at all in there!

public static async Task<TResult> SelectMany<T1, T2, TResult>
    (this Task<T1> task1,
     Func<Task<T2>> taskSelector,
     Func<T1, T2, TResult> resultSelector)
{
    Task<T2> task2 = taskSelector();
    await TaskEx.WhenAll(task1, task2);
    return resultSelector(task1.Result, task2.Result);
}

And yes, the from/select syntax still works.

configurator says:

October 31, 2010 at 8:19 pm

Sorry, I take my previous comment back – I was testing against the old version of the extension method because I put the new one in the wrong class… We do need a parameter for the Func.

skeet says:

November 1, 2010 at 2:06 am

@Richard: I’m uncomfortable with the block format because it seems awkward in terms of declaring and then assigning later. It also means you can’t use var – which means no anonymous types. That’s the beauty of the tuple syntax.

@Peter: Thanks, fixing the typos. (I originally left the name collectionSelector from LINQ to Objects, which is why that mistake is there.)

As for whether the LINQ approach would ever be simpler – I think it might depend on what else you could do with it :) I suspect it’s not a great fit, to be honest – it was more for intellectual curiosity than anything else. While I still think the tuple approach would be the nicest, the “start them all, now finish them all” is probably best at the moment.

James Miles says:

November 1, 2010 at 5:06 am

This combinator (your modified version of select many) already exists in the Rx framework. It’s called CombineLatest. There are others, such as Zip and Switch.

In my opinion query comprehension syntax should have been more extensible.

skeet says:

November 1, 2010 at 5:10 am

@James: But do those work for Task, or only Observable?

Simon Buchan says:

November 1, 2010 at 7:06 am

@skeet: Would this be ok?

var results = await Task.WaitAll(
    GetUserSettingsAsync(userId),
    GetReputationAsync(userId),
    GetActivityAsync(userId),
    (s, r, a) =>
        new { Settings = s, Reputation = r, Activity = a });

public class Task {
    …
    public static Task<R> WaitAll<T1, T2, T3, R>(
        Task<T1> t1,
        Task<T2> t2,
        Task<T3> t3,
        Func<T1, T2, T3, R> combine);

I’m not sure if it would be terribly useful. Perhaps slightly more efficient than awaiting in order, if it could avoid queuing and synchronizing essentially a mov op.

@Richard Tallent: but await is an expression, not a statement:

ProcessOrder(await NextOrderAsync());

Would the await block rewrite every method call with a suitable .GetAwaiter() available on the return type? Or do you restrict the syntax to only allow assignment statements from an awaitable to a variable?

skeet says:

November 1, 2010 at 7:12 am

@Simon: That’s an interesting idea, certainly. I’d still prefer the tuple support if possible :)

As for the efficiency of awaiting in order – that’s going to be the subject of another blog post. In short, I don’t think either WhenAll or WhenAny quite covers the most common use case.

Omer Mor says:

November 1, 2010 at 12:22 pm

The Rx team has implemented LINQ operators on Task in their latest release. However they marked them internal.
Using Reflector we can see that they chose to express your “parallel execution operator” using the Zip operator which fits nicely.
Their SelectMany implementation is similar to your first attempt: a serial execution of the tasks.

If you want to check their implementation, open the new System.Linq.Async assembly, and decompile the System.Threading.Tasks.TaskExt class.

Joren says:

November 2, 2010 at 3:50 pm

I actually liked it better before you started messing around with tuples. That syntax is just awkward, and I think it’s too expression oriented for C#.

In fact I like the WhenAll version best, since it’s just so obvious.

David Nelson says:

December 2, 2010 at 11:29 am

I am still working through this post; it’s a long one and it’s going to take me a while. But right up front I am confused by the “Edit” that you added from Strilanc’s comment. It looks to me like it is exactly the same as the first code block in the post; the only difference is that in the edit you store the Tasks in temporary variables, and in the first one you don’t. But that shouldn’t affect the execution. Am I missing something?

skeet says:

December 2, 2010 at 1:48 pm

@David: In the first version, we wait for the first task to finish… then we start the second one. We wait for that to finish… then we start the third one. There’s no opportunity for parallelism.

In the version after the edit, we start *all* the tasks, and then wait for them one at a time. We don’t need to wait for the first to finish before starting the second and third.
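As a rough illustration of that difference, here is a minimal sketch using the API as it later shipped in .NET (Task.Delay rather than the CTP's helper). FetchAsync and the 200 ms delay are assumptions made up for the example; they stand in for the three real service calls.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AwaitOrderDemo
{
    // Hypothetical stand-in for an asynchronous service call.
    static async Task<string> FetchAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs);
        return name;
    }

    // First version: each call starts only after the previous await
    // has completed, so the delays add up.
    public static async Task<string> SequentialAsync()
    {
        string settings = await FetchAsync("settings", 200);
        string reputation = await FetchAsync("reputation", 200);
        string activity = await FetchAsync("activity", 200);
        return settings + "," + reputation + "," + activity;
    }

    // Version after the edit: all three calls are started up front,
    // then awaited one at a time, so the delays overlap.
    public static async Task<string> StartAllThenAwaitAsync()
    {
        Task<string> settingsTask = FetchAsync("settings", 200);
        Task<string> reputationTask = FetchAsync("reputation", 200);
        Task<string> activityTask = FetchAsync("activity", 200);

        string settings = await settingsTask;
        string reputation = await reputationTask;
        string activity = await activityTask;
        return settings + "," + reputation + "," + activity;
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        await SequentialAsync();
        Console.WriteLine("Sequential: " + stopwatch.ElapsedMilliseconds + " ms");

        stopwatch.Restart();
        await StartAllThenAwaitAsync();
        Console.WriteLine("Start all, then await: " + stopwatch.ElapsedMilliseconds + " ms");
    }
}
```

Both methods return the same string; only the elapsed time should differ, with the sequential version taking roughly the sum of the three delays and the other roughly the longest single delay.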

David Nelson says:

December 2, 2010 at 2:20 pm

Ok, that makes sense. Mentally I don’t “start” the task until it is “await”ed, similar to how deferred sequences are not enumerated until they are “foreach”ed. I think you referred to it in the comments of another post as a “cold” task, and indicated that it is the way F# does it. I haven’t even used async in F#; it just seems to be the way I intuitively assume it will work.

Dax says:

December 19, 2010 at 2:44 pm

I’ve yet to work this all the way through, but the dependency of the second task on the first may actually make sense, as in it could allow you to wait for the first task to complete, but you don’t have to? Sort of like the Enumerable.SelectMany can use the first IEnumerable to grab more enumerables from it for the second round, or it can ignore the first IEnumerable and the result will just be the cross product.

i.e. containership
from container in containers
from object in container.Objects

versus cross-products
from thing in things
from otherThing in otherThings

If there was a similar way to allow the user to specify what to wait for in the async SelectMany, then it could make the syntax a lot more powerful, though I’m not sure how that would work. Maybe

from task1Result in task1
from task2Result in task2.await(task1Result)

or maybe that’s backwards. Though perhaps LINQ just needs to be more extendable?

