The importance of context, and a question of explicitness

(Just to be clear: I hate the word "explicitness". It reminds me of Rowan Atkinson as Marcus Browning MP, saying we should be purposelessnessless. But I can’t think of anything better here.)

For the last few days, I’ve been thinking about context – in the context of C# 5’s proposed asynchronous functionality. Now many of the examples which have been presented have been around user interfaces, with the natural restriction of doing all UI operations on the UI thread. While that’s all very well, I’m more interested in server-side programming, which has a slightly different set of restrictions and emphases.

Back in my post about configuring waiting, I talked about the context of where execution would take place – both for tasks which require their own thread and for the continuations. However, thinking about it further, I suspect we could do with richer context.

What might be included in a context?

We’re already used to the idea of using context, but we’re not always aware of it. When trying to service a request on a server, some or all of the following may be part of our context:

  • Authentication information: who are we acting as? (This may not be the end user, of course. It may be another service who we trust in some way.)
  • Cultural information: how should text destined for an end user be rendered? What other regional information is relevant?
  • Threading information: as mentioned before, what threads should be used both for "extra" tasks and continuations? Are we dealing with thread affinity?
  • Deadlines and cancellation: the overall operation we’re trying to service may have a deadline, and operations we create may have their own deadlines too. Cancellation tokens in TPL can perform this role for us pretty easily.
  • Logging information: if the logs need to tie everything together, there may be some ID generated which should be propagated.
  • Other request information: very much dependent on what you’re doing, of course…
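
One way to make the list above concrete is to give each item a property on a single per-request object. The `RequestContext` type below is hypothetical (it is not a framework type); it is a minimal sketch that also shows the deadline point, where a child operation gets a tighter deadline than the request as a whole by linking cancellation tokens with `CancellationTokenSource.CreateLinkedTokenSource` and `CancelAfter` (the latter needs .NET 4.5 or later).

```csharp
using System;
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical bundle of the per-request items listed above; names are illustrative.
public sealed class RequestContext
{
    public string Principal { get; }               // who we are acting as
    public CultureInfo Culture { get; }            // how to render text for the end user
    public TaskScheduler Scheduler { get; }        // where extra tasks and continuations run
    public CancellationToken Cancellation { get; } // the request's overall deadline/cancellation
    public string CorrelationId { get; }           // ID propagated into the logs

    public RequestContext(string principal, CultureInfo culture, TaskScheduler scheduler,
                          CancellationToken cancellation, string correlationId)
    {
        Principal = principal;
        Culture = culture;
        Scheduler = scheduler;
        Cancellation = cancellation;
        CorrelationId = correlationId;
    }

    // The returned source is cancelled when either the request's token is cancelled
    // or the timeout elapses. The caller owns the source and should dispose it.
    public CancellationTokenSource CreateChildDeadline(TimeSpan timeout)
    {
        var source = CancellationTokenSource.CreateLinkedTokenSource(Cancellation);
        source.CancelAfter(timeout);
        return source;
    }
}
```

Cancelling the request's token cancels every child deadline linked to it, while a child timing out leaves the rest of the request untouched.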

We’re used to some of this being available via properties such as CultureInfo.CurrentCulture and HttpContext.Current – but those are tied to a particular thread. Will they be propagated to threads used for new tasks or continuations? Historically I’ve found that documentation has been very poor around this area. It can be very difficult to work out what’s going to happen, even if you’re aware that there’s a potential problem in the first place.

Explicit or implicit?

It’s worth considering what the above items have in common. Why did I include those particular pieces of information but not others? How can we avoid treating them as ambient context in the first place?

Well, fairly obviously we can pass all the information we need along via method arguments. C# 5’s async feature actually makes this easier than it was before (and much easier than it would have been without anonymous functions) because the control flow is simpler. There should be fewer method calls, each of which would need to be decorated with all the contextual information required.

However, in my experience that becomes quite problematic in terms of separation of concerns. If you imagine the request as a tree of asynchronous operations working down from a top node (whatever code initially handles the request), each node has to provide all the information required for all the nodes within its subtree. If some piece of information is only required 5 levels down, it still needs to be handed on at each level above that.
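
To make that cost concrete, here is a hypothetical three-level request handler (every name is invented for illustration). Only the bottom level actually uses the correlation ID, for logging, yet every signature above it has to accept the ID and forward it.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical handler: explicit passing of a logging correlation ID.
public static class ExplicitHandler
{
    public static readonly List<string> Log = new List<string>();

    public static async Task HandleRequestAsync(string correlationId)
    {
        await LoadOrderAsync(correlationId);         // top level: doesn't use the ID itself
    }

    private static async Task LoadOrderAsync(string correlationId)
    {
        await FetchFromDatabaseAsync(correlationId); // middle level: only forwards it
    }

    private static async Task FetchFromDatabaseAsync(string correlationId)
    {
        await Task.Yield();
        Log.Add($"[{correlationId}] fetched order");  // the only real consumer
    }
}
```

If `FetchFromDatabaseAsync` later needs a second piece of context, both methods above it have to change too.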

The alternative is to use an implicit context – typically via static methods or properties which have to do the right thing, usually based on something thread-local. The context code itself (in conjunction with whatever is distributing the work between threads) is responsible for keeping track of everything.
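
For comparison, here is the same hypothetical handler with the correlation ID held as ambient context. This sketch assumes `AsyncLocal<T>` (System.Threading), which was added in .NET Framework 4.6, after this post was written. Its value flows with the ExecutionContext across awaits, whereas a `[ThreadStatic]` field only holds a value for the thread that set it, so after a continuation resumes on a different thread it might not be visible.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: the correlation ID is ambient rather than a parameter.
public static class ImplicitHandler
{
    private static readonly AsyncLocal<string> CorrelationId = new AsyncLocal<string>();
    public static readonly List<string> Log = new List<string>();

    public static async Task HandleRequestAsync(string correlationId)
    {
        CorrelationId.Value = correlationId; // set once, at the top of the tree
        await LoadOrderAsync();
    }

    // No plumbing in the middle layer.
    private static Task LoadOrderAsync() => FetchFromDatabaseAsync();

    private static async Task FetchFromDatabaseAsync()
    {
        await Task.Yield(); // may resume on a different thread-pool thread
        Log.Add($"[{CorrelationId.Value}] fetched order");
    }
}
```

The middle layer no longer mentions the ID at all, but nothing in its signature reveals that the code below it depends on ambient state, which is exactly the explicitness trade-off.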

It’s easy to point out pros and cons to both approaches:

  • Passing everything through methods makes the dependencies very obvious
  • Changes to "lower" tasks (even for seemingly innocuous reasons such as logging) end up causing chains of changes higher up the task tree – possibly to developers working on completely different projects, depending on how your components work
  • It feels like there’s a lot of work for very little benefit in passing everything explicitly through many layers of tasks
  • Implicit context can be harder to unit test elegantly – as is true of so many things using static calls
  • Implicit context requires everyone to use the same context. It’s no good if high-level code indicates which thread pool to use in one setting when some lower-level code is going to use a different context

Ultimately it feels like a battle between purity and pragmatism: being explicit helps to keep your code purer, but it can mean a lot of fluff around your real logic, just to maintain the required information to pass onward. Different developers will have different approaches to this, but I suspect we want to at least keep the door open to both designs.

The place of Task/Task<T>

Even if Task/Task<T> can pass on the context for scheduling, what do we do about other information (authentication etc)? We have types like ThreadLocal<T> – in a world where threads are more likely to be reused, and aren’t really our unit of asynchrony, do we effectively need a TaskLocal<T>? Can context within a task be pushed and automatically popped, to allow one subtree to "override" the context for its nodes, while another subtree works with the original context?
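
A minimal sketch of what such a `TaskLocal<T>` with push/pop might look like, assuming it is layered over `AsyncLocal<T>` from later framework versions (.NET Framework 4.6 onwards). `TaskLocal<T>`, `Push` and the demo class are hypothetical names. `Push` returns a handle whose `Dispose` restores the previous value, so one subtree can override the context while a sibling subtree keeps the original.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical TaskLocal<T>: AsyncLocal<T> plus scoped overrides.
public sealed class TaskLocal<T>
{
    private readonly AsyncLocal<T> _value = new AsyncLocal<T>();

    public T Value => _value.Value;

    public IDisposable Push(T value)
    {
        var scope = new Scope(this, _value.Value); // remember what to restore
        _value.Value = value;
        return scope;
    }

    private sealed class Scope : IDisposable
    {
        private readonly TaskLocal<T> _owner;
        private readonly T _previous;
        private bool _disposed;

        public Scope(TaskLocal<T> owner, T previous)
        {
            _owner = owner;
            _previous = previous;
        }

        public void Dispose()
        {
            if (_disposed) return;
            _owner._value.Value = _previous; // "pop" back to the outer value
            _disposed = true;
        }
    }
}

public static class TaskLocalDemo
{
    public static readonly TaskLocal<string> Culture = new TaskLocal<string>();

    // Returns what each part of the tree observed.
    public static async Task<string[]> RunAsync()
    {
        using (Culture.Push("en-GB"))
        {
            Task<string> inherited = Task.Run(() => Culture.Value); // flows en-GB
            Task<string> overridden = OverrideSubtreeAsync();       // its subtree sees fr-FR
            string[] seen = await Task.WhenAll(inherited, overridden);
            return new[] { seen[0], seen[1], Culture.Value };      // parent still en-GB
        }
    }

    private static async Task<string> OverrideSubtreeAsync()
    {
        using (Culture.Push("fr-FR"))
        {
            await Task.Yield();
            return Culture.Value;
        }
    }
}
```

The parent keeps "en-GB" because changes an async method makes to an `AsyncLocal<T>` value don't flow back to its caller; the explicit `Dispose` covers the synchronous case.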

I’ve been trying to think about whether this can be provided in "userland" code instead of in the TPL itself, but I’m not sure it can, easily… at least not without reinventing a lot of the existing code, which is never a good idea when it’s tricky parallelization code.

Should this be general support, or would it be okay to stick to just TaskScheduler.Current, leaving developers to pass other context explicitly?

Conclusion

These are thoughts which I’m hoping will be part of a bigger discussion. I think it’s something the community should think about and give feedback to Microsoft on well before C# 5 (and whatever framework it comes with) ships. I have lots of contradictory feelings about the right way to go, and I’m fully expecting comments to have mixed opinions too.

I’m sure I’ll be returning to this topic as time goes on.

Addendum (March 27th 2012)

Lucian Wischik recently mailed me about this post, to mention that F#’s support for async has had the ability to retain explicit context from the start. It’s also more flexible than the C# async support – effectively, it allows you to swap out AsyncTaskMethodBuilder etc for your own types, so you don’t always have to go via Task/Task<T>. I’ll take Lucian’s word for that, not knowing much about F# myself. One day…

14 thoughts on “The importance of context, and a question of explicitness”

  1. Yet another possibility would be to pass around a single catch-all object, rather like ASP’s Session object. I don’t much like that though, it seems to me that it would have the bad points of both worlds (although to a lesser degree).

  2. It’s useful that you pointed out the types of context that currently exist and which are tied to a thread, such as the current culture or HTTP context.

    But, why does threading have to be a relevant point at all, especially if dealing with context that is specifically not tied to a thread, and dealing with threads that are specifically not tied to a specific task or context?

    It seems to me that in most cases, one would simply incorporate the context in an object that represents the work being done. The methods used to perform the task would exist in this object, or would at the very least have easy access to it, obviating any need to pass the context as arguments to each method, as well as any need to worry about which thread is being used to execute the work.

    Things like thread-local storage, to me, seem to be over-used. Sure in some specific limited scenarios where the data really is specific to a thread, it makes sense. But TLS is specifically counter-productive in worker-thread scenarios, and context is usually easily passed around via other means.

    I guess what I’m saying is, while I think I comprehend the two scenarios for passing context you describe — either as method arguments or kept in TLS — it’s not clear to me why either of those would be considered the primary mechanism for maintaining context. Is there some broader context (sorry for the pun) you’re invoking for the purpose of this question that narrows the options in this way?

  3. You make a good point. That said, a TaskLocal repository wouldn’t be hard to create in a user library with a simple Dictionary. We can perhaps call Task.ContinueWith(remove value from the TaskLocal) or something a bit smarter than that.

    And we can use extension methods to provide something similar to HTTP sessions, which are explicitly meant for context:

    public static object GetContext(this Task task, object key);
    public static void SetContext(this Task task, object key, object value);
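
Those two signatures could be fleshed out along these lines. This is a minimal sketch, assuming `ConditionalWeakTable` (System.Runtime.CompilerServices) rather than the plain `Dictionary` plus `ContinueWith` cleanup suggested above, so the per-task entries don't keep a task alive; the class name is hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical per-task context bag keyed by the Task instance.
public static class TaskContextExtensions
{
    // Entries are dropped automatically once the Task itself is garbage collected.
    private static readonly ConditionalWeakTable<Task, ConcurrentDictionary<object, object>> Contexts =
        new ConditionalWeakTable<Task, ConcurrentDictionary<object, object>>();

    public static object GetContext(this Task task, object key)
    {
        return Contexts.TryGetValue(task, out var values) && values.TryGetValue(key, out var value)
            ? value
            : null;
    }

    public static void SetContext(this Task task, object key, object value)
    {
        Contexts.GetValue(task, _ => new ConcurrentDictionary<object, object>())[key] = value;
    }
}
```

Note what this doesn't do: a task created from inside `task` starts with no entries of its own, so nothing is propagated. That is the difficulty raised in the reply below.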

  4. @configurator: The tricky bit is propagating the context. You’re about to create a new task, and even the *creation* of the task will use some of that information.

    @Peter: Well, creating lots of different tasks is likely to involve lots of different components – which may not even know about each other. So each of those tasks will need to have the contextual information in “their” objects. Some things (e.g. which thread pool to use) can be configured within services on construction, but others really are “per request”.

    As with so many things, it would be interesting to work through a realistic “full on” example. I may try to devise such an example for future discussions of this and other interesting topics.

  5. I wondered if “explicity” was a word. It is, but I don’t think it’s the word you wanted. Then the spirit of adventure took me to “explicitude”, but that Googled very badly (NSFW). I’d stick with what you had…

  6. Hi, I’m the one in charge of Signum Framework and also a big fan of ThreadStatic.

    We use it to keep the CurrentConnection, CurrentTransaction and some other things.

    I share the same concern. ThreadStatic variables are quite useful; it would be great if the framework exposed methods for saving/loading all the ThreadStatic variables on another thread. There aren’t that many of them, and it would make async and parallel code simpler.
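
Until the framework offers something like that, the save/restore can be done by hand, one variable at a time. This is a minimal sketch using a hypothetical `AmbientConnection` holder (not Signum Framework's actual API). It captures the value on the calling thread, installs it around the delegate on whichever thread runs it, and restores that thread's previous value in a `finally`, the same cleanup pattern a later comment relies on.

```csharp
using System;

// Hypothetical ambient holder; the names are illustrative only.
public static class AmbientConnection
{
    [ThreadStatic] private static string _current;

    public static string Current
    {
        get { return _current; }
        set { _current = value; }
    }

    // Snapshot the calling thread's value now, and return a delegate that
    // installs that snapshot on whichever thread eventually runs it.
    public static Action Wrap(Action action)
    {
        string captured = _current;
        return () =>
        {
            string previous = _current; // the running thread's own value
            _current = captured;
            try
            {
                action();
            }
            finally
            {
                _current = previous;    // don't leak the snapshot into a reused thread
            }
        };
    }
}
```

Each ambient variable needs its own capture/restore, which is why a framework-level mechanism for all of them would be attractive.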

  7. There already is some support for this. You can use CallContext.LogicalSetData and CallContext.LogicalGetData to store and retrieve objects in the current call context, which will automatically flow through to things like new Tasks.

    It gets propagated using the ExecutionContext class. ExecutionContext.Capture will capture a context that includes things like the CallContext, SynchronizationContext, and SecurityContext. ExecutionContext.Run will invoke a delegate in that context. The thread pool and the WinForms and WPF message pumps use this, which means it works with Task without any special setup.
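
The capture-then-run pattern described above can be sketched with a hypothetical hand-rolled work queue that flows context the same way: `ExecutionContext.Capture` when work is enqueued, `ExecutionContext.Run` when it executes. `CallContext` lives in the remoting namespace and is not available on .NET Core, so this sketch assumes `AsyncLocal<T>` (.NET Framework 4.6 and later) as the flowed value instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical queue: each item runs in the context captured when it was enqueued.
public sealed class ContextFlowingQueue
{
    private readonly BlockingCollection<(ExecutionContext Context, Action Work)> _items =
        new BlockingCollection<(ExecutionContext, Action)>();

    public void Enqueue(Action work)
    {
        // Capture returns null if flow has been suppressed.
        _items.Add((ExecutionContext.Capture(), work));
    }

    public void CompleteAdding() => _items.CompleteAdding();

    // Drains the queue on a dedicated thread until CompleteAdding is called.
    public Thread StartWorker()
    {
        var worker = new Thread(() =>
        {
            foreach (var (context, work) in _items.GetConsumingEnumerable())
            {
                if (context == null)
                    work();
                else
                    ExecutionContext.Run(context, state => ((Action)state)(), work);
            }
        });
        worker.Start();
        return worker;
    }
}
```

Each work item sees the ambient value from the moment it was queued, not whatever was current when the worker thread started.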

  8. @Quartermeister: That’s great news. It would be nice to wrap it up in a generic TaskLocal type, but that sounds easy enough.

    May play with that soonish :)

  9. Jon, thank you for giving voice to this issue. The challenge of understanding context is paramount for successful implementation of asynchrony. The greatest challenge I see here is that restructuring code to use TAP while preserving existing contextual behavior is going to be difficult unless MS provides a means to identify and “lift” contextual state.

    I’ve been working on rewriting a web-based poker engine to use TAP and I’ve run into numerous cases where thread-local state (ambient transactions, security, culture, CAS) have caused problems. In many cases it turns out to be difficult to figure out how to correctly restructure the code because of how prevalent (and deeply nested) the use of that contextual state was.

    A further complication is that when refactoring a process to use asynchrony, it’s possible to create situations where the state of one task “bleeds” into another. This can be hard to avoid. Thread-affinitized state requires a developer to have *global* knowledge of the execution flow of a program and be able to anticipate and predict where such state could intersect. I’ve found this to be far from easy.

  10. A lot can be said about a pure, functional approach where all of the inputs come through method parameters. Nonetheless, I agree that context is extremely useful in real applications, if only to bolt on features without reworking a prohibitive amount of existing code. This, of course, is basically why things like TLS and friends exist in programming environments.

    Quartermeister brought up CallContext. A while back I made a suggestion to make CallContext more robust (https://connect.microsoft.com/VisualStudio/feedback/details/276325/provide-better-context-support-to-async-operations). What I got back was basically the answer that I feared:

    “Thank you for you feedback. Using CallContext for scenarios that do not involve remoting should be probably regarded as a design flaw. There are more suitable (and more efficient) ways how to pass data to another thread in the same domain. The ‘object’ parameter of BeginInvoke is definitely the preferred solution in your example. ”

    Jon, I hope that your post will spark increased interest in the importance of context and lead to the addition of robust context support in the .NET Framework.

  11. I think passing in “context” information of your own is the way to go.

    If a task 2 or more levels removed is the only one that needs the additional information/data then maybe something is wrong in the design of your tasks?

  12. I like to use ThreadStatic variables. Honestly, I have never used ThreadLocal and don’t even know if it has any advantage over ThreadStatic.

    But my biggest concern with ThreadStatic (and contexts, for that matter) is that ThreadPool threads keep all their ThreadStatic variables alive even after an Abort.

    I think it is really safe to use:
    _myTlsVariable = X;
    try
    {
        // ... work that relies on _myTlsVariable ...
    }
    finally
    {
        _myTlsVariable = null;
    }

    So, except in the case of an abort, the ThreadStatic variable will always be cleared. But if the abort happens between _myTlsVariable = X; and try, the value will be kept alive, because the thread is reused.

    In my own code, I never use the ThreadPool. I created a thread pool of my own purely to solve that problem. If a thread finishes normally, it is returned to the pool. If the thread is aborted, it never returns to the pool, and so its ThreadStatic variables and contexts will eventually be collected.

    So, in my opinion, having a TaskLocal can be useful, but I think careful use of ThreadStatic can give the same benefits (especially if reusing a “context” for another task can improve performance). The biggest problem is that aborted thread-pool threads are revived, with all their context still in them.
