Non-volatile reads and Interlocked, and how they interact

Recently (May 2007) there’s been a debate on the microsoft.public.dotnet.framework newsgroup about the memory model, non-volatile variables, the Interlocked class, and how they all interact. Consider the following program:

Update! I screwed up the code, making all of the combinations possible accidentally. The new code is now in the post – the first few comments are based on the original code.

using System;
using System.Threading;

class Program
{
    int x;
    int y;
    
    void Run()
    {
        ThreadStart increment = IncrementVariables;
        new Thread(increment).Start();
        
        int a = x;
        int b = y;
        
        Console.WriteLine ("a={0}, b={1}", a, b);
    }
    
    void IncrementVariables()
    {
        Interlocked.Increment(ref y);
        Interlocked.Increment(ref x);
    }
    
    static void Main()
    {
        new Program().Run();
    }
}

The basic idea is that two variables are read at “roughly the same time” as they’re being incremented on another thread. The increments are each performed using Interlocked.Increment, which introduces a memory barrier – but the variables are read directly, and they’re not volatile. The question is what the program can legitimately print. I’ve put the reads into completely separate statements so that it’s crystal clear what order the IL will put them in. That unfortunately introduces two extra variables, a and b – think of them as “the value of x that I read” and “the value of y that I read” respectively.

Let’s consider the obvious possible values first:

a=0, b=0

This is very straightforward – the variables are read before the incrementing thread has got going.

a=0, b=1

This time, we read the value of x (and copied the value into a) before the incrementing thread did anything, then the values were incremented, and then we read the value of y.

a=1, b=1

This time, the incrementing thread does all its work before we get round to reading either of the variables.

So far, so good. The last possibility is the tricky one:

a=1, b=0

This would, on first sight, appear to be impossible. We increment y before we increment x, and we read x before we read y – don’t we? That should prevent this situation.

My contention is that there’s nothing to prevent the JIT from reordering the reads of x and y, effectively turning the middle bit of the code into this:

int b = y;
int a = x;
        
Console.WriteLine ("a={0}, b={1}", a, b);

Now that code could obviously show “a=1, b=0” by reading y before the increments took place and x afterwards.

The suggestion in the discussion was that the CLR had to honour the interlocked contract by effectively treating all access to these variables as volatile, because they’d been used elsewhere in an Interlocked call. I maintain that’s not only counter-intuitive, but would also require (in the case of public variables) all assemblies which might possibly use Interlocked with the variables to be scanned, which seems infeasible to me.
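For what it’s worth, if you actually want to forbid that reordering, the obvious fix is to declare the fields volatile yourself rather than rely on the Interlocked calls elsewhere. Here’s a sketch of that modified program (my illustration, not part of the original post):

```csharp
using System;
using System.Threading;

class Program
{
    // volatile forbids the JIT from reordering these reads and gives
    // the loads acquire semantics on weaker memory models (e.g. IA64).
    volatile int x;
    volatile int y;

    void Run()
    {
        Thread t = new Thread(IncrementVariables);
        t.Start();

        int a = x;   // volatile read: cannot be moved below the read of y
        int b = y;

        Console.WriteLine("a={0}, b={1}", a, b);

        t.Join();    // wait so the final values are deterministic
        Console.WriteLine("final: x={0}, y={1}", x, y);
    }

    void IncrementVariables()
    {
        // Passing a volatile field by ref gives compiler warning CS0420,
        // but the Interlocked calls still have the right semantics here.
        Interlocked.Increment(ref y);
        Interlocked.Increment(ref x);
    }

    static void Main()
    {
        new Program().Run();
    }
}
```

With the fields volatile, the read of x cannot be moved below the read of y, so “a=1, b=0” is no longer a legal outcome.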

So, what do you all think? I’ll be mailing Joe Duffy to see if he can give his somewhat more expert opinion…

8 thoughts on “Non-volatile reads and Interlocked, and how they interact”

  1. It’s my belief (and I hope Joe responds to your email, so I’ll /know/!) that the reads are reordered, and the results become non-deterministic.

    In similar code now, I either use a memory fence (usually provided by Monitor.Enter) or use:
    int b = Interlocked.CompareExchange(ref y, int.MinValue, int.MinValue);

    … as this also gives the memory fence, and everything would work as intended.

    Going the other route, making x&y volatile, also seems to have drawbacks.


    Chris


  2. Isn’t it so that the order of the execution on a single core proc, single proc machine could be:
    – increment x
    – read x into a, a is now 1
    – read y into b, b is now 0
    – increment y

    Simply because the two increment statements don’t form an atomic action: in between the two increments, the processor could switch back to the main routine.


  3. “a=1, b=0

    This would, on first sight, appear to be impossible. We increment x before we increment y, and we read x before we read y – don’t we? That should prevent this situation. ”

    How so? If the incrementing thread has completed incrementing x but has not yet got around to incrementing y, and the main thread now reads the values of x and y, wouldn’t they be 1 and 0?

    Thread 1: Interlocked.Increment(ref x);
    Thread 0: int a = x;
    Thread 0: int b = y;
    Thread 0: Console.WriteLine(“a={0}, b={1}”, a, b); // Prints 1 and 0
    Thread 1: Interlocked.Increment(ref y);

    Or did you mean to increment y before incrementing x?


  4. Senthil, Frans – well spotted. Sorry for mucking things up. I’ve fixed up the code now so it increments y before x, which makes much more sense in terms of it being an issue :)

    Jon


  5. Hi Jon. I believe it was me you had the ng discussion with. I read this in Ecma:

    “Ecma-335:2006

    12.6.5

    5. Explicit atomic operations… These operations (e.g. Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations. …”

    So by my read, this contract would require the CLI to respect the same optimization rules as volatile as far as ordering goes. Thread.MemoryBarrier would seem to have to respect proper ordering too, even if the vars are not decorated with volatile. Do you read it differently? Thanks.


    William Stacey [C# MVP]


  6. (Same comment as posted on the newsgroup)

    The operations on Interlocked *do* perform implicit acquire/release
    operations – but the *reading* operations don’t. That’s the problem.

    To put it an alternative way, it is (in some ways – not all) a bit like
    acquiring a lock while writing a value, but not acquiring the same lock
    when reading. Yes, you’ve made sure that the data is available to be
    read with the memory barrier on the writing thread, but you haven’t
    made sure that the reading thread actually performs the read when you
    expect it to.

    Basically, both threads need a memory barrier in order to be effective.
    Using Interlocked puts a memory barrier on the calling thread, but
    that’s all – IMO :)

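In code, the point above – that the *reading* thread needs its own barrier – might look like this sketch (my illustration, not from the newsgroup thread):

```csharp
using System;
using System.Threading;

class BarrierOnReader
{
    static int x, y;

    static void IncrementVariables()
    {
        // These calls carry full fences on the writing thread...
        Interlocked.Increment(ref y);
        Interlocked.Increment(ref x);
    }

    static void Main()
    {
        new Thread(IncrementVariables).Start();

        int a = x;
        // ...but the reading thread needs its own fence too: this stops
        // the read of x above being reordered with the read of y below.
        Thread.MemoryBarrier();
        int b = y;

        Console.WriteLine("a={0}, b={1}", a, b);
    }
}
```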

  7. Jon, sorry it’s taken me so long to respond.

    Your suspicion is correct: a==1, b==0 is indeed a valid outcome given the CLR memory model. It is possible that the JIT will reorder the reads, in which case they physically issue and execute out of order; moreover, on weaker memory models like Intel’s IA64 the default load semantics are non-acquire, meaning even loads issued in order can retire out of order.

    There are many solutions. The simplest is to mark ‘x’ or ‘y’ volatile, guaranteeing the JIT won’t reorder and that we emit acquire loads on IA64. Though technically this could have a minor perf hit elsewhere (e.g. on reads where you might not need the reordering safety), this would be my recommendation: it declares to people reading the code, “beware! there’s some lock free stuff in here.”

    Alternatively, you could use Thread.VolatileRead to read ‘x’ or ‘y’, though the overhead of calling this function is more than marking the variables as volatile.

    Lastly, you can issue a memory barrier between the reads; as your readers already noted, Interlocked is one way of doing this. Thread.MemoryBarrier in theory can be more efficient on platforms with native memory fence instructions (though our JIT doesn’t currently have such intrinsics, so that’s just theoretical). Note that a full barrier is much stronger than necessary – it implies flushing the write buffer, etc., and costs a whole lot more than just marking the variable as volatile. Fwiw.

    —joe

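As a sketch of Joe’s Thread.VolatileRead suggestion (my code, with the fields left non-volatile):

```csharp
using System;
using System.Threading;

class VolatileReadFix
{
    static int x, y;

    static void IncrementVariables()
    {
        Interlocked.Increment(ref y);
        Interlocked.Increment(ref x);
    }

    static void Main()
    {
        new Thread(IncrementVariables).Start();

        // Thread.VolatileRead performs each load with acquire semantics,
        // at the cost of a (non-inlined, in 2007) method call per read.
        int a = Thread.VolatileRead(ref x);
        int b = Thread.VolatileRead(ref y);

        Console.WriteLine("a={0}, b={1}", a, b);
    }
}
```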

  8. Although I am primarily concerned at this instant with unmanaged code, I was interested to see this discussion.

    My own research has made it pretty clear that there is a lack of clarity on this whole subject, and to some extent Microsoft’s own published documentation isn’t as good as it needs to be.

    For example several articles/blogs mention that VS 2005 implements ‘volatile’ differently than it did in VS 2003. It seems that VS 2005 goes beyond the normal steps (of inhibiting optimizations) and also forces/uses acquire/release semantics when accessing volatile values.

    Yet why is this not even hinted at in the pages that actually document the ‘volatile’ keyword? Even the versions for ‘Orcas’?

    There are both hardware and software aspects of these multi-threading pitfalls, and they need to be explained and understood separately.

    Most of us understand the need for ‘locking’ and, once familiar with it, it is pretty straightforward. But far fewer understand the real-world behaviour of modern CPU caches, especially when one is dealing with > 1 processor.

    The reordering issue you discuss is, of course, quite a real possibility, but it needs to be pointed out that reordering can occur as a result of

    a) Compiler/Optimizer and
    b) CPU cache behavior

    Personally I don’t think it helps to use the terms ‘fence’ and ‘barrier’ when dealing with both of these; I’d rather see these terms limited to describing hardware, and use some other terms for the software/optimizer side.

    As it stands we have _ReadBarrier, _WriteBarrier, _ReadWriteBarrier and MemoryBarrier, yet only the last of these deals with true hardware fence operations. Is it any wonder that even experienced engineers are sometimes confused!

