Diagnosing weird problems – a Stack Overflow case study

Earlier, I came across this Stack Overflow question. I solved it, tweeted it, but then thought it would serve as a useful case study into the mental processes I go through when trying to solve a problem – whether that’s on Stack Overflow, at work, or at home.

It’s definitely worth reading the original question, but the executive summary is:

When I compute the checksum/hash of c:WindowsSystem32Calc.exe using various tools and algorithms, those tools all give the same answer for each algorithm. When I try doing the same thing in Java, I get different results. What’s going on?

Now to start with, I’d like to shower a bit of praise on the author:

  • The post came with a short but utterly complete program to demonstrate the problem
  • The comments in the program showed the expected values and the actual values
  • The code was mostly pretty clean (clean enough for an SO post anyway)

In short, it had pretty much everything I ask for in a question. Yay! Additionally, the result seemed to be strange. The chances of any one of Java’s hashing algorithms being broken seem pretty slim, but four of them? Okay, now you’ve got me interested.

Reproducing the problem

Unless I can spot the error immediately, I usually try to reproduce the problem in a Stack Overflow post with a short but complete program. In this case, the program was already provided, so it was just a matter of copy/paste/compile/run. This one had the additional tweak that it was comparing the results of Java with the results of other tools, so I had to get hold of an MD5 sum tool first. (I chose to start with MD5 for no particular reason.) I happened to pick this one, but it didn’t really seem it would make much difference. (As it happens, that was an incorrect assumption, but hey…)

I ran md5sums on c:WindowsSystem32calc.exe, and got the same result as the poster. Handy.

I then ran the Java code, and again got the same result as the poster: step complete, we have a discrepancy between at least one tool (and MD5 isn’t that hard to get right) and Java.

Looking for obvious problems

The code has four main areas:

  • Reading a file
  • Updating digests
  • Converting bytes to hex
  • Storing and displaying results

Of these, all of the first three have common and fairly simple error modes. For example:

  • Failure to use the return value from InputStream.read()
  • Failure to update the digests using only the relevant portion of the data buffer
  • Failure to cope with Java’s signed bytes

The code for storing and displaying results seemed solid enough to ignore to start with, and brief inspection suggested that the first two failure modes had been avoided. While the hex code didn’t have any obvious problems either, it made sense to check it. I simply printed the result of hard-coding the “known good” CRC-32 value:

System.out.println(toHex(new byte[] {
    (byte) 0x8D, (byte) 0x8F, (byte) 0x5F, (byte) 0x8E
  }));

That produced the right result, so I ruled out that part of the code too. Even if it had errors in some cases, we know it’s capable of producing the right string for one of the values we know we should be returning, so it can’t be getting that value.

Initial checks around the file

I’m always suspicious of stream-handling code – or rather, I know how easily it can hide subtle bugs. Even though it looked okay, I thought I’d check – so I added a simple total to the code so I could see how many bytes had been hashed. I also removed all the hash algorithms other than MD5, just for simplicity:

MessageDigest md5 = MessageDigest.getInstance(“MD5”);

FileInputStream fis = new FileInputStream(file);
byte data[] = new byte[size];
int len = 0;
int total = 0;
while ((len = fis.read(data)) != -1) {
    md5.update(data, 0, len);
    total += len;
}
fis.close();
System.out.println(“Total bytes read: “ + total);

results.put(md5.getAlgorithm(), toHex(md5.digest()));

It’s worth noting that I haven’t tried to fix up bits of the code which I know I would change if I were actually doing a code review:

  • The stream isn’t being closed in a finally block, so we’ll have a dangling resource (until GC) if an IOException is thrown
  • The initial value of len is never read, and can be removed

Neither of these matters in terms of the problem at hand, and closing the file “properly” would make the sample code more complicated. (For the sake of just a short sample program, I’d be tempted to remove it entirely.)

The result showed the number of bytes being read as the command prompt did when I ran “dir c:WindowsSystem32Calc.exe” – so again, everything looks like it’s okay.

Desperate times call for stupid measures

Just on a whim, I decided to copy Calc.exe to a local folder (the current directory) and retry. After all, accessing a file in a system folder might have some odd notions applied to it. It’s hard to work out what, but… there’s nothing to lose just by trying a simple test. If it can rule anything out, and you’ve got no better ideas, you might as well try even the daftest idea.

I modified the source code to use the freshly-copied file, and it gave the same result. Hmm.

I then reran md5sums on the copied file… and it gave the same result as Java. In other words, running “md5sums c:WindowsSystem32Calc.exe” gave one result, but “md5sums CopyOfCalc.exe” gave a different result. At this point we’ve moved from Java looking like it’s behaving weirdly to md5sums looking suspicious.

Proving the root cause

At this point we’re sort of done – we’ve basically proved that the Java code produces the right hash for whatever data it’s given, but we’re left with the question of what’s happening on the file system. I had a hunch that it might be something to do with x86 vs x64 code (all of this was running on a 64-bit version of Windows 7) – so how do we test that assumption?

I don’t know if there’s a simple way of running an x86 version of the JVM, but I do know how to switch .NET code between x86 and x64 – you can do that for an assembly at build time. C# also makes the hashing and hex conversion simple, so I was able to knock up a very small app to show the problem:

using System;
using System.IO;
using System.Security.Cryptography;

class Test
{
    static void Main()
    {
        using (var md5 = MD5.Create())
        {
            string path = “c:/Windows/System32/Calc.exe”;
            var bytes = md5.ComputeHash(File.ReadAllBytes(path));
            Console.WriteLine(BitConverter.ToString(bytes));
        }
    }
}

(For a larger file I’d have used File.OpenRead instead, but then I’d have wanted to close the stream afterwards. Somehow it wasn’t worth correcting the existing possible handle leak in the Java code, but I didn’t want to write leaky code myself. So instead I’ve got code which reads the whole file into memory unnecessarily… )

You can choose the architecture to build against (usually AnyCPU, x86 or x64 – though it’s interesting to see that “arm” is also an option under .NET 4.5, for obvious reasons) either from Visual Studio or using the “/platform” command line option. This doesn’t change the IL emitted (as far as I’m aware) but it’s used for interop with native code – and in the case of executables, it also determines the version of the CLR which is used.

Building and running in x86 mode gave the same answer as the original “assumed to be good” tools; building and running in x64 mode gave the same answer as Java.

Explaining the root cause

At this point we’ve proved that the file system gives different results depending on whether you access it from a 64-bit process or a 32-bit process. The final piece of the puzzle was to find some something to explain why that happens. With all the evidence about what’s happening, it was now easy to search for more information, and I found this article giving satisfactory details. Basically, there are two different copies of the system executables on a 64 bit system: x86 ones which run under the 32-bit emulator, and x64 ones. They’re actually in different directories, but when a process opens a file in WindowsSystem32, the copy which matches the architecture of the process is used. It’s almost as if the WindowsSystem32 directory is a symlink which changes target depending on the current process.

A Stack Overflow comment on my answer gave a final nugget that this is called the “File System Redirector”.

Conclusion

Debugging sessions often feel a bit like this – particularly if you’re like me, and only resort to real debugging after unit testing has failed. It’s a matter of looking under all kinds of rocks, trying anything and everything, but keeping track of everything you try. At the end of process you should be able to explain every result you’ve seen, in an ideal world. (You may not be able to give evidence of the actual chain of events, but you should be able to construct a plausible chain of events which concurs with your theory.)

Be aware of areas which can often lead to problems, and check those first, gradually reducing the scope of “stuff you don’t understand” to a manageable level, until it disappears completely.

Eduasync 20: Changes between the VS11 Preview and the Visual Studio 11 Beta

A while I ago I blogged about what had changed under the hood of async between the CTP and the VS11 Preview. Well, now that the VS11 Beta is out, it’s time to do it all again…

Note that the code in this post is in the Eduasync codebase, under a different solution (Eduasync VS11.sln). Many of the old existing projects won’t compile with VS11 beta, but I’d rather leave them as they are for posterity, showing the evolution of the feature.

Stephen Toub has an excellent blog post covering some of this, so while I’ll mention things he’s covered, I won’t go into much detail about them. Let’s start off there though…

(EDIT: Stephen has also mailed me with some corrections, which I’ve edited in – mostly without indication, as the post has been up for less than seven hours, and it’ll make for a better reading experience.)

Awaiter pattern changes

The awaiter pattern is now not just a pattern. The IsCompleted property and GetResult method are still "loose" but OnCompleted is now part of an interface: INotifyCompletion. Awaiters have to implement INotifyCompleted, but may also implement ICriticalNotifyCompletion and its UnsafeOnCompleted method.

The OnCompleted method is just as it was before, and needs to flow the execuction context; the UnsafeOnCompleted method is simpler, as it doesn’t need to flow the execution context. All of this only matters if you’re implementing your own awaiters, of course. (More details in Stephen’s blog post. I’ve found this area somewhat confusing, so please do read his post carefully!)

Skeleton method changes

Just as I have previously, I’m using the (entirely unofficial) term "skeleton method" to mean the very short method created by the compiler with the same signature as an async method: this is the entry point to the async method, effectively, which creates and starts the state machine containing all the real logic.

There are two changes I’ve noticed in the skeleton method. Firstly, for some reason the state machine state numbers have changed. Whereas previously a state number of 0 meant "initial or running", positive values meant "between calls, or navigating back to the await expression" and -1 meant "finished", now -1 means "initial or running", non-negative means "between calls, or navigating back to await expression" and -2 means "finished". It’s not clear why this change has been made, given that it requires an extra assignment at the start of every skeleton method (to set the state to -1).

More importantly, the skeleton method no longer calls MoveNext directly on the state machine that it’s built. Instead, it calls Start<TStateMachine> on the AsyncTaskMethodBuilder<T> (or whichever method builder it’s using). It passes the state machine by reference (presumably for efficiency), and TStateMachine is constrained to implement the now-public-and-in-mscorlib IAsyncStateMachine interface. I’ll come back to the relationship between the state machine and the builder later on.

Task caching

(Code is in project 30: TaskCaching)

It’s possible for an async method to complete entirely synchronously. In this situation, the result is known before the method returns, and the task returned by the method is already in the RanToCompletionState. If two tasks have already run to completion with the same value, they can be (apparently) regarded as equivalent… so the beta now caches a task in this situation, for some types and values. (Apparently the preview cached too, but I hadn’t noticed, and the beta caches more.) According to my experiments and some comments:

  • For int, tasks with values -1 to 8 inclusive are cached
  • For bool, both values (true and false)
  • For char, byte, sbyte, short, ushort, uint, long, ulong, IntPtr and UIntPtr tasks with value 0 (or ”) are cached
  • For reference types, null is cached
  • For other types, no tasks are cached

EDIT: After they’ve completed, tasks are normally immutable except for disposal – the cached tasks are tweaked slightly to make disposal a no-op.

State machine interface changes

In the VS11 preview release, each state machine implemented an interface, but that interface was internal to the generated assembly, and contained a single method (SetMoveNextDelegate). It’s now a public interface with two methods:

Personally I’m not keen on the naming of "MoveNext" – I can’t help but feel that if we didn’t have the "naming baggage" of IEnumerator and the fact that at least early on, the code generator was very similar to that used for iterator blocks, we’d have something different. (It is moving to the next state of the state machine, but it still doesn’t quite feel right.) I’d favour something like "ContinueExecution". However, it doesn’t matter – it obviously does what you’d expect, and you’re not going to be calling this yourself.

SetStateMachine is a stranger beast. The documentation states:

Configures the state machine with a heap-allocated replica.

… which says almost nothing, really. The implementation is always simple, just like SetMoveNextDelegate was, although this time it delegates to the builder for the real work (a common theme, as we’ll see):

void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
{
    this.<>t__builder.SetStateMachine(param0);
}

Now AsyncTaskMethodBuilder.SetStateMachine is also documented pretty sparsely:

Associates the builder with the specified state machine.

Again, no real help. However, we’ll see that it’s the builder which is responsible for calling OnContinue now, and as it can call MoveNext on an IStateMachine, it makes sense to tell it which state machine it’s associated with… but can’t it do that directly?

Well, not quite. The problem (as I understand it) is around when boxing occurs. We initially create the state machine on the stack, and it contains the builder. (Both are structs.) That’s fine until we need a continuation, but then we’ve got to be able to get back to the current state later, after the current stack frame has been blown away. So we need to box the state machine. That will create a copy of the current builder (within the box). We need the builder within the boxed state machine to contain a reference to the same box. So the order has to be:

  • Box the state machine
  • Tell the state machine about the boxed reference
  • The state machine tells its nested builder about the boxed reference

Back when the state machine was in charge of the boxing, this went via the delegate: the act of creating the box was implicit when creating the delegate, and then casting the delegate target to the interface type allowed a reference to the newly-created delegate to be set within the copy. This is similar, but using the builder instead. It’s hard to follow, but of course it’s not going to matter.

State machine field changes

There are various kinds of fields in the state machine:

  • Those corresponding with local variables and parameters in the async method
  • The state
  • The field(s) associated with awaiters
  • (In the preview/beta) The field associated with the "current execution stack" at the point of an await expression
  • (In the CTP) An unused "disposing" field

Of these, I believe only the awaiters have actually changed, but before we talk about that, let’s revisit local variables.

Local variable hoisting

I’ve just noticed that the local variables are only hoisted to fields when its scope contains an await expressions, but in that case all local variables of that scope are hoisted, whether or not they’re used "across" awaits. It would be possible to hoist only those which need to be maintained between executions, but then you wouldn’t be able to see the others when debugging, which would be somewhat confusing. Likewise local variables of the same type which are never propagated across the same await could be aliased. For example, consider this async method:

static async Task<int> M(Random rng)
{
    int x = rng.Next(1000);
    int y = x + rng.Next(1000);
        
    await Task.Yield();
        
    int z = y + rng.Next(1000);
        
    await Task.Yield();
    return z;
}

If the compiler could be confident you didn’t need to debug through this code, it could make do with one field of type "Random" and one field of type "int" – x can be a completely local variable in MoveNext (it’s not used between two awaits) and y and z can be aliased (we never need the value of y after we’ve first written to z).

Local variable aliasing probably isn’t particularly useful for "normal" methods as the JIT may be able to do it (so long as you don’t have a debugger attached, potentially) but in this case we expect the state machine to be boxed at some point, so potentially does make a difference (while the stack is typically reasonably small, you could have a lot of outstanding async methods in some scenarios). Maybe in a future release, the C# compiler could have an aggressive optimization mode around this, to be turned on explicitly. (I don’t think it should be  a particularly high priority, mind you.)

Awaiter fields

(Code is in project 31, AwaiterFields.)

Awaiter fields have changed a bit over the course of async’s history.

In the CTPs (all of them, I believe) each await expression had its own awaiter field in the state machine, with a type corresponding to the declared awaiter type from the awaitable. (Remember that the awaitable is the thing you await, such as a task, and the awaiter is what you get back from calling GetAwaiter on the awaitable).

In the VS11 Preview, there was always a single awaiter field of type object. From what I saw, it was usually populated with a single-element array containing an awaiter. For value type awaiters (i.e. where the awaiter is a struct) this is somewhat similar to boxing, but maintaining strong typing, so calls to IsCompleted etc can still be made. It’s possible that reference type awaiters were stored without the array creation, as it would serve no purpose. (I don’t have any machines with just the preview installed to verify this.)

In the Beta, we have a mixture. If there are any reference type awaiters, they all end up being stored in a single field of type object, which is then cast back to the actual type when required. (Don’t forget that only one awaiter can be "active" at a time, which makes this possible.) This includes awaiters of an interface type – it’s only the compile-time type declared as the return type of the GetAwaiter method of the awaitable which is important.

If any of the awaiter types are value types, each of these types gets its own field. So there might be a TaskAwaiter<int> field and a TaskAwaiter<string> field, for example. However, there can still be "sharing" going on: if there are multiple await expressions all of the same value type awaiter, they will all share a single field. (This all feels a little like the JITting of generics, but it’s somewhat coincidental.)

MoveNext method changes

(Code is in project 32, BetaStateMachine)

As I’ve mentioned earlier, the builder is now responsible for a lot more of the work than it was in earlier versions. The majority of the code remains the same as far as I can tell, in terms of handling branching, evaluating expressions with multiple await expressions and so on.  The code in the source repository shows what the complete state machine looks like, but for the sake of clarity, I’ll just focus on a single snippet. If we have an await expression like this:

await x;

then the state machine code in the VS11 Preview would look something like this:

localTaskAwaiter = x.GetAwaiter();
if (localTaskAwaiter.IsCompleted)
{
    goto AwaitCompleted;
}
this.state = 1;
TaskAwaiter[] awaiterArray = { localTaskAwaiter };
this.awaiter = awaiterArray;
Action continuation = this.MoveNextDelegate;
if (continuation == null)
{
    Task<int> task = this.builder.Task;
    continuation = MoveNext;
    ((IStateMachine) continuation.Target).SetMoveNextDelegate(continuation);
}
awaiterArray[0].OnCompleted(continuation);

return;

(That’s just setting up the await, of course – there’s then the bit where the result is fetched, but that’s less interesting. There’s also the matter of doFinallyBodies.)

In the VS11 Beta, it’s something this instead (for an awaiter type "AwaiterType" which implements ICriticalNotifyCompletion, in a state machine of type ThisStateMachine).

localAwaiter = x.GetAwaiter();
if (!localAwaiter.IsCompleted)
{
    state = 0;
    awaiterField = localAwaiter;
    builder.AwaitUnsafeOnCompleted<AwaiterType, ThisStateMachine>(ref localAwaiter, ref this);
    doFinallyBodies = false;
    return;
}

If the awaiter type only implements INotifyCompletion, it calls AwaitOnCompleted instead. Note how the calls are generic (but both type variables are constrained to implement appropriate interfaces) which avoids boxing.

The call to the builder will call back to the state machine’s SetStateMachine method if this is the first awaiter that hasn’t already completed within this execution of the async method. So that handles the section which checked for the continuation being null in the first block of code. Most of the rest of the change is explained by the difference in awaiter types, and obviously AwaitOnCompleted/AwaitUnsafeOnCompleted also ends up calling into OnCompleted on the awaiter itself.

Mutable value type awaiters

(Code is in project 32, MutableAwaiters)

One subtle difference which really shouldn’t hurt people but is fun to explore is what happens if you have an awaiter which is a mutable value type. Due to the way awaiters were carefully handled pre-beta, mutations which were conducted as part of OnCompleted would still be visible in GetResult. That’s not the case in the beta (as Stephen mentions in his blog post). Mind you, it doesn’t mean that all mutations will be ignored… just ones in OnCompleted. A mutation from IsCompleted is still visible, as shown here:

public struct MutableAwaiter : INotifyCompletion
{
     private string message;

     public MutableAwaiter(string message)
     {
         this.message = message;
     }

     public bool IsCompleted
     {
         get
         { 
             message = "Set in IsCompleted";
             return false;
         }
     }

     public void OnCompleted(Action action)
     {
         message = "Set in OnCompleted";
         // Ick! Completes inline. Never mind, it’s only a demo…
         action();
     }

     public string GetResult()
     {
         return message;
     }
}

What would you expect to be returned from this awaiter? You can verify that all three members are called… but "Set in IsCompleted" is returned. That’s because IsCompleted is called before the awaiter value is copied into a field within the state machine. Even though the state machine passes the awaiter by reference, it’s passing the local variable, which is of course a separate variable from the field.

I’m absolutely not suggesting that you should rely on any of this behaviour. If you really need to be able to mutate your awaiter, make it a reference type.

Conclusion

The main changes in the Beta are around the interactions between the AsyncTaskMethodBuilder (et al) and the state machine, including the new interfaces for awaiters. There’s been quite a bit of optimization, although I still see room for a bit more:

  • When there’s only a single kind of reference type awaiter, the field for storing it could be of that type rather than of type object, removing the need for an execution-time cast
  • The "stack" variable could be removed in some cases, and made into a specific type in many others
  • With appropriate optimization flags, local variables which aren’t used await expressions could stay local to the state machine instead of being hoisted, and hoisted variables could be aliased in some cases.

One thing which concerns me slightly is how the C# language specification is going to change – the addition of the new interfaces is definitely going to mean more complexity from this previously "tidy" feature. I’m sure it’s worth it for the sake of efficiency and the like, but part of me sighs at every added tweak.

So, is this now close to the finished version of async? Only time will tell. I haven’t checked whether dynamic awaitables have finally been introduced… if they have, I’ll put that in the next post.