Category Archives: Diagnostics

Diagnosing Portable Class Libraries

I’ve always found Portable Class Library (PCL) configuration to be a bit of a mystery. In simple cases, it’s simple: start a new PCL project in Visual Studio, select the environments you want to support, and away you go. But what’s going on under the hood, and what do all the options mean? How do I do this without Visual Studio?

Background: supporting Universal Windows Applications in Noda Time

Recently, I had a feature request for Noda Time to support Windows Phone 8.1 and Universal Windows Applications. This is something I haven’t done before, but it doesn’t sound too hard – it’s just a matter of changing an option in Visual Studio, surely… Unfortunately it’s not that simple for Noda Time. In order to support .NET 3.5 (and some other features which aren’t available in PCLs) the core NodaTime.dll project has multiple configurations which effectively split into “Desktop” and “PCL”. This has been achieved by hacking around with the .csproj file manually – nothing particularly clever, but the sort of thing which can’t be done directly in Visual Studio. This means that when I open the project properties in Visual Studio – even when I’m building a PCL version – it treats the project as a regular class library. (It confuses the heck out of ReSharper too in some cases, unfortunately.)
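To give a flavour of the kind of hackery involved, a conditional property group along these lines does the trick – this is a hypothetical illustration of the general technique rather than Noda Time’s actual project file:

<PropertyGroup Condition=" '$(Configuration)' == 'Signed Release Portable' ">
  <!-- Build this configuration as a PCL rather than a desktop class library -->
  <TargetFrameworkIdentifier>.NETPortable</TargetFrameworkIdentifier>
  <TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
  <TargetFrameworkProfile>Profile2</TargetFrameworkProfile>
  <!-- Let the source adapt via #if PCL -->
  <DefineConstants>$(DefineConstants);PCL</DefineConstants>
</PropertyGroup>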

That’s not too hard to work around – I just need to know the name of the profile I want to support. A profile name is basically a single identifier which refers to the configuration of environments you want to support. Unfortunately, it’s pretty opaque – Noda Time 1.2 ships with “Profile2”, which means “.NET 4.0, Windows 8 (store apps), Silverlight 4, Windows Phone 7”. The simplest way to work out what profile I wanted was to create a new “regular” PCL in Visual Studio, unload the project, manually set the profile to Profile2 in the project file, reload the project, and then enable “Windows Phone 8.1”.

That then led to the next problem: Visual Studio 2013 (update 2) wanted to upgrade that sample project… because Profile2 is no longer really an option. You can’t manually set that combination in Visual Studio any more – it doesn’t display anything earlier than Silverlight 5 and Windows Phone 8. I now have two problems:

  • Given that Visual Studio changes the project file for me with a warning when I open it, how can I check what I’ve actually built against? Does msbuild silently do the same thing?
  • How can I find out what profiles are available, in order to select the most appropriate one?

Time to do some research…

Detecting a profile in a built assembly

First things first: if someone gives you a DLL, how can you detect what profile it was built against, and therefore which environments it can run in?

After hunting around in vain for a while, Petr K helped me out on Twitter: the crucial information is in the TargetFrameworkAttribute and its FrameworkName property, which is something like “.NETPortable,Version=v4.0,Profile=Profile2”. That can then be used with the FrameworkName class to parse out the profile name without writing any string manipulation code.
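As a minimal sketch of that in C# – assuming you’re happy to load the assembly in order to inspect it, and with names of my own choosing:

using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Versioning;

class ProfileDetector
{
    static void Main(string[] args)
    {
        // Load the assembly we've been given and look for [assembly: TargetFramework(...)]
        Assembly assembly = Assembly.LoadFrom(args[0]);
        var attribute = assembly.GetCustomAttributes(typeof(TargetFrameworkAttribute), false)
                                .Cast<TargetFrameworkAttribute>()
                                .FirstOrDefault();
        if (attribute == null)
        {
            Console.WriteLine("No TargetFrameworkAttribute found");
            return;
        }
        // FrameworkName parses ".NETPortable,Version=v4.0,Profile=Profile2" for us
        var framework = new FrameworkName(attribute.FrameworkName);
        Console.WriteLine("Identifier: {0}", framework.Identifier); // .NETPortable
        Console.WriteLine("Version: {0}", framework.Version);       // 4.0
        Console.WriteLine("Profile: {0}", framework.Profile);       // Profile2
    }
}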

Then you just need to find the details of the profile…

Where are all the profiles?

Windows stores the profiles along with the reference assemblies, in the .NETPortable framework directory – on my machine for example, that’s “C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable”. There are then subdirectories v4.0, v4.5 and v4.6 – although it’s not 100% clear what those refer to. Within each of those, there’s a Profile directory, which contains a bunch of Profile[n] directories. Then under each Profile[n] directory, a SupportedFrameworks directory contains XML documents describing the supported frameworks for that profile. Phew! So as an example:

C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2

contains the profile information for Profile2, including files:

  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\.NET Framework 4.xml
  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\Silverlight.xml
  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\Windows 8.xml
  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\Windows Phone Silverlight 7.xml

Each of those XML files consists of a single Framework element, with a variety of attributes. Not all elements have the same set of attributes, although everything has an Identifier, Family, Profile, MinimumVersion and DisplayName as far as I’ve seen. Here’s an example – the “Windows Phone Silverlight 7.xml” file listed above:


<Framework
    Identifier="Silverlight"
    Profile="WindowsPhone*"
    MinimumVersion="4.0"
    Family="WindowsPhone"
    MinimumVisualStudioVersion="10.0"
    MaximumVisualStudioVersion="11.0"
    DisplayName="Windows Phone Silverlight"
    MinimumVersionDisplayName="7"
    PlatformArchitectures="AnyCPU" />

As noted by David Kean in comments, the “Identifier”, “Profile” and “MinimumVersion” are mapped to the corresponding properties in MSBuild; the combination of the three is unique for a target, but you need to be careful to look at all the relevant information… don’t just look at the Identifier attribute. Additionally, if there’s a MinimumVersionDisplayName, that’s probably the version you’ll be more familiar with – so in the example above, the displayed minimum version number is 7, which is what people think about for Windows Phone versions, even though the real minimum version number is 4.0 as far as the build is concerned.

Read David’s comment for more details – I don’t want to paraphrase it too much, in case I accidentally write something misleading.
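Putting the directory structure above to use, here’s a rough sketch in C# of listing every profile and the frameworks it supports – assuming the default install location from earlier, and with minimal error handling:

using System;
using System.IO;
using System.Xml.Linq;

class ProfileLister
{
    static void Main()
    {
        string root = @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable";
        foreach (var versionDir in Directory.GetDirectories(root)) // v4.0, v4.5, v4.6
        {
            string profileRoot = Path.Combine(versionDir, "Profile");
            if (!Directory.Exists(profileRoot))
            {
                continue;
            }
            foreach (var profileDir in Directory.GetDirectories(profileRoot))
            {
                Console.WriteLine("{0} {1}", Path.GetFileName(versionDir), Path.GetFileName(profileDir));
                string supportedDir = Path.Combine(profileDir, "SupportedFrameworks");
                if (!Directory.Exists(supportedDir))
                {
                    continue;
                }
                foreach (var file in Directory.GetFiles(supportedDir, "*.xml"))
                {
                    // Each file contains a single Framework element
                    var framework = XDocument.Load(file).Root;
                    Console.WriteLine("  {0} {1}",
                        (string) framework.Attribute("DisplayName"),
                        (string) framework.Attribute("MinimumVersionDisplayName")
                            ?? (string) framework.Attribute("MinimumVersion"));
                }
            }
        }
    }
}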

What about the Xamarin frameworks?

As you can see above, Profile2 doesn’t include anything that looks like it’s a Xamarin framework. (Compare that with Profile136 for example, which includes Xamarin.iOS.xml and Xamarin.Android.xml.) But haven’t I previously said that Noda Time 1.2 works seamlessly in Xamarin.iOS and Xamarin.Android? Indeed I have. This is all very confusing, but as far as I’m aware, the Xamarin.* frameworks are only required if you want to build within Xamarin Studio and you’re not on a Windows machine. I could be wrong about that, but certainly there are some profiles which will work with some versions of Xamarin without a Xamarin-specific framework being on the list.

I would welcome more clarity on this, and if I learn any more about it (either in comments here or via other channels) I’ll update this post.

Where should the files go in NuGet packages?

The next issue to consider is that of NuGet packages. A NuGet package can contain multiple binaries for multiple target profiles – for example, the NodaTime NuGet package includes both the desktop and PCL versions. NuGet adds a reference to the right one when you install a package. How can it tell which binary is which? Through a directory structure. Here’s a piece of Noda Time’s nuspec package:

  <file src="bin\Signed Release\NodaTime.dll" target="lib\net35-Client" />
  <file src="bin\Signed Release\NodaTime.pdb" target="lib\net35-Client" />
  <file src="bin\Signed Release\NodaTime.xml" target="lib\net35-Client" />
  <file src="bin\Signed Release Portable\NodaTime.dll" target="lib\portable-win+net40+sl4+wp7" />
  <file src="bin\Signed Release Portable\NodaTime.pdb" target="lib\portable-win+net40+sl4+wp7" />
  <file src="bin\Signed Release Portable\NodaTime.xml" target="lib\portable-win+net40+sl4+wp7" />
  <file src="**\*.cs" target="src" />

As you can see, the PCL files end up in lib\portable-win+net40+sl4+wp7. That works for Profile2, but it’s not the right directory for the profile I’ll end up using. So how do I work out what directory to use in the next version? I can look at the NuGet source code, or this handy table of profiles and corresponding NuGet targets. (Interestingly, that lists the directory for Profile2 as portable-net4+sl4+netcore45+wp7, so I’m not entirely sure what’s going on. I suspect the table is more reliable than wherever I got the name from when preparing a Noda Time release.)

This all sounds a bit fiddly. Shouldn’t there be a tool to make it simple?

Yes, yes there should. As it happens, Stephen Cleary has already written one, although I didn’t realise this until I’d also written one myself. Indeed, the table linked above is the output from Stephen’s tool.

My own tool is called PclPal, and the source is available on GitHub. Don’t expect polished code (or tests) – this was very much a quick and dirty hack to make life slightly easier. It could be the basis of a more polished tool – possibly even a “PCL profile explorer” or something similar. Oh, and it requires the Roslyn End User Preview.

Currently it’s just a command line tool with three options:

  • “list” – lists the profiles on your system
  • “dll <assembly>” – shows the details of the profile a particular assembly was built against
  • “show <profile>” – shows the details of a specified profile

Note that currently I’m not doing anything in terms of NuGet package directories, although that’s probably the next step, if I take this any further. (I may just use Stephen’s tool, of course…)

Conclusion

The world of PCLs is a bit of a mess, to say the least. It’s possible that there’s lots of really clear, up-to-date documentation explaining everything somewhere – but I have yet to find it. I suspect the main problem is that this is a moving target in all kinds of ways. Versioning is never easy, and between all the different variations possible here, I’m not entirely surprised it’s confusing.

Hopefully this post will help to shed a bit more light on things – please feel very free to leave corrections, as I definitely don’t want to propagate any falsehoods. I’ve definitely learned a thing or two in the last few days, and with tools such as PclPal and Stephen’s PortableLibraryProfiles, life should be that little bit easier…

Diagnosing issues with reversible data transformations

I see a lot of problems which look somewhat different at first glance, but all have the same cause:

  • Text is losing “special characters” when I transfer it from one computer to another
  • Decryption ends up with garbage
  • Compressed data can’t be decompressed
  • I can transfer text but not binary data

These are all cases of transforming and (usually) transferring data, and then performing the reverse transformation. Often there are multiple transformations involved, and they need to be carefully reversed in the appropriate order. For example (a runnable sketch of this chain follows the list):

  1. Convert text to binary using UTF-8
  2. Compress
  3. Encrypt
  4. Base64-encode
  5. Transfer (e.g. as text in XML)
  6. Base64-decode
  7. Decrypt
  8. Decompress
  9. Convert binary back to text using UTF-8
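
Here’s a minimal round-trip of that whole chain in C#, assuming AES for the encryption step and gzip for the compression step purely for illustration – the only point is that each transformation is undone in the reverse order, with every intermediate value available for logging:

using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

class RoundTrip
{
    static void Main()
    {
        string original = "Some text with special characters: \u00e9\u00df\u20ac";
        using (var aes = Aes.Create()) // generates a dummy key and IV for us
        {
            // Steps 1-4: text -> UTF-8 bytes -> gzip -> AES -> base64
            byte[] utf8 = Encoding.UTF8.GetBytes(original);
            byte[] compressed = Compress(utf8);
            byte[] encrypted = Transform(compressed, aes.CreateEncryptor());
            string transferred = Convert.ToBase64String(encrypted);

            // Step 5: "transfer" (here the string just stays in memory)

            // Steps 6-9: base64 -> AES decrypt -> gunzip -> UTF-8 -> text
            byte[] received = Convert.FromBase64String(transferred);
            byte[] decrypted = Transform(received, aes.CreateDecryptor());
            byte[] decompressed = Decompress(decrypted);
            string roundTripped = Encoding.UTF8.GetString(decompressed);

            Console.WriteLine(original == roundTripped); // True
        }
    }

    static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    static byte[] Transform(byte[] data, ICryptoTransform transform)
    {
        using (transform)
        {
            return transform.TransformFinalBlock(data, 0, data.Length);
        }
    }
}

With everything in one program like this, comparing (say) the length of the compressed data against the length of the decrypted data at the matching point in the chain becomes trivial.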

The actual details of each question can be different, but the way I’d diagnose them is the same in each case. That’s what this post is about – partly so that I can just link to it when such questions arise. Although I’ve numbered the broad steps, it’s one of those constant iteration situations – you may well need to tweak the logging before you can usefully reduce the problem, and so on.

1. Reduce the problem as far as possible

This is just my normal advice for almost any problem, but it’s particularly relevant in this kind of question.

  • Start by assembling a complete program demonstrating nothing but the transformations. Using a single program which goes in both directions is simpler than producing two programs, one in each direction.
  • Remove pairs of transformations (e.g. encrypt/decrypt) at a time, until you’ve got the minimal set which demonstrates the problem
  • Avoid file IO if possible: hard-code short sample data which demonstrates the problem, and use in-memory streams (ByteArrayInputStream/ByteArrayOutputStream in Java; MemoryStream in .NET) for temporary results
  • If you’re performing encryption, hard-code a dummy key or generate it as part of the program.
  • Remove irrelevant 3rd party dependencies if possible (it’s simpler to reproduce an issue if I don’t need to download other libraries first)
  • Include enough logging (just console output, usually) to make it obvious where the discrepancy lies

In my experience, this is often enough to help you fix the problem for yourself – but if you don’t, you’ll be in a much better position for others to help you.

2. Make sure you’re really diagnosing the right data

It’s quite easy to get confused when comparing expected and actual (or before and after) data… particularly strings:

  • Character encoding issues can sometimes be hidden by spurious control characters being introduced invisibly
  • Fonts that can’t display all characters can make it hard to see the real data
  • Debuggers sometimes “helpfully” escape data for you
  • Variable-width fonts can make whitespace differences hard to spot

For diagnostic purposes, I find it useful to be able to log the raw UTF-16 code units which make up a string in both .NET and Java. For example, in .NET:

static void LogUtf16(string input)
{
    // Replace Console.WriteLine with your logging approach
    Console.WriteLine("Length: {0}", input.Length);
    foreach (char c in input)
    {
        Console.WriteLine("U+{0:x4}: {1}", (uint) c, c);
    }
}

Binary data has different issues, mostly in terms of displaying it in some form to start with. Our diagnosis tools are primarily textual, so you’ll need to perform some kind of conversion to a string representation in order to see it at all. If you’re really trying to diagnose this as binary data (so you’re interested in the raw bytes) do not treat it as encoded text using UTF-8 or something similar. Hex is probably the simplest representation that allows differences to be pinpointed pretty simply. Again, logging the length of the data in question is a good first step.

In both cases you might want to include a hash in your diagnostics. It doesn’t need to be a cryptographically secure hash in any way, shape or form. Any form of hash that is likely to change if the data changes is a good start. Just make sure you can trust your hashing code! (Every piece of code you write should be considered suspect – including whatever you decide to use for converting binary to hex, for example. Use trusted third parties and APIs provided with your target platform where possible. Even though adding an extra dependency for diagnostics makes it slightly harder for others to reproduce the problem, it’s better than the diagnostics themselves being suspect.)
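For example, a .NET helper along these lines (a sketch; the equivalent is easy in Java too) logs the length, a hex dump and a hash, sticking to framework-provided conversions so that the diagnostic code itself is above suspicion:

using System;
using System.Security.Cryptography;

class BinaryDiagnostics
{
    static void LogBytes(string label, byte[] data)
    {
        Console.WriteLine("{0}: {1} bytes", label, data.Length);
        // BitConverter gives "8D-8F-5F-8E" style output - easy to diff by eye
        Console.WriteLine("  Hex:  {0}", BitConverter.ToString(data));
        using (var sha256 = SHA256.Create())
        {
            // Any stable hash is fine here; it doesn't need to be secure
            Console.WriteLine("  Hash: {0}", BitConverter.ToString(sha256.ComputeHash(data)));
        }
    }

    static void Main()
    {
        LogBytes("Sample", new byte[] { 0x8D, 0x8F, 0x5F, 0x8E });
    }
}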

3. Analyze a clean, full, end-to-end log

This can be tricky when you’ve got multiple systems and platforms (which is why if you can possibly reproduce it in a single program it makes life simpler) but it’s really important to look at one log for a complete run.

Make sure you’re talking about the same data end-to-end. If you’re analyzing live traffic (which should be rare; unless the problem is very intermittent, this should all be done in test environments or a developer machine) or you have a shared test environment you need to be careful that you don’t use part of the data from one test and part of the data from another test. I know this sounds trivial, but it’s a really easy mistake to make. In particular, don’t assume that the data you’ll get from one part of the process will be the same run-to-run. In many cases it should be, but if the overall system isn’t working, then you already know that one of your expectations is invalid.

Compare “supposed to be equal” parts of the data. As per the steps in the introduction, there should be pairs of equal data, moving from the “top and bottom” of the transformation chain towards the middle. Initially, you shouldn’t care about whether you view the transformation as being correct – you’re only worried about whether the output is equal to the input. If you’ve managed to preserve all the data, the function of the transformation (encryption, compression etc) becomes relevant – but if you’re losing data, anything else is secondary. This is where the hash from the bottom of step 2 is relevant: you want to be able to determine whether the data is probably right as quickly as possible. Between “length” and “hash”, you should have at least some confidence, which will let you get to the most likely problem as quickly as possible.

4. Profit! (Conclusion…)

Once you’ve compared the results at each step, you should get an idea of which transformations are working and which aren’t. This may allow you to reduce the problem further, until you’ve just got a single transformation to diagnose. At that point, the problem becomes about encryption, or about compression, or about text encoding.

Depending on the situation you’re in, at this point you may be able to try multiple implementations or potentially multiple platforms to work out what’s wrong: for example, if you’re producing a zip file and then trying to decompress it, you might want to try using a regular decompression program to open your intermediate results, or decompress the results of compressing with a standard compression tool. Or if you’re trying to encrypt on Java and decrypt in C#, implement the other parts in each platform, so you can at least try to get a working “in-platform” solution – that may well be enough to find out which half has the problem.

To some extent all this blog post is about is reducing the problem as far as possible, with some ideas of how to do that. I haven’t tried to warn you much about the problems you can run into in any particular domain, but you should definitely read Marc Gravell’s excellent post on “How many ways can you mess up IO?” and my earlier post on understanding the meaning of your data is pretty relevant too.

As this is a “hints and tips” sort of post, I’ll happily modify it to include reader contributions from comments. With any luck it’ll be a useful resource for multiple Stack Overflow questions in the months and years to come…

Diagnosing weird problems – a Stack Overflow case study

Earlier, I came across this Stack Overflow question. I solved it, tweeted it, but then thought it would serve as a useful case study into the mental processes I go through when trying to solve a problem – whether that’s on Stack Overflow, at work, or at home.

It’s definitely worth reading the original question, but the executive summary is:

When I compute the checksum/hash of c:\Windows\System32\Calc.exe using various tools and algorithms, those tools all give the same answer for each algorithm. When I try doing the same thing in Java, I get different results. What’s going on?

Now to start with, I’d like to shower a bit of praise on the author:

  • The post came with a short but utterly complete program to demonstrate the problem
  • The comments in the program showed the expected values and the actual values
  • The code was mostly pretty clean (clean enough for an SO post anyway)

In short, it had pretty much everything I ask for in a question. Yay! Additionally, the result seemed to be strange. The chances of any one of Java’s hashing algorithms being broken seem pretty slim, but four of them? Okay, now you’ve got me interested.

Reproducing the problem

Unless I can spot the error immediately, I usually try to reproduce the problem in a Stack Overflow post with a short but complete program. In this case, the program was already provided, so it was just a matter of copy/paste/compile/run. This one had the additional tweak that it was comparing the results of Java with the results of other tools, so I had to get hold of an MD5 sum tool first. (I chose to start with MD5 for no particular reason.) I happened to pick this one, but it didn’t really seem it would make much difference. (As it happens, that was an incorrect assumption, but hey…)

I ran md5sums on c:\Windows\System32\calc.exe, and got the same result as the poster. Handy.

I then ran the Java code, and again got the same result as the poster: step complete, we have a discrepancy between at least one tool (and MD5 isn’t that hard to get right) and Java.

Looking for obvious problems

The code has four main areas:

  • Reading a file
  • Updating digests
  • Converting bytes to hex
  • Storing and displaying results

Of these, all of the first three have common and fairly simple error modes. For example:

  • Failure to use the return value from InputStream.read()
  • Failure to update the digests using only the relevant portion of the data buffer
  • Failure to cope with Java’s signed bytes

The code for storing and displaying results seemed solid enough to ignore to start with, and brief inspection suggested that the first two failure modes had been avoided. While the hex code didn’t have any obvious problems either, it made sense to check it. I simply printed the result of hard-coding the “known good” CRC-32 value:

System.out.println(toHex(new byte[] {
    (byte) 0x8D, (byte) 0x8F, (byte) 0x5F, (byte) 0x8E
  }));

That produced the right result, so I ruled out that part of the code too. Even if it had errors in some cases, we know it’s capable of producing the right string for one of the values we know we should be returning, so it can’t be getting that value.

Initial checks around the file

I’m always suspicious of stream-handling code – or rather, I know how easily it can hide subtle bugs. Even though it looked okay, I thought I’d check – so I added a simple total to the code so I could see how many bytes had been hashed. I also removed all the hash algorithms other than MD5, just for simplicity:

MessageDigest md5 = MessageDigest.getInstance("MD5");

FileInputStream fis = new FileInputStream(file);
byte data[] = new byte[size];
int len = 0;
int total = 0;
while ((len = fis.read(data)) != -1) {
    md5.update(data, 0, len);
    total += len;
}
fis.close();
System.out.println("Total bytes read: " + total);

results.put(md5.getAlgorithm(), toHex(md5.digest()));

It’s worth noting that I haven’t tried to fix up bits of the code which I know I would change if I were actually doing a code review:

  • The stream isn’t being closed in a finally block, so we’ll have a dangling resource (until GC) if an IOException is thrown
  • The initial value of len is never read, and can be removed

Neither of these matters in terms of the problem at hand, and closing the file “properly” would make the sample code more complicated. (For the sake of just a short sample program, I’d be tempted to remove it entirely.)

The result showed the same number of bytes being read as the command prompt did when I ran “dir c:\Windows\System32\Calc.exe” – so again, everything looks like it’s okay.

Desperate times call for stupid measures

Just on a whim, I decided to copy Calc.exe to a local folder (the current directory) and retry. After all, accessing a file in a system folder might have some odd notions applied to it. It’s hard to work out what, but… there’s nothing to lose just by trying a simple test. If it can rule anything out, and you’ve got no better ideas, you might as well try even the daftest idea.

I modified the source code to use the freshly-copied file, and it gave the same result. Hmm.

I then reran md5sums on the copied file… and it gave the same result as Java. In other words, running “md5sums c:\Windows\System32\Calc.exe” gave one result, but “md5sums CopyOfCalc.exe” gave a different result. At this point we’ve moved from Java looking like it’s behaving weirdly to md5sums looking suspicious.

Proving the root cause

At this point we’re sort of done – we’ve basically proved that the Java code produces the right hash for whatever data it’s given, but we’re left with the question of what’s happening on the file system. I had a hunch that it might be something to do with x86 vs x64 code (all of this was running on a 64-bit version of Windows 7) – so how do we test that assumption?

I don’t know if there’s a simple way of running an x86 version of the JVM, but I do know how to switch .NET code between x86 and x64 – you can do that for an assembly at build time. C# also makes the hashing and hex conversion simple, so I was able to knock up a very small app to show the problem:

using System;
using System.IO;
using System.Security.Cryptography;

class Test
{
    static void Main()
    {
        using (var md5 = MD5.Create())
        {
            string path = "c:/Windows/System32/Calc.exe";
            var bytes = md5.ComputeHash(File.ReadAllBytes(path));
            Console.WriteLine(BitConverter.ToString(bytes));
        }
    }
}

(For a larger file I’d have used File.OpenRead instead, but then I’d have wanted to close the stream afterwards. Somehow it wasn’t worth correcting the existing possible handle leak in the Java code, but I didn’t want to write leaky code myself. So instead I’ve got code which reads the whole file into memory unnecessarily… )

You can choose the architecture to build against (usually AnyCPU, x86 or x64 – though it’s interesting to see that “arm” is also an option under .NET 4.5, for obvious reasons) either from Visual Studio or using the “/platform” command line option. This doesn’t change the IL emitted (as far as I’m aware) but it’s used for interop with native code – and in the case of executables, it also determines the version of the CLR which is used.
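For example, from a developer command prompt, building the same source twice is enough for this experiment (the output names here are just my own choice):

csc /platform:x86 /out:TestX86.exe Test.cs
csc /platform:x64 /out:TestX64.exe Test.cs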

Building and running in x86 mode gave the same answer as the original “assumed to be good” tools; building and running in x64 mode gave the same answer as Java.

Explaining the root cause

At this point we’ve proved that the file system gives different results depending on whether you access it from a 64-bit process or a 32-bit process. The final piece of the puzzle was to find something to explain why that happens. With all the evidence about what’s happening, it was now easy to search for more information, and I found this article giving satisfactory details. Basically, there are two different copies of the system executables on a 64-bit system: x86 ones which run under the 32-bit emulator, and x64 ones. They’re actually in different directories, but when a process opens a file in Windows\System32, the copy which matches the architecture of the process is used. It’s almost as if the Windows\System32 directory is a symlink which changes target depending on the current process.

A Stack Overflow comment on my answer gave a final nugget that this is called the “File System Redirector”.

Conclusion

Debugging sessions often feel a bit like this – particularly if you’re like me, and only resort to real debugging after unit testing has failed. It’s a matter of looking under all kinds of rocks, trying anything and everything, but keeping track of everything you try. At the end of the process you should be able to explain every result you’ve seen, in an ideal world. (You may not be able to give evidence of the actual chain of events, but you should be able to construct a plausible chain of events which concurs with your theory.)

Be aware of areas which can often lead to problems, and check those first, gradually reducing the scope of “stuff you don’t understand” to a manageable level, until it disappears completely.