Category Archives: General

Tracking down a performance hit

I’ve been following the progress of .NET Core with a lot of interest, and trying to make the Noda Time master branch keep up with it. The aim is that when Noda Time 2.0 eventually ships (apologies for the delays…) it will be compatible with .NET Core from the start. (I’d expected to be able to support netstandard1.0, but that appears to have too much missing from it. It looks like netstandard1.3 will be the actual target.)

I’ve been particularly looking forward to being able to run the Noda Time benchmarks (now using BenchmarkDotNet) to compare .NET Core on Linux with the same code on Windows. In order to make that a fair comparison, I now have two Intel NUCs, both sporting an i5-5250U and 8GB of memory.

As it happens, I haven’t got as far as running the benchmarks under .NET Core – but I am now able to run all the unit tests on both Linux and Windows, using both the net451 TFM and netcoreapp1.0.

When I did that recently, I was pretty shocked to see that (depending on which tests I ran) the tests were 6-10 times slower on Linux than on Windows, using netcoreapp1.0 in both cases. This post is a brief log of what I did to track down the problem.

Step 1: Check that there’s really a problem

Thought: Is this actually just a matter of not running the tests in a release configuration, or something similar?

Verification: I ran the tests several times, specifying -c Release on the command line to use the release build of both NodaTime.Test.dll and NodaTime.dll. Running under a debugger definitely wasn’t an issue, as this was all just done from the shell.

Additionally, I ran the tests in two ways – firstly, running the whole test suite, and secondly running with --where=cat!=Slow to avoid the few tests I’ve got which are known to be really pretty slow. They’re typically tests which compare the answers the BCL gives with the answers Noda Time gives, across the whole of history for a particular calendar system or time zone. I’m pleased to report that the bottleneck in these tests is almost always the BCL, but that doesn’t help to speed them up. If only the “slow” tests had been much slower on Linux, that might have pointed to the problems being in BCL calendar or time zone code.

The ratios vary, but there was enough of a problem under both circumstances for it to be worth looking further.

Step 2: Find a problematic test

I didn’t have very strong expectations one way or another about whether this would come down to some general problem in the JIT on Linux, or whether there might be one piece of code causing problems in some tests but not others. Knowing that there are significant differences in handling of some culture and time zone code between the Linux and Windows implementations, I wanted to find a test which used the BCL as little as possible – but which was also slow enough for the differences in timing to be pronounced and not easily explicable by the problems of measuring small amounts of time.

Fortunately, NUnit produces a TestResult.xml file which is easy to parse with LINQ to XML, so I could easily transform the results from Windows and Linux into a list of tests, ordered by duration (descending), and spot the right kind of test.

I found my answer in UmAlQuraYearMonthDayCalculatorTest.GetYearMonthDay_DaysSinceEpoch, which effectively tests the Um Al Qura calendar for self consistency, by iterating over every day in the supported time period and checking that we can convert from “days since Unix epoch” to an expected “year, month day”. In particular, this test doesn’t rely on the Windows implementation of the calendar, nor does it use any time zones, cultures or anything similar. It’s nicely self-contained.

This test took 2051ms on Linux and 295ms on Windows. It’s possible that those figures were from a debug build, but I repeated the tests using a release build and confirmed that the difference was still similar.

Step 3: Find the bottleneck

At this point, my aim was to try to remove bits of the test at a time, until the difference went away. I expected to find something quite obscure causing the difference – something like different CPU cache behaviour. I knew that the next step would be to isolate the problem to a small piece of code, but I expected that it would involve a reasonable chunk of Noda Time – at least a few types.

I was really lucky here – the first and most obvious call to remove made a big difference: the equality assertion. Assertions are usually the first thing to remove in tests, because everything else typically builds something that you use in the assertions… if you’re making a call without either using the result later or asserting something about the result, presumably you’re only interested in side effects.

As soon as I removed the call to Assert.AreEqual(expected, actual), the execution time dropped massively on Linux, but hardly moved on Windows: they were effectively on a par.

I wondered whether the problem was with the fact that I was asserting equality between custom structs, and so tried replacing the real assertions with assertions of equality of strings, then of integers. No significant difference – they all showed the same discrepancy between Windows and Linux.

Step 4: Remove Noda Time

Once I’d identified the assertions as the cause of the problem, it was trivial to start a new test project with no dependency on Noda Time, consisting of a test like this:

[Test]
public void Foo()
{
    for (int i = 0; i < 1000000; i++)
    {
        var x = 10;
        var y = 10;
        Assert.AreEqual(x, y);
    }
}

This still demonstrated the problem consistently, and allowed simpler experimentation with different assertions.

Step 5: Dig into NUnit

For once in my life, I was glad that a lot of implementation details of a framework were exposed publicly. I was able to try lots of different “bits” of asserting equality, in order to pin down the problem. Things I tried:

  • Assert.AreEqual(x, y): slow
  • Assert.That(x, Is.EqualTo(y)): slow
  • Constructing an NUnitEqualityComparer: fast
  • Calling NUnitEqualityComparer.AreEqual: fast. (Here the construction occurred before the loop, and the comparisons were in the loop.)
  • Calling Is.EqualTo(y): slow

The last bullets two bullets were surprising. I’d been tipped off that NUnitEqualityComparer uses reflection, which could easily differ in performance between Windows and Linux… but checking for equality seemed to be fast, and just constructing the constraint was slow. In poking around the NUnit source code (thank goodness for Open Source!) it’s obvious why Assert.AreEqual(x, y) and Assert.That(y, Is.EqualTo(x)) behave the same way – the former just calls the latter.

So, why is Is.EqualTo(y) slow (on Linux)? The method itself is simple – it just creates an instance of EqualConstraint. The EqualConstraint constructor body doesn’t do much… so I proved that it’s not EqualConstraint causing the problem by deriving my own constraint with a no-op implementation of ApplyTo… sure enough, just constructing that is slow.

That leaves the constructor of the Constraint abstract base class:

protected Constraint(params object[] args)
{
    Arguments = args;

    DisplayName = this.GetType().Name;
    if (DisplayName.EndsWith("`1") || DisplayName.EndsWith("`2"))
        DisplayName = DisplayName.Substring(0, DisplayName.Length - 2);
    if (DisplayName.EndsWith("Constraint"))
            DisplayName = DisplayName.Substring(0, DisplayName.Length - 10);
}

That looks innocuous enough… but maybe calling GetType().Name is expensive on Linux. So test that… nope, it’s fast.

At this point I’m beginning to wonder whether we’ll ever get to the bottom of it, but let’s just try…

[Test]
public void EndsWith()
{
    string text = "abcdefg";
    for (int i = 0; i < Iterations; i++)
    {
        text.EndsWith("123");
    }
}

… and sure enough, it’s fast on Windows and slow on Linux. Wow. Looks like we have a culprit.

Step 6: Remove NUnit

At this point, it’s relatively plain sailing. We can reproduce the issue in a simple console app. I won’t list the code here, but it’s in the GitHub issue. It just times calling EndsWith once (to get it JIT compiled) and then a million times. Is it the most rigorous benchmark in the world? Absolutely not… but when the difference is between 5.3s on Linux and 0.16s on Windows, on the same hardware, I’m not worried about inaccuracy of a few milliseconds here or there.

Step 7: File a CoreCLR issue

So, as I’ve shown, I filed a bug on GitHub. I’d like to think it was a pretty good bug report:

  • Details of the environment
  • Short but complete console app ready to copy/paste/compile/run
  • Results

Exactly the kind of thing I’d have put into a Stack Overflow question – when I ask for a minimal, complete example on Stack Overflow, this is what I mean.

Anyway, about 20 minutes later (!!!), Stephen Toub has basically worked out the nub of it: it’s a culture issue. Initially, he couldn’t reproduce it – he saw the same results on Windows and Linux. But changing his culture to en-GB, he saw what I was seeing. I then confirmed the opposite – when I ran the code having set LANG=en-US, the problem went away for me. Stephen pulled Matt Ellis in, who gave more details as to what was going wrong behind the scenes.

Step 8: File an NUnit issue

Matt Ellis suggested filing an issue against NUnit, as there’s no reason this code should be culture-sensitive. By specifying the string comparison as Ordinal, we can go through an even faster path than using the US culture. So

if (DisplayName.EndsWith("Constraint"))

becomes

if (DisplayName.EndsWith("Constraint", StringComparison.Ordinal))

… and the equivalent for the other two calls.

I pointed out in the issue that it was also a little bit odd that this was being worked out in every Constraint constructor call, when of course it’s going to give the same result for every instance of the same type. When “every Constraint constructor call” becomes “every assertion in an entire test run”, it’s a pretty performance-critical piece of code. While unit tests aren’t important in terms of performance in the same way that production code is, anything which adds friction is bad news.

Hopefully the NUnit team will apply the simple improvement for the next release, and then the CoreCLR team can attack the tougher underlying problem over time.

Step 9: Blog about it

Open up Stack Edit, start typing: “I’ve been following the progress”… :)

Conclusion

None of the steps I’ve listed here is particularly tricky. Diagnosing problems is often more a matter of determination and being unwilling to admit defeat than cleverness. (I’m not denying that there’s a certain art to being able to find the right seam to split the problem in two, admittedly.)

I hope this has been useful as a “start to finish” example of what a diagnostic session can look and feel like. It wasn’t one physical session, of course – I found bits of time to investigate it over the course of a day or so – but it would have been the same steps either way.

Smug, satisfied smile…

Common mistakes in date/time formatting and parsing

There are many, many questions on Stack Overflow about both parsing and formatting date/time values. (I use the term “date/time” to mean pretty much “any type of chronlogical information” – dates, times of day, instants in time etc.) Given how often the same kinds of mistakes are made, I thought it would be handy to have a blog post to refer to.

This post assumes you already know the basic operations of formatting and parsing, in terms of the appropriate types to use:

Pattern woes

There are three broad classes of issue here – one of which is “just” a matter of carelessness, usually, and the other which still surprises me in terms of sheer wrongness.

Pattern capitalization issues

This is an insidious problem, because in some cases you may get the right values, but not all of the time. I suspect it usually comes up again due to copy and paste, but often from specifications rather than other code – in a specification, it’s pretty clear what "YYYY-MM-DD HH:MM:SS" means as a date/time format, but that doesn’t mean it’s the right pattern to put in code.

The main thing to do is read the documentation carefully. Of course, some platforms have clearer documentation than others, but most are at least “good enough”. For the Java APIs, the pattern specifiers are generally documented with the formatting classes themselves; for .NET’s built-in classes you want the custom date and time format strings and standard date and time format strings MSDN pages, and for Noda Time follow the various options from the text handling part of the user guide. (For other platforms, use your common sense. :)

The most common mistakes here are:

  • Using mm for months or MM for minutes, rather than vice versa. I’ve seen this mistake both ways round.
  • Using hh for “hour of day” when HH is intended. H is in the range 0-23; h is in the range 1-12. h is usually used singly (rather than requiring exactly two digits), and almost always in conjunction with an AM/PM specifier – as otherwise it’s ambiguous. H is usually used as HH, so that 5am is represented as “05” for example.
  • Using YYYY for year – in Java and Noda Time, Y is used for week-year rather than normal calendar year; it’s usually used in conjunction with “week of year” and “day of week”, but it’s much less common than yyyy.
  • Using DD for “day of month” when in Java it actually means “day of year”.

Broad pattern incompatibilities

I’m surprised by how often I see code like this:

var text = "Tue, 5 May 2015 3:15pm";
var dateTime = DateTime.ParseExact(
    text,
    "yyyy-MM-dd'T'HH:mm:ss");

Here the pattern and the actual data are entirely different, and I get the impression that the author has copied the pattern from another piece of code without any thought about what the magic string "yyyy-MM-dd'T'HH:mm:ss" is there for.

I suspect it goes without saying for most readers, but you should never copy code from elsewhere into your own code without understanding how it works, or which parts you may potentially need to modify.

The result of this sort of error is usually a complete failure to parse, which is at least simpler to find than the “plausible but not quite correct” pattern issue.

Pattern incompatibility issues

Some developers assume that a pattern which works in Java will work in Python, or the equivalent for any other pair of platforms. Don’t make this assumption. Always read the documentation – and if you’re porting code from one platform to another, you’ll need to “decode” the pattern with one set of documentation, then “encode” it with the other.

Time zone issues

Understanding time zones

There are two common issues when understanding what a time zone is to start with.

The first is to assume that a UTC offset (e.g. “+8 hours”) is the same as a time zone. This is an understandable mistake, given that a lot of documentation (from organizations which really should know better) misuse the terminology. The UTC offset is the difference between UTC and local time at a particular instant – so for example, while I’m writing this, I’m in the UK time zone which is currently at UTC+1. However, in the winter (in the same time zone) it will be at UTC+0. So if you have a value of (say) “2015-05-10T16:43:00+0100” that only tells you the UTC offset – it doesn’t tell you the time zone. There may well be multiple time zones with the same offset at that particular time, but which will have different offsets at differ times.

The second mistake is to think that an abbreviation such as “EST” or “GMT” identifies a time zone. It doesn’t, in two ways:

  • A single time zone often uses multiple abbreviations over time. For example, “Pacific Time” varies between PST (Pacific Standard Time) and PDT (Pacific Daylight Time). It’s unfortunate that some people use the abbreviation for standard time even when they mean the general time zone – so even though currently (at the time of writing) Pacific Time is in PDT (UTC-7), some people would write the local time with “PST” at the end. Grr. Avoid abbrevations if you possibly can.
  • The same abbreviation may be used in multiple time zones, or even at different points in time to mean different things within the same time zone. For example, “BST” can mean British Summer Time in Europe/London (standard time of UTC+0, plus 1 hour of daylight saving time), British Standard Time in Europe/London (standard time of UTC+1, with no daylight saving time, around 1970 only) and Bougainville Standard Time in Pacific/Bougainville (UTC+11). Avoid abbreviations if you possibly can.

Using time zones in text formatting/parsing

First, you need to understand exactly what the library you’re using does with time zones, and what the types you’re using represent. One of the most common misconceptions here is with java.util.Date – this is just an instant in time, with no concept of a time zone or calendar system. The fact that the string returned from Date.toString always uses the system default time zone is unfortunately misleading in this respect, and causes developers to ask how to “convert” a Date from one time zone to another.

Next, you need to understand exactly what your data represents. In my experience, most textual data either specifies a date and/or time without a given time zone or it specifies a date and time with a UTC offset. When no time zone information is present, you may know the time zone it’s meant to refer to, or you may not. If you’re using a library which has multiple different types to represent different kinds of information (e.g. Joda Time, java.time or Noda Time) I personally find it clearest to parse to a type that closest represents the information actually stated in the text, and then convert it to something else where appropriate.

You definitely need to be aware when the parsing operation is going to impose any sort of time zone understanding on your data. This is the case with SimpleDateFormat in Java and with DateTime.ParseExact and friends in .NET. For SimpleDateFormat, unless you explicitly set a time zone (or the pattern includes a UTC offset), the system default time zone is used – this is usually not what you want. Parsing in .NET allows you to specify how you want the text to be understood, but you need to be careful. (The fact that DateTime sometimes represents a value in the system default time zone, sometimes a value in UTC, and sometimes a value with no associated time zone makes this all tricky.)

Locale / culture issues

Most libraries allow you to specify which culture to use when parsing (or formatting) data. This is a two-edged sword:

  • If you’re formatting a value to be displayed directly to an end user, that’s great: they can see the month name in their own language, etc. In this situation, you’ll typically use a “standard” format (e.g. “the short date/time format”)
  • If you’re formatting or parsing a value which is designed to be machine-readable (e.g. passed to a web service) then you almost certainly want the invariant culture instead of a user-specific culture. In this situation, you’ll typically use a “custom” format (e.g. “yyyy-MM-dd’T’HH:mm:ss”) or a specific culture-invariant format.

Culture can affect several aspects of handling conversions:

  • The calendar system used (e.g. the Gregorian calendar vs an Islamic calendar)
  • The “standard” formats used (e.g. month/day/year vs day/month/year)
  • The separators used (e.g. - vs / for date separators)
  • The month and day names used
  • The number system used

Converting unnecessarily

As a final common problem, you may be performing more conversions than you should be. For example, if you’ve got a DateTime field in the database but you’re passing a value as a string in your SQL parameter (you are using parameterized SQL, right?) then you probably shouldn’t be. Most platforms allow parameters to be specified as the value in a “native” representation. Likewise when you fetch a value, don’t just call toString on it and then parse the result – if the value is a date/time value, it should already be in a native representation; a simple cast (or call to the type-specific method) should be enough.

Conclusion

Date/time text handling is fraught with problems, as a simple look at Stack Overflow shows. Be careful, make sure you know exactly what you’re converting from and to, and check exactly what you’re specifying vs what you’re leaving implicit.

Backward compatibility pain

I’ve been getting a bit cross about backward compatibility recently. This post contains two examples of backward incompatibilities in .NET 4.6, and one example of broken code which isn’t being fixed due for backward compatibility reasons.

Let me start off by saying this post is not meant to be seen as an attack on Microsoft. I suspect that some people will read that first sentence as “This post is an attack on Microsoft, but I don’t want to say so.” It really isn’t. With my naturally positive disposition, I’m going to assume that the people behind the code and decisions in the examples below are smart people who have more information than I do. Their decisions may prove annoying to me personally, but that doesn’t mean those decisions are bad ones for the world at large.

The purpose of this post is partly just because I think readers will find it interesting, and partly to show how there are different things to consider when it comes to backward compatibility in APIs.

Example 1: Enumerable.OrderByDescending is broken

OrderByDescending is broken when three facts are combined unfortunately:

  • IComparer.Compare is allowed to return any integer value, including int.MinValue. The return value is effectively meant to represent one of three results:
    • the first argument is “earlier than” the second (return a negative integer)
    • the two arguments are equal in terms of this comparison (return 0)
    • the first argument is “later than” the second (return a positive integer)
  • -int.MinValue (the unary negation of int.MinValue) is still int.MinValue, because the “natural” result would be outside the range of int. (Think about sbyte as being in the range -128 to 127 inclusive… what would -(-128) in sbyte arithmetic return?)
  • OrderByDescending uses unary negation to attempt to reverse the order returned by an “ascending” comparer.

I’ve blogged about this before, but for the sake of completeness, here’s an example showing that it’s broken. We use a custom comparer which delegates to a normal string comparer – but only ever returns int.MinValue, 0 or int.MaxValue. Just to reiterate, this is an entirely legitimate comparer.

using System;
using System.Collections.Generic;
using System.Linq;

class OrderByDescendingBug
{
    static void Main()
    {
        var comparer = new MaximalComparer<string>(Comparer<string>.Default);
        string[] input = { "apples", "carrots", "dougnuts", "bananas" };

        var sorted = input.OrderByDescending(x => x, comparer);
        Console.WriteLine(string.Join(", ", sorted));
    }
}

class MaximalComparer<T> : IComparer<T>
{
    private readonly IComparer<T> original;

    public MaximalComparer(IComparer<T> original)
    {
        this.original = original;
    }

    public int Compare(T first, T second)
    {
        int originalResult = original.Compare(first, second);
        return originalResult == 0 ? 0
            : originalResult < 0 ? int.MinValue
            : int.MaxValue;
    }
}

We would like the result of this program to be “doughnuts, carrots, bananas, apples” but on my machine (using .NET 4.6 from VS2015 CTP6) it’s “carrots, dougnuts, apples, bananas”.

Naturally, when I first discovered this bug, I filed it in Connect. Unfortunately, the bug has been marked as closed. This comment was logged in 2011:

Swapping arguments in the call to comparer.Compare as you point out would be the most straightforward and general way to support this. However, while well-behaved implementations of comparer.Compare should handle this fine, there may be some implementations out there with subtle bugs that are not expecting us to reverse the order in which we supply these arguments for a given list. Given our focus on runtime compatibility for the next release, we won’t be able to fix this bug in the next version of Visual Studio, though we’ll definitely keep this in mind for a future release!

Fast, backward compatible, correct – pick any two

The clean solution here – reversing the order of the arguments – isn’t the only way of correcting it. We could use:

return -Math.Sign(original.Compare(x, y));

This still uses unary negation, but now it’s okay, because Math.Sign will only return -1, 0 or 1. It’s very slightly more expensive than the clean solution, of course – there’s the call to Math.Sign and the unary negation. Still, at least it works.

What I object to here is the pandering to incorrect code (implementations of IComparer which don’t obey its contract, by making assumptions about the order in which values will be passed) at the expense of correct code (such as the example above; the use of int.MinValue is forced here, but it can crop up naturally too – in a far harder-to-reproduce way, of course). While I can (unfortunately) believe that there are implementations which really are that broken, I don’t think the rest of us should have to suffer for it. I don’t think we should have to suffer at all, but I’d rather suffer a slight piece of inefficiency (the additional Math.Sign call (which may well be JIT-compiled into a single machine instruction – I haven’t checked) than suffer the current correctness issue.

Example 2: TimeZoneInfo becomes smarter in .NET 4.6

A long time ago, Windows time zones had no historical information – they were just a single pair of rules about when the zone started and stopped observing daylight saving time (assuming it did at all).

That improved over time, so that a time zone had a set of adjustment rules, each of which would be in force for a certain portion of history. This made it possible to represent the results of the Energy Policy Act of 2005 for example. These are represented in .NET using TimeZoneInfo.AdjustmentRule, which is slightly under-documented and has suffered from some implementation issues in the past. (There’s also the matter of the data used, but I treat that as a different issue.)

Bugs aside, the properties of TimeZoneInfo and its adjustment rules allowed an interested developer (one wanting to expose the same information in a different form for a better date/time API, as one entirely arbitrary example) to predict the results of the calculations within TimeZoneInfo itself – so the value returned by a call to TimeZoneInfo.GetUtcOffset(DateTime) could be predicted by looking at the standard UTC offset of the time zone, working out which rule was in effect for the specified DateTime, working out if that rule means that DST was being observed at the time, and adjusting the result accordingly.

As of .NET 4.6, it appears this will no longer be the case – not in a straightforward manner. One aspect of inflexibility in TimeZoneInfo is being eliminated: the inability to change standard offset. In the past, if a time zone changed its understanding of “standard time” (as the UK did between 1968 and 1971, for example), that couldn’t be cleanly represented in the TimeZoneInfo data model, leading to some very odd data with “backward” DST offsets to model the situation as nearly as possible.

Now, it seems that each adjustment rule also “knows” the difference between its standard offset and that of the zone as a whole. For the most part, this is a good thing. However, it’s a pain for anyone who works with TimeZoneInfo.AdjustmentRule directly, as the information simply isn’t available on the rule. (This is only a CTP of .NET 4.6, of course – it might become available before the final release.)

Fortunately, one can infer the information by asking the zone itself for the UTC offset of one arbitrary point in the adjustment rule, and then compare that with what you’d predict using just the properties of TimeZoneInfo and AdjustmentRule (taking into account the fact that the start of the rule may already be in DST). So long as the rule performs its internal calculations correctly (and from what I’ve seen, it’s now a lot better than it used to be, though not quite perfect yet) we can predict the result of GetUtcOffset for all other DateTime values.

It’s not clear to me why the information isn’t just exposed with a new property on the rule, however. It’s a change in terms of what’s available, sure – but anyone using the new implementation directly would have to know about the change anyway, as the results of using the exposed data no longer match the results of GetUtcOffset.

Example 3: PersianCalendar and leap years

If you thought the previous two examples were obscure, you’ll love this one.

PersianCalendar is changing in .NET 4.6 to use a more complicated leap year formula. The currently documented formula is:

A leap year is a year that, when divided by 33, has a remainder of 1, 5, 9, 13, 17, 22, 26, or 30.

So new PersianCalendar().IsLeapYear(1) has always returned true – until now. It turns out that Windows 10 is going to support the Persian Calendar (also known as the Solar Hijri calendar) in certain locales, and it’s going to do so “properly” – by which I mean, with a more complex leap year computation. This is what’s known as the “astronomical” Persian calendar and it follows the algorithm described in Calendrical Calculations. The BCL implementation is going to be consistent with that Windows calendar, which makes sense.

It’s worth noting that this calendar has the same leap year pattern as the “simple” one for over 320 years around the modern era (Gregorian 1800 to Gregorian 2123) so it’s really only people dealing with long-past dates in the Persian calendar who will notice the difference. Of course, I noticed because I have a unit test checking that my implementation of the Persian calendar in Noda Time is equivalent to the BCL’s implementation. It was fine until I installed Visual Studio 2015 CTP6…

As it happens, there’s another Persian calendar to consider – the “arithmetic” or “algorithmic” Persian calendar proposed by Ahmad Birashk. This consists of three hierarchical cycles of years (either cycles, subcycles and sub-subcycles or cycles, grand cycles and great grand cycles depending on whether you start with the biggest kind and work in, or start at the smallest and work out).

For Noda Time 2.0, I’m now going to support all three forms: simple (for those who’d like compatibility with the “old” BCL implementation), astronomical and arithmetic.

Conclusion

Backwards compatibility is hard. In all of these cases there are reasons for the brokenness, whether that’s just brokenness against correctness as in the first case, or a change in behaviour as in the other two. I think for localization-related code, there’s probably a greater tolerance of change – or there should be, as localization data changes reasonably frequently.

For the second and third cases, I think it’s reasonable to say that compatibility has been broken in a reasonable cause – particular for the second case, where the time zone data can be much more sensible with the additional flexibility of changing the UTC offset of standard time over history. It’s just a shame there’s fall-out.

The changes I would make if I were the only developer in the world would be:

  • Fix the first case either by ignoring broken comparer implementations, or by taking the hit of calling Math.Sign.
  • Improve the second case by adding a new property to AdjustmentRule and publicising its existence in large, friendly letters.
  • Introduce a new class for the third case instead of modifying the behaviour of the existing class. That would certainly be best for me – but for most users, that would probably introduce more problems than it solved. (I suspect that most users of the Persian calendar won’t go outside the “safe” range where the simple and astronomical calendars are the same anyway.)

One of the joys of working on Noda Time 2.0 at the moment is that it’s a new major version and I am willing to break 1.x code… not gratuitously, of course, but where there’s good reason. Even so, there are painful choices to be made in some cases, where there’s a balance between a slight ongoing smell or a clean break that might cause issues for some users if they’re not careful. I can only imagine what the pain is like when maintaining a large and mature codebase like the BCL – or the Windows API itself.

New blog hosting

As some of you have noticed (and let me know), my old blog hosting provider recently moved off Community Server to WordPress. I figured that as all the links we being broken anyway, now would be a good time to move off msmvps.com anyway. The old posts are still there, but my blog’s new home is codeblog.jonskeet.uk. Hopefully I’ve fixed (almost) all the internal links from one blog post to another, and Nick Craver has generously agreed to fix up links on Stack Overflow, too. I’ll fix up my web site references when I get the chance, and hopefully things will get back to (mostly) normal as soon as possible. Obviously there’ll be plenty of links elsewhere around the web which I can’t fix, but I suspect I’m my own primary consumer, so to speak.

There are still bound to be teething issues, commenting problems, goodness knows what – but hopefully the blog itself will be in a better state than it was before, overall.

Additionally, I’m hoping to gradually (very gradually) coalesce my online presence around the jonskeet.uk domain. I haven’t set that up at all yet, but that’s the plan.

Apologies for link breakage, and fingers crossed it’ll be relatively smooth sailing from here on.

How many 32-bit types might we want?

I was recently directed to an article on "tiny types" – an approach to static typing which introduces distinct types for the sake of code clarity, rather than to add particular behaviour to each type. As I understand it, they’re like type aliases with no conversions between the various types. (Unlike plain aliases, an object is genuinely an instance of the relevant tiny type – it doesn’t have "alias erasure" as a language-based solution could easily do.)

I like the idea, and wish it were better supported in languages – but it led me to thinking more about the existing numeric types that we’ve got and how they’re used. In particular, I was considering how in C# the "byte" type is relatively rarely used as a number, with a useful magnitude which has arithmetic performed on it. That does happen, but more often it’s used either as part of other types (e.g. converting 4 bytes from a stream into an integer) or as a sequence of 8 bits.

It then struck me that the situations where we perform bitwise operations and the situations where we perform arithmetic operations are reasonably distinct. I wonder whether it would be worth having five separate types – which could be purely at the language level, if we wanted:

  • Float32 – regular IEEE-754 32-bit binary floating point type with arithmetic operations but no bitwise operations
  • Int32 – regular signed integer with arithmetic operations but no bitwise operations
  • UInt32 – unsigned integer with arithmetic operations but no bitwise operations
  • Bit32 – a bit sequence with bitwise operations but no arithmetic operations
  • Identity32 – a 32-bit value which only defines equality

The last type would be used for identities which happened to consist of 32 bits but where the actual bit values were only useful in terms of comparison with other identities. (In this case, an Identity64 might be more useful in practice.)

Explicit conversions which preserved the underlying bit pattern would be available, so you could easily generate a sequence of Identity32 values by having a simple Int32 counter, for example.

At the same time, I’d want to introduce bitwise operations for Bit8 and Bit16 values, rather than the "everything is promoted to 32 bits" that we currently have in Java and C#, reducing the number of pointless casts for code that performs bitwise operations.

The expected benefits would be the increased friction when using a type in an "unexpected" way for the value being represented – you’d need to explicitly cast to the right type – but maintaining a low-friction path when using a value in the expected way.

I haven’t yet figured out what I’d want a Stream.Read(…) call to use. Probably Bit32, but I’m not sure yet.

Anyway, I figured I’d just throw it out there as a thought experiment… any reactions?

Diagnosing issues with reversible data transformations

I see a lot of problems which look somewhat different at first glance, but all have the same cause:

  • Text is losing “special characters” when I transfer it from one computer to another
  • Decryption ends up with garbage
  • Compressed data can’t be decompressed
  • I can transfer text but not binary data

These are all cases of transforming and (usually) transferring data, and then performing the reverse transformation. Often there are multiple transformations involved, and they need to be carefully reversed in the appropriate order. For example:

  1. Convert text to binary using UTF-8
  2. Compress
  3. Encrypt
  4. Base64-encode
  5. Transfer (e.g. as text in XML)
  6. Base64-decode
  7. Decrypt
  8. Decompress
  9. Convert binary back to text using UTF-8

The actual details of each question can be different, but the way I’d diagnose them is the same in each case. That’s what this post is about – partly so that I can just link to it when such questions arise. Although I’ve numbered the broad steps, it’s one of those constant iteration situations – you may well need to tweak the logging before you can usefully reduce the problem, and so on.

1. Reduce the problem as far as possible

This is just my normal advice for almost any problem, but it’s particularly relevant in this kind of question.

  • Start by assembling a complete program demonstrating nothing but the transformations. Using a single program which goes in both directions is simpler than producing two programs, one in each direction.
  • Remove pairs of transformations (e.g. encrypt/decrypt) at a time, until you’ve got the minimal set which demonstrates the problem
  • Avoid file IO if possible: hard-code short sample data which demonstrates the problem, and use in-memory streams (ByteArrayInputStream/ByteArrayOutputStream in Java; MemoryStream in .NET) for temporary results
  • If you’re performing encryption, hard-code a dummy key or generate it as part of the program.
  • Remove irrelevant 3rd party dependencies if possible (it’s simpler to reproduce an issue if I don’t need to download other libraries first)
  • Include enough logging (just console output, usually) to make it obvious where the discrepancy lies

In my experience, this is often enough to help you fix the problem for yourself – but if you don’t, you’ll be in a much better position for others to help you.

2. Make sure you’re really diagnosing the right data

It’s quite easy to get confused when comparing expected and actual (or before and after) data… particularly strings:

  • Character encoding issues can sometimes be hidden by spurious control characters being introduced invisibly
  • Fonts that can’t display all characters can make it hard to see the real data
  • Debuggers sometimes “helpfully” escape data for you
  • Variable-width fonts can make whitespace differences hard to spot

For diagnostic purposes, I find it useful to be able to log the raw UTF-16 code units which make up a string in both .NET and Java. For example, in .NET:

static void LogUtf16(string input)
{
    // Replace Console.WriteLine with your logging approach
    Console.WriteLine(“Length: {0}”, input.Length);
    foreach (char c in input)
    {
        Console.WriteLine(“U+{0:x4}: {1}”, (uint) c, c);
    }
}

Binary data has different issues, mostly in terms of displaying it in some form to start with. Our diagnosis tools are primarily textual, so you’ll need to perform some kind of conversion to a string representation in order to see it at all. If you’re really trying to diagnose this as binary data (so you’re interested in the raw bytes) do not treat it as encoded text using UTF-8 or something similar. Hex is probably the simplest representation that allows differences to be pinpointed pretty simply. Again, logging the length of the data in question is a good first step.

In both cases you might want to include a hash in your diagnostics. It doesn’t need to be a cryptographically secure hash in any way, shape or form. Any form of hash that is likely to change if the data changes is a good start. Just make sure you can trust your hashing code! (Every piece of code you write should be considered suspect – including whatever you decide to use for converting binary to hex, for example. Use trusted third parties and APIs provided with your target platform where possible. Even though adding an extra dependency for diagnostics makes it slightly harder for others to reproduce the problem, it’s better than the diagnostics themselves being suspect.)

3. Analyze a clean, full, end-to-end log

This can be tricky when you’ve got multiple systems and platforms (which is why if you can possibly reproduce it in a single program it makes life simpler) but it’s really important to look at one log for a complete run.

Make sure you’re talking about the same data end-to-end. If you’re analyzing live traffic (which should be rare; unless the problem is very intermittent, this should all be done in test environments or a developer machine) or you have a shared test environment you need to be careful that you don’t use part of the data from one test and part of the data from another test. I know this sounds trivial, but it’s a really easy mistake to make. In particular, don’t assume that the data you’ll get from one part of the process will be the same run-to-run. In many cases it should be, but if the overall system isn’t working, then you already know that one of your expectations is invalid.

Compare “supposed to be equal” parts of the data. As per the steps in the introduction, there should be pairs of equal data, moving from the “top and bottom” of the transformation chain towards the middle. Initially, you shouldn’t care about whether you view the transformation as being correct – you’re only worried about whether the output is equal to the input. If you’ve managed to preserve all the data, the function of the transformation (encryption, compression etc) becomes relevant – but if you’re losing data, anything else is secondary. This is where the hash from the bottom of step 2 is relevant: you want to be able to determine whether the data is probably right as quickly as possible. Between “length” and “hash”, you should have at least some confidence, which will let you get to the most likely problem as quickly as possible.

4. Profit! (Conclusion…)

Once you’ve compared the results at each step, you should get an idea of which transformations are working and which aren’t. This may allow you to reduce the problem further, until you’ve just got a single transformation to diagnose. At that point, the problem becomes about encryption, or about compression, or about text encoding.

Depending on the situation you’re in, at this point you may be able to try multiple implementations or potentially multiple platforms to work out what’s wrong: for example, if you’re producing a zip file and then trying to decompress it, you might want to try using a regular decompression program to open your intermediate results, or decompress the results of compressing with a standard compression tool. Or if you’re trying to encrypt on Java and decrypt in C#, implement the other parts in each platform, so you can at least try to get a working “in-platform” solution – that may well be enough to find out which half has the problem.

To some extent all this blog post is about is reducing the problem as far as possible, with some ideas of how to do that. I haven’t tried to warn you much about the problems you can run into in any particular domain, but you should definitely read Marc Gravell’s excellent post on “How many ways can you mess up IO?” and my earlier post on understanding the meaning of your data is pretty relevant too.

As this is a “hints and tips” sort of post, I’ll happily modify it to include reader contributions from comments. With any luck it’ll be a useful resource for multiple Stack Overflow questions in the months and years to come…

Career and skills advice

It feels a little odd even to write this post, but I receive quite a few emails asking me for advice on how to get better at programming, how to get through interviews, whether it’s better to be a generalist or a specialist etc. I want to make it very clear right from the start, I am not a career guidance expert. I have very little evidence that this is good advice, and it may well not be good advice for you even if it’s good advice in general. Oh, and don’t expect anything shockingly insightful or original, either. You may well have heard much or all of this advice before.

So, with those caveats out of the way, my random thoughts…

Communication, communication, communication

Software is all about communication, whether you’re writing a design document, answering questions on Stack Overflow, giving a talk at a user conference, being grilled at an interview, or simply writing code. In fact, writing code is about communicating with two different audiences at the same time: the computer (compiler, interpreter, whatever) and whoever is going to read and maintain the code in the future.

In fact, I see improved communication as one of the primary benefits of Stack Overflow. Every time you ask or answer a question, that’s an opportunity to practise communicating with other developers. Is your post as clear as it can be? Does it give just the right amount of information – nothing extraneous, but everything that’s relevant – in a coherent way? Does it strike the right tone with the reader, suppressing any frustration you may currently be feeling either at the problem you’re facing or the perceived flaws in someone you’re replying to?

Stack Overflow is just one way you can improve your communication skills, of course. There are many others:

  • Write a blog, even if you think no-one will read it. You may be surprised. If you write about something you find interesting – and put time into writing about it well – it’s very likely that someone else will find it interesting too. If you enjoy this enough, consider writing a book – there are plenty of options for publication these days, from traditional publishers to online-only approaches.
  • Find opportunities to give presentations, whether that’s at user groups, conferences, or at work. I do realise that not everyone likes presenting to large groups of people, even if I find that frame of mind hard to empathise with. (Who wouldn’t want to be the centre of attention? Yes, I really am that shallow.)
  • Take pride in your communication within code. Spend a little bit longer on documentation and comments than you might do normally, and really try to think about what someone reading it might be trying to discover. (Also think about what they ought to know about even if they weren’t aware that they needed to know it.)
  • Read a lot, and reflect on it. While we’re used to commenting on the quality of fiction and perhaps blog posts, we don’t tend to consciously consider the quality of other written work. Next time you’re reading the documentation of some library you’re using, take a moment to ask yourself how effective it is and what contributes to the quality (or lack thereof).

You may well find that improving your communication also improves your thought processes. Clarity of ideas and clarity of expression often go hand in hand.

Coding and reflecting

Write code and then look back on it – ideally after significantly different periods of time. Code that appears "cool" today can feel immature or self-serving six months later.

Of course there are different kinds of code written under different constraints. You may find that if you’re already a professional software engineer, you aren’t given enough time for critical evaluation… but you may have the benefit of code reviews from your peers. (Code reviews can often be regarded as a chore, but if you approach them in the right frame of mind, you can learn a lot.) If you’re contributing to an open source project – or perhaps creating one from scratch – then you may find you have more opportunity to try multiple approaches, and look back at the code you’ve written in the past.

Reading other people’s code can be helpful too, though I find this is one of those activities which is more talked about than actually done.

Specialist or generalist?

Eric Lippert has written about how he was given advice to pick a subject and become a world expert on it, and I think there’s a lot of sense in that. On the other hand, you don’t want to be so constrained to a niche area of knowledge that you can move in the wider world. There’s a balance to be struck here: I’m a "generalist" in terms of having relatively few skills in specific business domains, but I’m a "specialist" in terms of having spent significant time studying C# and Java from a language perspective. Similarly I’m very restricted in terms of which aspects of software I’m actually competent at: I’d like to think I’m reasonable at library or server-side code, but my experience of UI development (whether that’s web or native client) are severely limited. I’m also familiar with fewer languages than I’d like to be.

You’ll need to work out what’s right for you here – but it’s worth doing so consciously rather than simply drifting. Of course, making deliberate decisions isn’t the same thing as executing them: a few months ago I decided it would be fun to explicitly concentrate on application development for a while (WPF, Windows Store, Android and iOS via Xamarin) but so far there have been precious few results.

Sometimes people ask me which technologies are good in terms of employment. That can be a very local question, but my experience is that it doesn’t really matter too much. If you find something you’re passionate about, then whatever you learn from that will often be useful in a more mundane but profitable environment. Technology changes fast enough that you’re almost certainly going to have to learn several languages and platforms over your career, so you might as well make at least some of those choices fun ones.

Interviews

A lot of advice has been written about interviews by people with far more experience than myself. I haven’t actually been an interviewee very often, although I’ve now conducted quite a few interviews. A few general bits of advice though:

  • Don’t exaggerate too much on your CV. If you claim you’re an expert in a language and then forget how to write a method declaration, that leaves a bad impression.
  • Find out what the company is looking for beforehand, and what the interview is likely to consist of. Are you likely to be asked to code on a whiteboard or in a text editor? If so, it wouldn’t be a bad idea to practise that – it feels quite different to writing code in an IDE. If you’re likely to be asked questions about algorithmic complexity but university feels like a long time ago, brush up on that a bit.
  • Be interesting! This often comes down to being passionate about something – anything, really. If the only motivation you can give is "I’m bored with my current job" without saying what you’d find interesting, that won’t go down well.
  • Listen carefully to what the interviewer asks you. If they specifically say to ignore one aspect of coding (validation, performance, whatever it might be) then don’t justify your answer using one of those aspects. By all means mention it, but be driven by the guidance of what the interviewer has specified as important. (The balance may well shift through the interview, as the interviewer tries to evaluate you on multiple criteria.)
  • Communicate as clearly as you can, even while you’re thinking. I love it when a candidate explains their thinking as they’re working through a difficult problem. I can see where their mind leads them, how intuitive or methodical they are, where they may have weaknesses – and I can guide them with hints. It can be worth taking a few seconds to work out the best way of vocalising your thoughts though. Oh, and if you do write on a whiteboard, try to make it legible.

Conclusion

So that’s about it – Jon’s entirely subjective guide to a fun and profitable software engineering career. If you choose to act on some of this advice, I sincerely hope it turns out well for you, but I make no guarantees whatsoever. Best of luck either way!

But what does it all mean?

This year before NDC, I wrote an article for the conference edition of "The Developer" magazine. Follow that link to find the article in all its illustrated glory (along with many other fine articles, of course) – or read on for just the text.

Back when I used to post on newsgroups I would frequently be in the middle of a debate of the details of some behaviour or terminology, when one poster would say: “You’re just quibbling over semantics” as if this excused any and all previous inaccuracies. I would usually agree –  I was indeed quibbling about semantics, but there’s no “just” about it.

Semantics is meaning, and that’s at the heart of communication – so for example, a debate over whether it’s correct to say that Java uses pass- by-reference1 is all about semantics. Without semantics, there’s nothing to talk about.
This has been going on for years, and I’m quite used to being the pedant in any conversation when it comes to terminology – it’s a topic close to my heart. But over the years – and importantly since my attention has migrated to Stack Overflow, which tends to be more about real problems developers are facing than abstract discussions – I’ve noticed that I’m now being picky in the same sort of way, but about the meaning of data instead of terminology.

Data under the microscope

When it comes down to it, all the data we use is just bits – 1s and 0s. We assemble order from the chaos by ascribing meaning to those bits… and not just once, but in a whole hierarchy. For example, take the bits 01001010 00000000:

  • Taken as a little-endian 16-bit unsigned integer, they form a value of 74. • That 16-bit unsigned integer can be viewed as a UTF-16 code unit for the character ‘J’.
  • That character might be the first character within a string.
  • That string might be the target of a reference, which is the value for a field called “firstName”.
  • That field might be within an instance of a class called “Person”.
  • The instance of “Person” whose “firstName” field has a value
    which is a reference to the string whose first character is ‘J’ might itself be the target of a reference, which is the value for a field called “author”, within an instance of a class called “Article”.
  • The instance of “Article” whose “author” field (fill in the rest your- self…) might itself be the target of a reference which is part of a collection, stored (indirectly) via a field called “articles” in a class called “Magazine”.

As we’ve zoomed out from sixteen individual bits, at every level we’ve imposed meaning. Imagine all the individual bits of information which would be involved in a single instance of the Magazine with a dozen articles, an editorial, credits – and perhaps even images. Really imagine them, all written down next to each other, possibly without even the helpful gap between bytes that I included in our example earlier.

That’s the raw data. Everything else is “just” semantics.

So what does that have to do with me?

I’m sure I haven’t told you anything you don’t already know. Yes, we can impose meaning on these puny bits, with our awesome developer power. The trouble is that bits have a habit of rebelling if you try to impose the wrong kind of meaning on them… and we seem to do that quite a lot.

The most common example I see on Stack Overflow is treating text (strings) and binary data (image files, zip files, encrypted data) as if they were interchangeable. If you try to load a JPEG using StreamReader in .NET or FileReader in Java, you’re going to have problems. There are ways you can actually get away with it – usually by using the ISO-8859-1 encoding – but it’s a little bit like trying to drive down a road with a broken steering wheel, only making progress by bouncing off other obstacles.

While this is a common example, it’s far from the only one. Some of the problems which fall into this category might not obviously be due to the mishandling of data, but at a deep level they’re all quite similar:

  • SQL injection attacks due to mingling code (SQL) with data (values) instead of using parameters to keep the two separate.
  • The computer getting arithmetic “wrong” because the developer didn’t understand the meaning of floating binary point numbers, and should actually have used a floating decimal point type (such as System.Decimal or java.math. BigDecimal).
  • String formatting issues due to treating the result of a previous string formatting operation as another format string – despite the fact that now it includes user data which could really have any kind of text in it.
  • Double-encoding or double-unencoding of text data to make it safe for transport via a URL.
  • Almost anything to do with dates and times, including – but certainly not limited to – the way that java.util.Date and System. DateTime values don’t inherently have a format. They’re just values.

The sheer bulk of questions which indicate a lack of understanding of the nature of data is enormous. Of course Stack Overflow only shows a tiny part of this – it doesn’t give much insight into the mountain of code which handles data correctly from the perspective of the types involved, but does entirely inappropriate things with those values from the perspective of the intended business meaning of those values.

It’s not all doom and gloom though. We have some simple but powerful weapons available in the fight against semantic drivel.

Types

This article gives a good indication of why I’m a fan of statically typed languages. The type system can convey huge amounts of information about the nature of data, even if the business meaning of values of those types can be horribly overloaded.

Maybe it would be good if we distinguished between human-readable text which should usually be treated in a culture-sensitive way, and machine-parsable text which should usually be treated without reference to any culture. Those two types might have different operations available on them, for example – but it would almost certainly get messy very quickly.

For business-specific types though, it’s usually easy to make sure that each type is really only used for one concept, and only provides operations which are meaningful for that concept.

Meaningful names

Naming is undoubtedly hard, and I suspect most developers have had the same joyless experiences that I have of struggling for ten minutes to come up with a good class or method name, only to end up with the one we first thought of which we really don’t like… but which is simply better than all the other options we’ve considered. Still, it’s worth the struggle.

Our names don’t have to be perfect, but there’s simply no excuse for names such as “Form1” or “MyClass” which seem almost designed to carry no useful information whatsoever. Often simply the act of naming something can communicate meaning. Don’t be afraid to extract local variables in code just to clarify the meaning of some otherwise-obscure expression.

Documentation

I don’t think I’ve ever met a developer who actually enjoys writing documentation, but I think it’s hard to deny that it’s important. There tend to be very few things that are so precisely communicated just by the names of types, properties and methods that no further information is required. What is guaranteed about the data? How should it be used – and how should it not be used?

The form, style and detail level of documentation will vary from project to project, but don’t underestimate its value. Aside from behavioural  details, ask yourself what meaning you’re imposing on or assuming about the data you’re dealing with… what would happen if someone else made different assumptions? What could go wrong, and how can you prevent it by expressing your own understanding clearly? This isn’t just important for large projects with big teams, either – it’s entirely possible that the person who comes to the code with a different viewpoint is going to be you, six months later.

Conclusion

I apologise if this all sounds like motherhood and apple pie. I’m not saying anything new, after all – of course we all agree that we need to understand the meaning of the our data. I’m really not trying to waste your time though: I want you to take a moment to appreciate just how much it matters that we understand the data we work with, and how many problems can be avoided by putting effort into communicating that effectively and reading what others have written about their data.

There are other approaches we can take beyond those I’ve listed above, too – much more technically exciting ones – around static code analysis, contracts, complicated annotations and the like. I have nothing against them, but just understanding the value of semantics is a crucial starting point, and once everyone on your team agrees on that, you’ll be in a much better position to move forward and agree on the meaning of your data itself. From there, the world’s your oyster – if you see what I mean.


1 It doesn’t; references are passed by value, as even my non-programmer wife knows by now. That’s how often this myth comes up.

New tool to play with: SemanticMerge

A little while ago I was contacted about a new merge tool from the company behind PlasticSCM. (I haven’t used Plastic myself, but I’d heard of it.) My initial reaction was that I wasn’t interested in anything which required me to learn yet another source control system, but SemanticMerge is independent of PlasticSCM.

My interested was piqued when I learned that SemanticMerge is actually built on Roslyn. I don’t generally care much about implementation details, but I’m looking out for uses of Roslyn outside Microsoft, partly to see if I can gain any inspiration for using it myself. Between the name and the implementation detail, it should be fairly obvious that this is a tool which understands changes in C# source code rather better than a plain text diff.

I’ve had SemanticMerge installed on one of my laptops for a month or so now. Unfortunately it’s getting pretty light use – my main coding outside work is on Noda Time, and as I perform the vast majority of the commits, the only time I really need to perform merges is when I’ve forgotten to push a commit from one machine before switching to another. I’ve used it for diffs though, and it seems to be doing the right thing there – showing which members have been added, moved, changed etc.

I don’t believe it can currently support multi-file changes – for example, spotting that a lot of changes are all due to a rename operation – but even if it doesn’t right now, I suspect that may be worked on in the future. And of course, the merge functionality is the main point.

SemanticMerge is now in beta, so pop over to the web site and give it a try.

The Open-Closed Principle, in review

Background

I’ve been to a few talks on SOLID before. Most of the principles seem pretty reasonable to me – but I’ve never "got" the open-closed principle (OCP from here on). At CodeMash this year, I mentioned this to the wonderful Cori Drew, who said that she’d been at a user group talk where she felt it was explained well. She mailed me a link to the user group video, which I finally managed to get round to watching last week. (The OCP part is at around 1 hour 20.)

Unfortunately I still wasn’t satisfied, so I thought I’d try to hit up the relevant literature. Obviously there are umpteen guides to OCP, but I decided to start with Wikipedia, and go from there. I mentioned my continuing disappointment on Twitter, and the conversation got lively. Uncle Bob Martin (one of the two "canonical sources" for OCP) wrote a follow-up blog post, and I decided it would be worth writing one of my own, too, which you’re now reading.

I should say up-front that in some senses this blog post isn’t so much about the details of the open-closed principle, as about the importance of careful choice of terminology at all levels. As we’ll see later, when it comes to the "true" meaning of OCP, I’m pretty much with Uncle Bob: it’s motherhood and apple pie. But I believe that meaning is much more clearly stated in various other principles, and that OCP as the expression of an idea is doing more harm than good.

Reading material

So what is it? (Part 1 – high level)

This is where it gets interesting. You see, there appear to be several different interpretation of the principle – some only subtly distinct, others seemingly almost unrelated. Even without looking anything up, I knew an expanded version of the name:

Modules should be open for extension and closed for modification.

The version quoted in Wikipedia and in Uncle Bob’s paper actually uses "Software entities (classes, modules, functions, etc.)" instead of modules, but I’m not sure that really helps. Now I’m not naïve enough to expect everything in a principle to be clear just from the title, but I do expect some light to be shed. In this case, unfortunately I’m none the wiser. "Open" and "closed" sound permissive and restrictive respectively, but without very concrete ideas about what "extension" and "modification" mean, it’s hard to tell much more.

Fair enough – so we read on to the next level. Unfortunately I don’t have Bertrand Meyer’s "Object-Oriented Software Construction" book (which I take to be the original), but Uncle Bob’s paper is freely available. Wikipedia’s summary of Meyer’s version is:

The idea was that once completed, the implementation of a class could only be modified to correct errors; new or changed features would require that a different class be created. That class could reuse coding from the original class through inheritance. The derived subclass might or might not have the same interface as the original class.

Meyer’s definition advocates implementation inheritance. Implementation can be reused through inheritance but interface specifications need not be. The existing implementation is closed to modifications, and new implementations need not implement the existing interface.

And Uncle Bob’s high level description is:

Modules that conform to the open-closed principle have two primary attributes.

  1. They are "Open For Extension". This means that the behavior of the module can be extended. That we can make the module behave in new and different ways as the requirements of the application change, or to meet the needs of new applications.
  2. They are "Closed for Modification". The source code of such a module is inviolate. No one is allowed to make source code changes to it.

I immediately took a dislike to both of these descriptions. Both of them specifically say that the source code can’t be changed, and the description of Meyer’s approach to "make a change by extending a class" feels like a ghastly abuse of inheritance to me… and goes firmly against my (continued) belief in Josh Bloch’s advice of "design for inheritance or prohibit it" – where in the majority of cases, designing a class for inheritance involves an awful lot of work for little gain. Designing an interface (or pure abstract class) still involves work, but with fewer restrictions and risks.

Craig Larman’s article uses the term "closed" in a much more reasonable way, to my mind:

Also, the phrase "closed with respect to X" means that clients are not affected if X changes.

When I say "more reasonable way" I mean in terms of how I want to write code… not in terms of the use of the word "closed". This is simply not how the word "closed" is used elsewhere in my experience. In the rare cases where "closed" is used with "to", it’s usually in terms of what’s not allowed in: "This bar is closed to under 18s" for example. Indeed, that’s how I read "closed to modification" and that appears to be backed up by the two quotes which say that once a class is complete, the source code cannot be changed.

Likewise the meaning of "open for extension" seems unusual to me. I’d argue that the intuitive meaning is "can be extended" – where the use of the term "extended" certainly nods towards inheritance, even if it’s not intended meaning. If the idea is "we can make the module behave differently" – as Uncle Bob’s description suggests – then "open for extension" is a very odd way of expressing that idea. I’d even argue that in the example given later, it’s not the "open" module that behaves differently – it’s the combination of the module and its collaborators, acting as a unified program, which behaves differently after some aspects are modified.

So what is it? (Part 2 – more detail)

Reading on through the rest of Uncle Bob’s paper, the ideas become much more familiar. There’s a reasonable example of a function which is asked to draw a collection of shapes: the bad code is aware of all the types of shape available, and handles each one separately. The good code uses an abstraction where each shape (Circle, Square) knows how to draw itself and inherits from a common base class (Shape). Great stuff… but what’s that got to do with what was described above? How are the concepts of "open" and "closed" clarified?

The answer is that they’re not. The word "open" doesn’t occur anywhere in the rest of the text, other than as part of the term "open-closed principle" or as a label for "open client". While it’s perhaps rather easier to see this in hindsight, I suspect that any time a section which is meant to clarify a concept doesn’t use some of the key words used to describe the concept in a nutshell, that description should be treated as suspect.

The word "closed" appears more often, but only in terms of "closed against" which is never actually defined. (Is "closed against" the same as "closed for"?) Without Craig Larman’s explanation, sentences like this make little sense to me:

The function DrawAllShapes does not conform to the open-closed principle because it cannot be closed against new kinds of shapes.

Even Craig’s explanation feels somewhat at odds with Uncle Bob’s usage, as it talks about clients being affected. This is another of the issues I have with the original two descriptions: they talk about a single module being open/closed, whereas we’re dealing with abstractions where there are naturally at least two pieces of code involved (and usually three). Craig’s description of changes in one module not affecting clients is describing a relationship – which is a far more useful way of approaching things. Even thinking about the shape example, I’m getting increasingly confused about exactly what’s open and what’s closed. It feels to me like it’s neither the concrete shape classes nor the shape-drawing code which is open or closed – it’s the interface between the two; the abstract Shape class. After all, these statements seem reasonable:

  • The Shape class is open for extension: there can be many different concrete subclasses, and code which only depends on the Shape class doesn’t need to know about them in order to use them when they are presented as shapes.
  • The Shape class is closed for modification: no existing functions can be removed (as they may be relied on by existing clients) and no new pure virtual functions can be added (as they will not be implemented by existing subclasses).

It’s still not how I’d choose to express it, but at least it feels like it makes sense in very concrete terms. It doesn’t work well with how Uncle Bob uses the term "closed" though, so I still think I may be on a different page when it comes to that meaning. (Uncle Bob does also make the point that any significant program isn’t going to adhere to the principle completely in every part of the code – but in order to judge where it’s appropriate to be closed, I do really need to understand what being closed means.)

Just to make it crystal clear, other than the use of the word "closed," the low-level description of what’s good and what’s bad, and why, is absolutely fine. I really have no problems with it. As I said at the start, the idea being expressed makes perfect sense. It just doesn’t work (for me) when expressed in the terms used at a higher level.

Protected Variation

By contrast, let’s look at a closely related idea which I hadn’t actually heard about before I started all this research: protected variation. This name was apparently coined by Alistair Cockburn, and Craig Larman either quotes or paraphrases this as:

Identify points of predicted variation and create a stable interface around them.

Now that’s a description I can immediately identify with. Every single word of it makes sense to me, even without reading any more of Craig’s article. (I have read the rest, obviously, and I’d encourage others to do so.) This goes back to Josh Bloch’s "design for inheritance or prohibit it" motto: identifying points of predicted variation is hard, and it’s necessary in order to create a stable interface which is neither too constrictive for implementations nor too woolly for clients. With class inheritance there’s the additional concern of interactions within a class hierarchy when a virtual method is called.

So in Uncle Bob’s Shape example, all there is is a point of predicted variation: how the shape is drawn. PV suggests the converse as well – that as well as points of predicted variation, there may be points which will not vary. That’s inherent in the API to some extent – every shape must be capable of drawing itself with no further information (the Draw method has no parameters) but it could also be extended to non-virtual aspects. For example, we could decide that every shape has one (and only one) colour, which will not change during its lifetime. That can be implemented in the Shape class itself – with no predicted variation, there’s no need of polymorphism.

Of course, the costs of incorrectly predicting variation can be high: if you predict more variation than is actually warranted, you waste effort on over-engineering. If you predict less variation than is required, you usually end up either having to change quite a lot of code (if it’s all under your control) or having to come up with an "extended" interface. There’s the other aspect of shirking responsibility on this predicted variation to some extent, by making some parts "optional" – that’s like saying, "We know implementations will vary here in an incompatible way, but we’re not going to try to deal with it in the API. Good luck!" This usually arises in collection APIs, around mutating operations which may or may not be valid (based on whether the collection is mutable or not).

Not only is PV easy to understand – it’s easy to remember for its comedy value, at least if you’re a fan of The Hitchhiker’s Guide to the Galaxy. Remember Vroomfondel and Majikthise, the philosophers who invaded Cruxwan University just as Deep Thought was about to announce the answer to Life, the Universe, and Everything? Even though they were arguing with programmers, it sounds like they were actually the ones with software engineering experience:

"I’ll tell you what the problem is mate," said Majikthise, "demarcation, that’s the problem!"

[…]

"That’s right!" shouted Vroomfondel, "we demand rigidly defined areas of doubt and uncertainty!"

That sounds like a pretty good alternative description of Protected Variation to me.

Conclusion

So, that’s what I don’t like about OCP. The name, and the broad description – both of which I believe to be unhelpful, and poorly understood. (While I’ve obviously considered the possibility that I’m the only one who finds it confusing, I’ve heard enough variation in the explanations of it to suggest that I’m really not the only one.)

That sounds like a triviality, but I think it’s rather important. I suspect that OCP has been at least mentioned in passing in thousands if not tends of thousands of user groups and conferences. The purpose of such gatherings is largely for communication of ideas – and when a sound idea is poorly expressed, an opportunity is wasted. I suspect that any time Uncle Bob has personally presented it in detail, the idea has sunk in appropriately – possibly after some initial confusion about the terminology. But what about all the misinterpretations and "glancing blows" where OCP is only mentioned as a good thing that clearly everyone wants to adhere to, with no explanation beyond the obscure ones described in part one above? How many times did they shed more confusion than light?

I believe more people are familiar with Uncle Bob’s work on OCP than Bertrand Meyer’s. Further, I suspect that if Bertrand Meyer hadn’t already introduced the name and brief description, Uncle Bob may well have come up with far more descriptive ones himself, and the world would have been a better place. Fortunately, we do have a better name and description for a concept which is at least very closely related. (I’m not going to claim PV and OCP are identical, but close enough for a lot of uses.)

Ultimately, words matter – particularly when it comes to single sentence descriptions which act as soundbytes; shorthand for communicating a complex idea. It’s not about whether the more complex idea can be understood after carefully reading thorough explanations. It’s about whether the shorthand conveys the essence of the idea in a clear way. On that front, I believe the open-closed principle fails – which is why I’d love to see it retired in favour of more accessible ones.

Note for new readers

I suspect this post may end up being read more widely than most of my blog entries. If you’re going to leave a comment, please be aware that the CAPTCHA doesn’t work on Chrome. I’m aware of this, but can’t fix it myself. If you right-click on the broken image and select "open in new tab" you should get a working image. Apologies for the inconvenience.