Backward compatibility pain

I’ve been getting a bit cross about backward compatibility recently. This post contains two examples of backward incompatibilities in .NET 4.6, and one example of broken code which isn’t being fixed, for backward compatibility reasons.

Let me start off by saying this post is not meant to be seen as an attack on Microsoft. I suspect that some people will read that first sentence as “This post is an attack on Microsoft, but I don’t want to say so.” It really isn’t. With my naturally positive disposition, I’m going to assume that the people behind the code and decisions in the examples below are smart people who have more information than I do. Their decisions may prove annoying to me personally, but that doesn’t mean those decisions are bad ones for the world at large.

I’m writing this post partly because I think readers will find it interesting, and partly to show that there are different things to consider when it comes to backward compatibility in APIs.

Example 1: Enumerable.OrderByDescending is broken

OrderByDescending is broken by the unfortunate combination of three facts:

  • IComparer.Compare is allowed to return any integer value, including int.MinValue. The return value is effectively meant to represent one of three results:
    • the first argument is “earlier than” the second (return a negative integer)
    • the two arguments are equal in terms of this comparison (return 0)
    • the first argument is “later than” the second (return a positive integer)
  • -int.MinValue (the unary negation of int.MinValue) is still int.MinValue, because the “natural” result would be outside the range of int. (Think about sbyte as being in the range -128 to 127 inclusive… what would -(-128) return in sbyte arithmetic?) The snippet after this list demonstrates this.
  • OrderByDescending uses unary negation to attempt to reverse the order returned by an “ascending” comparer.
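
To see the second point in action, here’s a tiny program demonstrating that negating int.MinValue just gives back int.MinValue in the default unchecked context (and throws in a checked one):

using System;

class NegationOverflow
{
    static void Main()
    {
        int x = int.MinValue;

        // In an unchecked context (the C# default), negation overflows and wraps around.
        Console.WriteLine(-x == int.MinValue); // True

        // In a checked context, the same negation throws instead.
        try
        {
            checked { Console.WriteLine(-x); }
        }
        catch (OverflowException)
        {
            Console.WriteLine("Overflow!");
        }
    }
}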

I’ve blogged about this before, but for the sake of completeness, here’s an example showing that it’s broken. We use a custom comparer which delegates to a normal string comparer – but only ever returns int.MinValue, 0 or int.MaxValue. Just to reiterate, this is an entirely legitimate comparer.

using System;
using System.Collections.Generic;
using System.Linq;

class OrderByDescendingBug
{
    static void Main()
    {
        var comparer = new MaximalComparer<string>(Comparer<string>.Default);
        string[] input = { "apples", "carrots", "doughnuts", "bananas" };

        var sorted = input.OrderByDescending(x => x, comparer);
        Console.WriteLine(string.Join(", ", sorted));
    }
}

class MaximalComparer<T> : IComparer<T>
{
    private readonly IComparer<T> original;

    public MaximalComparer(IComparer<T> original)
    {
        this.original = original;
    }

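    // Entirely legitimate per the IComparer<T> contract: any negative value
    // means "less than" and any positive value means "greater than".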
    public int Compare(T first, T second)
    {
        int originalResult = original.Compare(first, second);
        return originalResult == 0 ? 0
            : originalResult < 0 ? int.MinValue
            : int.MaxValue;
    }
}

We would like the result of this program to be “doughnuts, carrots, bananas, apples” but on my machine (using .NET 4.6 from VS2015 CTP6) it’s “carrots, doughnuts, apples, bananas”.

Naturally, when I first discovered this bug, I filed it in Connect. Unfortunately, the bug has been marked as closed. This comment was logged in 2011:

Swapping arguments in the call to comparer.Compare as you point out would be the most straightforward and general way to support this. However, while well-behaved implementations of comparer.Compare should handle this fine, there may be some implementations out there with subtle bugs that are not expecting us to reverse the order in which we supply these arguments for a given list. Given our focus on runtime compatibility for the next release, we won’t be able to fix this bug in the next version of Visual Studio, though we’ll definitely keep this in mind for a future release!

Fast, backward compatible, correct – pick any two

The clean solution here – reversing the order of the arguments – isn’t the only way of correcting it. We could use:

return -Math.Sign(original.Compare(x, y));

This still uses unary negation, but now it’s okay, because Math.Sign will only return -1, 0 or 1. It’s very slightly more expensive than the clean solution, of course – there’s the call to Math.Sign and the unary negation. Still, at least it works.
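
For completeness, here’s a sketch of what a well-behaved reversing comparer can look like. ReverseComparer<T> is just an illustrative name of mine, not the framework’s internal type; it uses the clean argument-swapping approach, with the Math.Sign alternative noted in a comment:

using System.Collections.Generic;

// A reversing comparer that can't be fooled by extreme return values:
// it swaps the arguments rather than negating the result.
sealed class ReverseComparer<T> : IComparer<T>
{
    private readonly IComparer<T> original;

    public ReverseComparer(IComparer<T> original)
    {
        this.original = original;
    }

    public int Compare(T first, T second)
    {
        // Swapping the arguments reverses the ordering with no risk of integer
        // overflow; return -Math.Sign(original.Compare(first, second)) would
        // also be safe, at a very slight cost.
        return original.Compare(second, first);
    }
}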

What I object to here is the pandering to incorrect code (implementations of IComparer which don’t obey its contract, by making assumptions about the order in which values will be passed) at the expense of correct code (such as the example above; the use of int.MinValue is forced here, but it can crop up naturally too – in a far harder-to-reproduce way, of course). While I can (unfortunately) believe that there are implementations which really are that broken, I don’t think the rest of us should have to suffer for it. I don’t think we should have to suffer at all, but I’d rather suffer a slight inefficiency (the additional Math.Sign call, which may well be JIT-compiled into a single machine instruction – I haven’t checked) than suffer the current correctness issue.

Example 2: TimeZoneInfo becomes smarter in .NET 4.6

A long time ago, Windows time zones had no historical information – they were just a single pair of rules about when the zone started and stopped observing daylight saving time (assuming it did at all).

That improved over time, so that a time zone had a set of adjustment rules, each of which would be in force for a certain portion of history. This made it possible to represent the results of the Energy Policy Act of 2005 for example. These are represented in .NET using TimeZoneInfo.AdjustmentRule, which is slightly under-documented and has suffered from some implementation issues in the past. (There’s also the matter of the data used, but I treat that as a different issue.)

Bugs aside, the properties of TimeZoneInfo and its adjustment rules allowed an interested developer (one wanting to expose the same information in a different form for a better date/time API, as one entirely arbitrary example) to predict the results of the calculations within TimeZoneInfo itself – so the value returned by a call to TimeZoneInfo.GetUtcOffset(DateTime) could be predicted by looking at the standard UTC offset of the time zone, working out which rule was in effect for the specified DateTime, working out if that rule means that DST was being observed at the time, and adjusting the result accordingly.

As of .NET 4.6, it appears this will no longer be the case – not in a straightforward manner. One aspect of inflexibility in TimeZoneInfo is being eliminated: the inability to change standard offset. In the past, if a time zone changed its understanding of “standard time” (as the UK did between 1968 and 1971, for example), that couldn’t be cleanly represented in the TimeZoneInfo data model, leading to some very odd data with “backward” DST offsets to model the situation as nearly as possible.

Now, it seems that each adjustment rule also “knows” the difference between its standard offset and that of the zone as a whole. For the most part, this is a good thing. However, it’s a pain for anyone who works with TimeZoneInfo.AdjustmentRule directly, as the information simply isn’t available on the rule. (This is only a CTP of .NET 4.6, of course – it might become available before the final release.)

Fortunately, one can infer the information by asking the zone itself for the UTC offset of one arbitrary point within the adjustment rule, and then comparing that with what you’d predict using just the properties of TimeZoneInfo and AdjustmentRule (taking into account the fact that the start of the rule may already be in DST). So long as the rule performs its internal calculations correctly (and from what I’ve seen, it’s now a lot better than it used to be, though not quite perfect yet), we can predict the result of GetUtcOffset for all other DateTime values.
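
As a rough sketch of that inference (this is my own illustrative helper, not anything in the BCL, and it glosses over checking whether the sample point is itself in DST, which real code would have to do rather than assume):

using System;

static class AdjustmentRuleProbe
{
    // Infers how far the standard offset in force during a particular adjustment
    // rule differs from the zone's overall BaseUtcOffset, by sampling GetUtcOffset
    // at the start of the rule. Assumes that instant isn't in daylight saving time;
    // real code would need to check that (and adjust) rather than assume it.
    static TimeSpan InferStandardOffsetDelta(TimeZoneInfo zone, TimeZoneInfo.AdjustmentRule rule)
    {
        DateTime samplePoint = rule.DateStart;            // an arbitrary instant within the rule
        TimeSpan predicted = zone.BaseUtcOffset;          // what the pre-4.6 model would predict
        TimeSpan actual = zone.GetUtcOffset(samplePoint); // what the zone actually reports
        return actual - predicted;
    }
}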

It’s not clear to me why the information isn’t just exposed with a new property on the rule, however. It’s a change in terms of what’s available, sure – but anyone using the new implementation directly would have to know about the change anyway, as the results of using the exposed data no longer match the results of GetUtcOffset.

Example 3: PersianCalendar and leap years

If you thought the previous two examples were obscure, you’ll love this one.

PersianCalendar is changing in .NET 4.6 to use a more complicated leap year formula. The currently documented formula is:

A leap year is a year that, when divided by 33, has a remainder of 1, 5, 9, 13, 17, 22, 26, or 30.
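
Expressed in code, that documented rule amounts to something like this (my own sketch of the rule as documented, not the BCL’s implementation):

static class SimplePersianRule
{
    // The "simple" 33-year cycle rule, as documented for PersianCalendar before .NET 4.6.
    // A sketch of the documented rule, not the BCL's actual implementation.
    internal static bool IsLeapYear(int year)
    {
        int remainder = year % 33;
        return remainder == 1 || remainder == 5 || remainder == 9 || remainder == 13
            || remainder == 17 || remainder == 22 || remainder == 26 || remainder == 30;
    }
}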

So new PersianCalendar().IsLeapYear(1) has always returned true – until now. It turns out that Windows 10 is going to support the Persian Calendar (also known as the Solar Hijri calendar) in certain locales, and it’s going to do so “properly” – by which I mean, with a more complex leap year computation. This is what’s known as the “astronomical” Persian calendar and it follows the algorithm described in Calendrical Calculations. The BCL implementation is going to be consistent with that Windows calendar, which makes sense.

It’s worth noting that this calendar has the same leap year pattern as the “simple” one for over 320 years around the modern era (Gregorian 1800 to Gregorian 2123) so it’s really only people dealing with long-past dates in the Persian calendar who will notice the difference. Of course, I noticed because I have a unit test checking that my implementation of the Persian calendar in Noda Time is equivalent to the BCL’s implementation. It was fine until I installed Visual Studio 2015 CTP6…

As it happens, there’s another Persian calendar to consider – the “arithmetic” or “algorithmic” Persian calendar proposed by Ahmad Birashk. This consists of three hierarchical cycles of years (either cycles, subcycles and sub-subcycles or cycles, grand cycles and great grand cycles depending on whether you start with the biggest kind and work in, or start at the smallest and work out).

For Noda Time 2.0, I’m now going to support all three forms: simple (for those who’d like compatibility with the “old” BCL implementation), astronomical and arithmetic.

Conclusion

Backward compatibility is hard. In all of these cases there are reasons for the decisions taken, whether that’s compatibility winning out over correctness as in the first case, or a deliberate change in behaviour as in the other two. I think for localization-related code there’s probably a greater tolerance of change – or there should be, as localization data changes reasonably frequently.

For the second and third cases, I think it’s reasonable to say that compatibility has been broken for a good cause – particularly in the second case, where the time zone data can be much more sensible with the additional flexibility of changing the UTC offset of standard time over history. It’s just a shame there’s fall-out.

The changes I would make if I were the only developer in the world would be:

  • Fix the first case either by ignoring broken comparer implementations, or by taking the hit of calling Math.Sign.
  • Improve the second case by adding a new property to AdjustmentRule and publicising its existence in large, friendly letters.
  • Introduce a new class for the third case instead of modifying the behaviour of the existing class. That would certainly be best for me – but for most users, that would probably introduce more problems than it solved. (I suspect that most users of the Persian calendar won’t go outside the “safe” range where the simple and astronomical calendars are the same anyway.)

One of the joys of working on Noda Time 2.0 at the moment is that it’s a new major version and I am willing to break 1.x code… not gratuitously, of course, but where there’s good reason. Even so, there are painful choices to be made in some cases, where there’s a balance between a slight ongoing smell and a clean break that might cause issues for some users if they’re not careful. I can only imagine what the pain is like when maintaining a large and mature codebase like the BCL – or the Windows API itself.

24 thoughts on “Backward compatibility pain”

  1. “IComparer.Compare is allowed to return any integer value, including int.MinValue. The return value is effectively meant to represent one of three results:
    •the first argument is “earlier than” the second (return a negative integer)
    •the two arguments are equal in terms of this comparison (return 0)
    •the second argument is “later than” the first (return a positive integer)”

    Heads I win, tails you lose?

    I believe the last bullet point should read “the first argument is “later than” the second (return a positive integer)”


    1. Switching arguments and negating ^_^

      I would imagine that people who use the naive subtraction of integers (which has problems when overflow occurs, if I remember correctly) would run into the first problem enough to counterbalance people who break the IComparable contract…


  2. About the Math.Sign stuff, I just compiled a test app, VS CTP6 with .NET 4.5 and ctrl-alt-D (which is a shortcut for View->Disassembly or something like that). I’m a bit confused which calls go where, due to optimizations making the debug database not assign nice pretty “this source goes here” comments, and calls are addresses and not symbol names. Reproduction notes, make sure Tools -> Options -> Debugging -> General -> Suppress JIT optimization on module load is unchecked, otherwise Math.Sign doesn’t inline. Anyway, getting onto the result…

    It looks like Math.Sign inlines to the following:
    013C2E4F test eax,eax
    013C2E51 jge 013C2E58
    013C2E53 or ecx,0FFFFFFFFh
    013C2E56 jmp 013C2E65
    013C2E58 test eax,eax
    013C2E5A jle 013C2E63
    013C2E5C mov ecx,1
    013C2E61 jmp 013C2E65
    013C2E63 xor ecx,ecx
    013C2E65 [next instructions]

    So definitely not a simple one-instruction opcode, but definitely not as expensive as a function call. For the curious, here’s my attempt at translating the above into pseudocode:
    x >= 0 ? (x > 0 ? 1 : (x ^ x)) : (x | 0xFFFFFFFF)
    Note that (x ^ x) is a quick way of setting a register to zero.

    For the even more curious, here’s the reference source for the method: http://referencesource.microsoft.com/#mscorlib/system/math.cs,637


  3. Thanks for the insightful article. Backward compatibility is hard, at least when changes are due. I will point out that as far as IComparer goes, returning the more conventional -1 and 1 for the non-equal results does avoid the negation issue. It is however unfortunate that the implementation of the comparer in this situation follows — and succumbs to! — exactly the broken pattern that Eric Lippert warned about here: http://ericlippert.com/2011/01/24/bad-comparisons-part-two/

    The Microsoft response about not swapping the arguments is just silly. Even if that’s a genuine concern (and I don’t think it should be), there’s no reason that they couldn’t reverse a comparison simply by expressly returning -1 for positive comparer results and 1 for negative comparer results (without the use of Math.Sign()). So there are two legitimate alternatives besides the obvious “reverse the order of arguments”. In any case, given the well-known hazard of simply negating the previous comparison result, it’s ridiculous for Microsoft to have implemented it this way.

    Not to hijack the “backward” aspect of the article, but another glaring example of a compatibility issue in .NET and Windows generally is the horrible choice to pattern the newer XAML-based APIs on Silverlight rather than WPF. Even if Silverlight was the right thing to do for performance reasons in the original Phone implementation, advancement of hardware was an easily predicted eventuality, and it would have made so much more sense to use WPF in the modern Phone and Windows RT/Windows App APIs, to facilitate easier cross-platform implementations across all Windows platforms.

    Sigh…


  4. Code tested on: VS CTP6, .NET 4.5, release mode. VS option Tools -> Options -> Debugging -> General -> Suppress JIT optimization on module load disabled. Disassembly obtained by Debug -> Windows -> Disassembly while debugging.

    I decided to dive into your comment of “which may well be JIT-compiled into a single machine instruction – I haven’t checked”, and here are the results:

    Compiling x = Math.Sign(x) results in this assembly:

    010D2E4F test eax,eax
    010D2E51 jge 010D2E58
    010D2E53 or ecx,0FFFFFFFFh
    010D2E56 jmp 010D2E65
    010D2E58 test eax,eax
    010D2E5A jle 010D2E63
    010D2E5C mov ecx,1
    010D2E61 jmp 010D2E65
    010D2E63 xor ecx,ecx
    010D2E65 [The rest of the code]

    So definitely not just a single instruction, but much faster than a whole function call (as long as it’s inlined). I don’t have a great understanding of CPU runtimes, but I recall that branching is slow in modern processors, and there’s a fair bit of branching in that code.


    1. Unpredictable branches are bad, which yes makes that code rather horrible. An optimized implementation would be something like:

      mov ecx, eax
      sar eax, 31
      neg ecx
      shr ecx, 31
      or ecx, eax

      Five cheap instructions, although not exactly independent, in any case incredibly cheap (maybe two more for spilling and restoring eax).


  5. When you break 1.x code, please make sure to leave the signatures and obsolete them with proper instructions on how to upgrade. This is a very important contract that few developers adhere to. They just make breaking changes with no communication on how to fix them, especially breaking changes to the output of an existing method for a valid input. Unless you are actually fixing an error in a method, if F(X) is 5, it had better always be 5. If for reasons other than errors F(X) will return 6, F(X) really needs to be obsoleted, with instructions to use G(X) along with an explanation of why and what is different.


    1. No, I won’t be leaving existing signatures – 2.0 is expected to include fully breaking changes. That’s the point of it being a major release – it doesn’t have to be backwardly compatible.
      However, there will be a 1.4 release which obsoletes (where feasible) members that will be removed in 2.0, along with migration instructions. I’m writing up a migration guide for 2.0 as I go along: http://nodatime.org/unstable/userguide/migration-to-2.html


      1. You really should have the obsoleted methods for 1 whole major version. 1.4 should deprecate them, 2.0 obsolete them, 3.0 remove obsoleted code.

        You’re going to have users that do in-place upgrades from 1.x to 2.x and just be lost. Sure, maybe they’ll find your blog or whatever, but that’s a very bad user experience. Most users assume the ability to upgrade one version. A user would be far more interrogative of an upgrade from 1.x to 3.x or beyond. No normal user will ever decide to go from 1.3 to 1.4 and then 2.0; they will go from 1.3 to 2.0.


        1. I think we’ll have to agree to disagree. What I’ve suggested is entirely compliant with semantic versioning. If people want to stick with the 1.x codebase, they’re very welcome to do so – but a major version number is definitely allowed to be a breaking change… And indeed if I didn’t make any breaking changes, I wouldn’t change the major version number.


            1. I didn’t say it was disallowed by proper Semantic Versioning. I am stating it is a failure of user experience. Your breaking changes are undiscoverable if they are not obsoleted in a fashion similar to the one I outlined.


            1. They’re discoverable by: a) being announced; b) being documented; c) there being a 1.4 which isn’t a breaking change. Your suggestion of “don’t break in one major version” basically goes against semver. The very start of semver.org says: “Given a version number MAJOR.MINOR.PATCH, increment the MAJOR version when you make incompatible API changes,” That’s exactly what I’m doing. Note that in point 7 of the details, deprecation is seen as something which increments the minor number, not the major number.

              See also the section “How should I handle deprecating functionality?” which concurs with the way I’m intending to do this.

              It seems pretty simple to me: if you take a new major version of something, you should expect breaking changes, because that’s what a change in major version number is intended to convey.


              1. You’re definitely correct, Jon. Leaving obsolete signatures from a past major release actually harms the developer experience because obsolete methods still appear in intellisense.


  6. Great post – I hadn’t even heard of the timezone adjustment rule before. It’s unfortunate that backward compatibility takes precedence over correctness, but from a business perspective I can understand it. There’s stuff in the Win32 API that’s still “broken” because it was mis-used by the Office team… Fixing the API would break Office, which is bad but could be patched, but it would also break any other applications that depended on the [wrong] behavior of the API. Breaking your customers’ products by patching an API is not good.

    At my company we have to support products released in the past as well as current versions, and we have to release update modules that work across all versions in the field. It’s a pain, and it prevents us from correcting some bad API names and weird behaviors. The only way to really fix it is to clean-slate and release a new version that is no longer compatible with existing versions.

    Anyhow, thanks for the post! It’s well-written and thought provoking. :)


  7. I think Joshua Bloch said something along the lines of not sacrificing correctness and ease of use for performance when designing APIs, because computers and compilers get better and faster, but you’ll be stuck with the broken API forever.

    The same thing applies to #1. While I can see why they wouldn’t want to break code by swapping arguments, not using Math.Sign() to guarantee correctness at the cost of some performance is just a plain horrible decision.

    Also the performance cost should be pretty negligible if the Math.Sign function were well optimized (judging by the code above, it isn’t though). Assuming you can use unsigned integers, this should be relatively efficient on any hardware: “sign = x >> 31 | (int)((uint)-x >> 31)”. Another option would be to use “(x > 0) - (x < 0)” (this needs the compiler to do an efficient/implicit bool-to-int conversion), which could be more efficient on some hardware.


  8. If you want to see the most backward of all backward compatible arguments, you should see the (now dead) discussion of String.GetHashCode on Connect… which I failed to capture before Microsoft wiped all references to it. Lucky for us, the complete lack of logic is STILL not dead, see https://github.com/dotnet/coreclr/pull/229


  9. Jon, LINQ is not the only thing that suffers with a comparer returning int.MinValue. ReverseComparer (used by the PLINQ descending order implementation) and System.Collections.Comparer (used by Comparer.Default when T is not an IComparable type) from the reference sources (http://referencesource.microsoft.com) use a similar technique to invert the result of a comparison, and of course there could be many more inside the framework. So I guess although any negative value including int.MinValue is a valid implementation of a comparison function, a good comparison function should always return -1, 0 or 1. Or at least should not return int.MinValue :-)

