The perils of conditional mutability

This morning I was wrestling with trying to make some Noda Time unit tests faster. For some reason, the continuous integration host we’re using is really slow at loading resources under .NET 4. The unit tests which run in 10 seconds on my home laptop take over three hours on the continuous integration system. Taking stack traces at regular intervals showed the problem was with the NodaFormatInfo constructor, which reads some resources.

I may look into streamlining the resource access later, but before we get to that point, I wanted to try to reduce the number of times we call that constructor in the first place. NodaFormatInfo is meant to be cached, so I wouldn’t have expected thousands of instances to be created – but it’s only cached when the System.Globalization.CultureInfo it’s based on is read-only. This is where the problems start…

CultureInfo is conditionally mutable (not an official term, just one I’ve coined for the purposes of this post). You can ask whether or not it’s read-only with the IsReadOnly property, and obviously if it’s read-only you can’t change it. Additionally, CultureInfo is composed of other conditionally mutable objects – DateTimeFormatInfo, NumberFormatInfo etc. There’s a static ReadOnly method on CultureInfo to create a read-only wrapper around a mutable CultureInfo. It’s not clearly documented whether that’s expected to take a deep copy (so that callers can really rely on it not changing) or whether it’s expected to reflect any further changes made to the culture info it’s based on. To go in the other direction, you can call Clone on a CultureInfo to create a mutable copy of any existing culture.

Further complications are introduced by the properties on the composite objects – we have properties such as DateTimeFormatInfo.MonthNames which returns a string array. Remember, arrays are always mutable. So it’s really important to know whether the array reference returned from the property refers to a copy of the underlying data, or whether it refers to the array that’s used internally by the type. Obviously for read-only DateTimeFormatInfo objects, I’d expect a copy to be returned – but for a mutable DateTimeFormatInfo, it would potentially make sense to return the underlying array reference. Unfortunately, the documentation doesn’t make this clear – but in practice, it always returns a copy. If you need to change the month names, you need to clone the array, mutate the clone, and then set the MonthNames property.

All of this makes CultureInfo hard to work with. The caching decision earlier on only really works if a "read-only" culture genuinely won’t change behind the scenes. The type system gives you no help to catch subtle bugs at compile-time. Making any of this robust but efficient (in terms of taking minimal copies) is tricky to say the least.

Not only does it make it hard to work with from a client’s point of view, but apparently it’s hard to implement correctly too…

First bug: Mono’s invariant culture isn’t terribly invariant…

(Broken in 2.10.8; apparently fixed later.)

I discovered this while getting Noda Time’s unit tests to pass on Mono. Unfortunately there are some I’ve had to effectively disable at the moment, due to deficiencies in Mono (some of which are being fixed, of course).

Here’s a program which builds a clone of the invariant culture, changes the clone’s genitive month names, and then prints out the first non-genitive name from the plain invariant culture:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {        
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        // Note: not even deliberately changing MonthNames for this culture!
        clone.DateTimeFormat.MonthGenitiveNames[0] = "Changed";
        
        // Prints Changed
        Console.WriteLine(CultureInfo.InvariantCulture.DateTimeFormat.MonthNames[0]);
    }
}

I believe this bug is really due to the lack of support for genitive month names in Mono at the moment – the MonthGenitiveNames property always just returns a reference to the month names for the invariant culture – without taking a copy first. (It always returns the invariant culture’s month names, even if you’re using a different culture entirely.) The code above shows an "innocent" attempt to change a mutable clone – but in reality we could have used any culture (including an immutable one) to make the change.

Note that in the .NET implementation, the change would only have been to a copy of the underlying data, so even the clone wouldn’t have reflected change anywhere.

Second bug: ReadOnly losing changes

The second bug is the one I found this morning. It looks like it’s fixed in .NET 4, but it’s present in .NET 3.5, which is where it bit me this morning. When you try to make a read-only wrapper around a mutated culture, some of the properties are preserved… but some aren’t:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        clone.DateTimeFormat.AMDesignator = "ChangedAm";

        // The array is recreated on each call to MonthNames, so changing the
        // value within the array itself doesn’t work :(
        string[] months = (string[]) clone.DateTimeFormat.MonthNames;
        months[0] = "ChangedMonth";
        clone.DateTimeFormat.MonthNames = months;
        
        CultureInfo readOnlyCopy = CultureInfo.ReadOnly(clone);
        Console.WriteLine(clone.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(clone.DateTimeFormat.MonthNames[0]); // ChangedMonth
                
        Console.WriteLine(readOnlyCopy.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(readOnlyCopy.DateTimeFormat.MonthNames[0]); // January (!)
    }
}

I don’t know what’s going on here. In the end I just left the test code using the mutable clone, having added a comment explaining why it wasn’t created a read-only wrapper.

I’ve experimented with a few different options here, including setting the MonthNames property on the clone after creating the wrapper. No joy – I simply can’t make the new month names stick in the read-only copy. <sigh>

Conclusion

I’ve been frustrated by the approach we’ve taken to cultures in Noda Time for a while. I haven’t worked out exactly what we should do about it yet, so it’s likely to stay somewhat annoying for v1, but I may well revisit it significantly for v2. Unfortunately, there’s nothing I can do about CultureInfo itself.

What I would have preferred in all of this is the builder pattern: make CultureInfo, DateTimeFormatInfo etc all immutable, but give each of them mutable builder types, with the ability to create a mutable builder based on an existing immutable instance, and obviously to create a new immutable instance from a builder. That would make all kinds of things simpler – including caching.

For the moment though, I hope we can all learn lessons from this – or have old lessons reinforced, at least:

  • Making a single type behave in different ways based on different "modes" makes it hard to use correctly. (Yes, this is the same first conclusion as with DateTime in the previous post. Funny, that.)
  • Immutability has to be deep to be meaningful: it’s not much use having a supposedly read-only object which composes a StringBuilder…
  • Arrays should be considered somewhat harmful. If you’re going to return an array from a method, make sure you document whether this is a copy of the underlying data, or a "live" reference. (The latter should be very rare, particularly for a public API.) The exception here is if you return an empty array – that’s effectively immutable, so you can always return it with no problems.
  • The builder pattern rocks – use it!

In my next post I’ll try to get back to the TimeZoneInfo oddities I mentioned last time.

More fun with DateTime

(Note that this is deliberately not posted in the Noda Time blog. I reckon it’s of wider interest from a design perspective, and I won’t be posting any of the equivalent Noda Time code. I’ll just say now that we don’t have this sort of craziness in Noda Time, and leave it at that…)

A few weeks ago, I was answering a Stack Overflow question when I noticed an operation around dates and times which should have been losing information apparently not doing so. I investigated further, and discovered some "interesting" aspects of both DateTime and TimeZoneInfo. In an effort to keep this post down to a readable length (at least for most readers; certain WebDriver developers who shall remain nameless have probably given up by now already) I’ll save the TimeZoneInfo bits for another post.

Background: daylight saving transitions and ambiguous times

There’s one piece of inherent date/time complexity you’ll need to understand for this post to make sense: sometimes, a local date/time occurs twice. For the purposes of this post, I’m going to assume you’re in the UK time zone. On October 28th 2012, at 2am local time (1am UTC), UK clocks will go back to 1am local time. So 1:20am local time occurs twice – once at 12:20am UTC (in daylight saving time, BST), and once at 1:20am UTC (in standard time, GMT).

If you want to run any of the code in this post and you’re not in the UK, please adjust the dates and times used to a similar ambiguity for when your clocks go back. If you happen to be in a time zone which doesn’t observe daylight savings, I’m afraid you’ll have to adjust your system time zone in order to see the effect for yourself.

DateTime.Kind and conversions

As you may already know, as of .NET 2.0, DateTime has a Kind property, of type DateTimeKind – an enum with the following values:

  • Local: The DateTime is considered to be in the system time zone. Not an arbitrary "local time in some time zone", but in the specific current system time zone.
  • Utc: The DateTime is considered to be in UTC (corollary: it always unambiguously represents an instant in time)
  • Unspecified: This means different things in different contexts, but it’s a sort of "don’t know" kind; this is closer to "local time in some time zone" which is represented as LocalDateTime in Noda Time.

DateTime provides three methods to convert between the kinds:

  • ToUniversalTime: if the original kind is Local or Unspecified, convert it from local time to universal time in the system time zone. If the original kind is Utc, this is a no-op.
  • ToLocalTime: if the original kind is Utc or Unspecified, convert it from UTC to local time. If the original kind is Local, this is a no-op.
  • SpecifyKind: keep the existing date/time, but just change the kind. (So 7am stays as 7am, but it changes the meaning of that 7am effectively.)

(Prior to .NET 2.0, ToUniversalTime and ToLocalTime were already present, but always assumed the original value needed conversion – so if you called x.ToLocalTime().ToLocalTime().ToLocalTime() the result would probably end up with the appropriate offset from UTC being applied three times!)

Of course, none of these methods change the existing value – DateTime is immutable, and a value type – instead, they return a new value.

DateTime’s Deep Dark Secret

(The code in this section is presented in several chunks, but it forms a single complete piece of code – later chunks refer to variables in earlier chunks. Put it all together in a Main method to run it.)

Armed with the information in the previous sections, we should be able to make DateTime lose data. If we start with 12:20am UTC and 1:20am UTC on October 28th as DateTimes with a kind of Utc, when we convert them to local time (on a system in the UK time zone) we should get 1:20am in both cases due to the daylight saving transition. Indeed, that works:

// Start with different UTC values around a DST transition
var original1 = new DateTime(2012, 10, 28, 0, 20, 0, DateTimeKind.Utc);
var original2 = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Utc);

// Convert to local time
var local1 = original1.ToLocalTime();
var local2 = original2.ToLocalTime();

// Result is the same for both values. Information loss?
var expected = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Local);
Console.WriteLine(local1 == expected); // True
Console.WriteLine(local2 == expected); // True
Console.WriteLine(local1 == local2);   // True

If we’ve started with two different values, applied the same operation to both, and ended up with equal values, then we must have lost information, right? That doesn’t mean that operation is "bad" any more than "dividing by 2" is bad. You ought to be aware of that information loss, that’s all.

So, we ought to be able to demonstrate that information loss further by converting back from local time to universal time. Here we have the opposite problem: from our local time of 1:20am, we have two valid universal times we could convert to – either 12:20am UTC or 1:20am UTC. Both answers would be correct – they are universal times at which the local time would be 1:20am. So which one will get picked? Well… here’s the surprising bit:

// Convert back to UTC
var roundTrip1 = local1.ToUniversalTime(); 
var roundTrip2 = local2.ToUniversalTime();

// Values round-trip correctly! Information has been recovered…
Console.WriteLine(roundTrip1 == original1);  // True
Console.WriteLine(roundTrip2 == original2);  // True
Console.WriteLine(roundTrip1 == roundTrip2); // False

Somehow, each of the local values knows which universal value it came from. The The information has been recovered, so the reverse conversion round-trips each value back to its original one. How is that possible?

It turns out that DateTime actually has four potential kinds: Local, Utc, Unspecified, and "local but treat it as the earlier option when resolving ambiguity". A DateTime is really just a 64-bit number of ticks, but because the range of DateTime is only January 1st 0001 to December 31st 9999. That range can be represented in 62 bits, leaving 2 bits "spare" to represent the kind. 2 bits gives 4 possible values… the three documented ones and the shadowy extra one.

Through experimentation, I’ve discovered that the kind is preserved if you perform arithmetic on the value, too… so if you go to another "fall back" DST transition such as October 30th 2011, the ambiguity resolution works the same way as before:

var local3 = local1.AddYears(-1).AddDays(2); 
var local4 = local2.AddYears(-1).AddDays(2);        
Console.WriteLine(local3.ToUniversalTime().Hour); // 0
Console.WriteLine(local4.ToUniversalTime().Hour); // 1

If you use DateTime.SpecifyKind with DateTimeKind.Local, however, it goes back to the "normal" kind, even though it looks like it should be a no-op:

// Should be a no-op?
var local5 = DateTime.SpecifyKind(local1, local1.Kind); 
Console.WriteLine(local5.ToUniversalTime().Hour); // 1

Is this correct behaviour? Or should it be a no-op, just like calling ToLocalTime on a "local" DateTime is? (Yes, I’ve checked – that doesn’t lose the information.) It’s hard to say, really, as this whole business appears to be undocumented… at least, I haven’t seen anything in MSDN about it. (Please add a link in the comments if you find something. The behaviour actually goes against what’s documented, as far as I can tell.)

I haven’t looked into whether various forms of serialization preserve values like this faithfully, by the way – but you’d have to work hard to reproduce it in non-framework code. You can’t explicitly construct a DateTime with the "extra" kind; the only ways I know of to create such a value are via a conversion to local time or through arithmetic on a value which already has the kind. (Admittedly if you’re serializing a DateTime with a Kind of Local, you’re already on potentially shaky ground, given that you could be deserializing it on a machine with a different system time zone.)

Unkind comparisons

I’ve misled you a little, I have to admit. In the code above, when I compared the "expected" value with the results of the first conversions, I deliberately specified DateTimeKind.Local in the constructor call. After all, that’s the kind we do expect. Well, yes – but I then printed the result of comparing this value with local1 and local2… and those comparisons would have been the same regardless of the kind I’d specified in the constructor.

All comparisons between DateTimes ignore the Kind property. It’s not just restricted to equality. So for example, consider this comparison:

// In June: Local time is UTC+1, so 8am UTC is 9am local
var dt1 = new DateTime(2012, 6, 1, 8, 0, 0, DateTimeKind.Utc); 
var dt2 = new DateTime(2012, 6, 1, 8, 30, 0, DateTimeKind.Local); 
Console.WriteLine(dt1 < dt2); // True

When viewed in terms of "what instants in time do these both represent?" the answer here is wrong – when you convert both values into the same time zone (in either direction), dt1 occurs after dt2. But a simple look at the properties tells a different story. In practice, I suspect that most comparisons between DateTime values of different kinds involve code which is at best sloppy and is quite possibly broken in a meaningful way.

Of course, if you bring Kind=Unspecified into the picture, it becomes impossible to compare meaningfully in a kind-sensitive way. Is 12am UTC before or after 1am Unspecified? It depends what time zone you later use.

To be clear, it is a hard-to-resolve issue, and one that we don’t do terribly well at in Noda Time at the moment for ZonedDateTime. (And even with just LocalDateTime you’ve got issues between calendars.) This is a situation where providing separate Comparer<T> implementations works nicely – so you can explicitly say what kind of comparison you want.

Conclusions

There’s more fun to be had with a similar situation when we look at TimeZoneInfo, but for now, a few lessons:

  • Giving a type different "modes" which make it mean fairly significantly different things is likely to cause headaches
  • Keeping one of those modes secret (and preventing users from even constructing a value in that mode directly) leads to even more fun and games
  • If two instances of your type are considered "equal" but behave differently, you should at least consider whether there’s something smelly going on
  • There’s always more fun to be had with DateTime…