This morning I was wrestling with trying to make some Noda Time unit tests faster. For some reason, the continuous integration host we’re using is really slow at loading resources under .NET 4. The unit tests which run in 10 seconds on my home laptop take over three hours on the continuous integration system. Taking stack traces at regular intervals showed the problem was with the NodaFormatInfo constructor, which reads some resources.
I may look into streamlining the resource access later, but before we get to that point, I wanted to try to reduce the number of times we call that constructor in the first place. NodaFormatInfo is meant to be cached, so I wouldn’t have expected thousands of instances to be created – but it’s only cached when the System.Globalization.CultureInfo it’s based on is read-only. This is where the problems start…
CultureInfo is conditionally mutable (not an official term, just one I’ve coined for the purposes of this post). You can ask whether or not it’s read-only with the IsReadOnly property, and obviously if it’s read-only you can’t change it. Additionally, CultureInfo is composed of other conditionally mutable objects – DateTimeFormatInfo, NumberFormatInfo etc. There’s a static ReadOnly method on CultureInfo to create a read-only wrapper around a mutable CultureInfo. It’s not clearly documented whether that’s expected to take a deep copy (so that callers can really rely on it not changing) or whether it’s expected to reflect any further changes made to the culture info it’s based on. To go in the other direction, you can call Clone on a CultureInfo to create a mutable copy of any existing culture.
Further complications are introduced by the properties on the composite objects – we have properties such as DateTimeFormatInfo.MonthNames which returns a string array. Remember, arrays are always mutable. So it’s really important to know whether the array reference returned from the property refers to a copy of the underlying data, or whether it refers to the array that’s used internally by the type. Obviously for read-only DateTimeFormatInfo objects, I’d expect a copy to be returned – but for a mutable DateTimeFormatInfo, it would potentially make sense to return the underlying array reference. Unfortunately, the documentation doesn’t make this clear – but in practice, it always returns a copy. If you need to change the month names, you need to clone the array, mutate the clone, and then set the MonthNames property.
All of this makes CultureInfo hard to work with. The caching decision earlier on only really works if a "read-only" culture genuinely won’t change behind the scenes. The type system gives you no help to catch subtle bugs at compile-time. Making any of this robust but efficient (in terms of taking minimal copies) is tricky to say the least.
Not only does it make it hard to work with from a client’s point of view, but apparently it’s hard to implement correctly too…
First bug: Mono’s invariant culture isn’t terribly invariant…
(Broken in 2.10.8; apparently fixed later.)
I discovered this while getting Noda Time’s unit tests to pass on Mono. Unfortunately there are some I’ve had to effectively disable at the moment, due to deficiencies in Mono (some of which are being fixed, of course).
Here’s a program which builds a clone of the invariant culture, changes the clone’s genitive month names, and then prints out the first non-genitive name from the plain invariant culture:
using System.Globalization;
class Test
{
static void Main()
{
CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
// Note: not even deliberately changing MonthNames for this culture!
clone.DateTimeFormat.MonthGenitiveNames[0] = "Changed";
// Prints Changed
Console.WriteLine(CultureInfo.InvariantCulture.DateTimeFormat.MonthNames[0]);
}
}
I believe this bug is really due to the lack of support for genitive month names in Mono at the moment – the MonthGenitiveNames property always just returns a reference to the month names for the invariant culture – without taking a copy first. (It always returns the invariant culture’s month names, even if you’re using a different culture entirely.) The code above shows an "innocent" attempt to change a mutable clone – but in reality we could have used any culture (including an immutable one) to make the change.
Note that in the .NET implementation, the change would only have been to a copy of the underlying data, so even the clone wouldn’t have reflected change anywhere.
Second bug: ReadOnly losing changes
The second bug is the one I found this morning. It looks like it’s fixed in .NET 4, but it’s present in .NET 3.5, which is where it bit me this morning. When you try to make a read-only wrapper around a mutated culture, some of the properties are preserved… but some aren’t:
using System.Globalization;
class Test
{
static void Main()
{
CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
clone.DateTimeFormat.AMDesignator = "ChangedAm";
// The array is recreated on each call to MonthNames, so changing the
// value within the array itself doesn’t work :(
string[] months = (string[]) clone.DateTimeFormat.MonthNames;
months[0] = "ChangedMonth";
clone.DateTimeFormat.MonthNames = months;
CultureInfo readOnlyCopy = CultureInfo.ReadOnly(clone);
Console.WriteLine(clone.DateTimeFormat.AMDesignator); // ChangedAm
Console.WriteLine(clone.DateTimeFormat.MonthNames[0]); // ChangedMonth
Console.WriteLine(readOnlyCopy.DateTimeFormat.AMDesignator); // ChangedAm
Console.WriteLine(readOnlyCopy.DateTimeFormat.MonthNames[0]); // January (!)
}
}
I don’t know what’s going on here. In the end I just left the test code using the mutable clone, having added a comment explaining why it wasn’t created a read-only wrapper.
I’ve experimented with a few different options here, including setting the MonthNames property on the clone after creating the wrapper. No joy – I simply can’t make the new month names stick in the read-only copy. <sigh>
Conclusion
I’ve been frustrated by the approach we’ve taken to cultures in Noda Time for a while. I haven’t worked out exactly what we should do about it yet, so it’s likely to stay somewhat annoying for v1, but I may well revisit it significantly for v2. Unfortunately, there’s nothing I can do about CultureInfo itself.
What I would have preferred in all of this is the builder pattern: make CultureInfo, DateTimeFormatInfo etc all immutable, but give each of them mutable builder types, with the ability to create a mutable builder based on an existing immutable instance, and obviously to create a new immutable instance from a builder. That would make all kinds of things simpler – including caching.
For the moment though, I hope we can all learn lessons from this – or have old lessons reinforced, at least:
- Making a single type behave in different ways based on different "modes" makes it hard to use correctly. (Yes, this is the same first conclusion as with DateTime in the previous post. Funny, that.)
- Immutability has to be deep to be meaningful: it’s not much use having a supposedly read-only object which composes a StringBuilder…
- Arrays should be considered somewhat harmful. If you’re going to return an array from a method, make sure you document whether this is a copy of the underlying data, or a "live" reference. (The latter should be very rare, particularly for a public API.) The exception here is if you return an empty array – that’s effectively immutable, so you can always return it with no problems.
- The builder pattern rocks – use it!
In my next post I’ll try to get back to the TimeZoneInfo oddities I mentioned last time.