The perils of conditional mutability

This morning I was wrestling with trying to make some Noda Time unit tests faster. For some reason, the continuous integration host we’re using is really slow at loading resources under .NET 4. The unit tests which run in 10 seconds on my home laptop take over three hours on the continuous integration system. Taking stack traces at regular intervals showed the problem was with the NodaFormatInfo constructor, which reads some resources.

I may look into streamlining the resource access later, but before we get to that point, I wanted to try to reduce the number of times we call that constructor in the first place. NodaFormatInfo is meant to be cached, so I wouldn’t have expected thousands of instances to be created – but it’s only cached when the System.Globalization.CultureInfo it’s based on is read-only. This is where the problems start…

CultureInfo is conditionally mutable (not an official term, just one I’ve coined for the purposes of this post). You can ask whether or not it’s read-only with the IsReadOnly property, and obviously if it’s read-only you can’t change it. Additionally, CultureInfo is composed of other conditionally mutable objects – DateTimeFormatInfo, NumberFormatInfo etc. There’s a static ReadOnly method on CultureInfo to create a read-only wrapper around a mutable CultureInfo. It’s not clearly documented whether that’s expected to take a deep copy (so that callers can really rely on it not changing) or whether it’s expected to reflect any further changes made to the culture info it’s based on. To go in the other direction, you can call Clone on a CultureInfo to create a mutable copy of any existing culture.

Further complications are introduced by the properties on the composite objects – we have properties such as DateTimeFormatInfo.MonthNames which returns a string array. Remember, arrays are always mutable. So it’s really important to know whether the array reference returned from the property refers to a copy of the underlying data, or whether it refers to the array that’s used internally by the type. Obviously for read-only DateTimeFormatInfo objects, I’d expect a copy to be returned – but for a mutable DateTimeFormatInfo, it would potentially make sense to return the underlying array reference. Unfortunately, the documentation doesn’t make this clear – but in practice, it always returns a copy. If you need to change the month names, you need to clone the array, mutate the clone, and then set the MonthNames property.

All of this makes CultureInfo hard to work with. The caching decision earlier on only really works if a "read-only" culture genuinely won’t change behind the scenes. The type system gives you no help to catch subtle bugs at compile-time. Making any of this robust but efficient (in terms of taking minimal copies) is tricky to say the least.

Not only does it make it hard to work with from a client’s point of view, but apparently it’s hard to implement correctly too…

First bug: Mono’s invariant culture isn’t terribly invariant…

(Broken in 2.10.8; apparently fixed later.)

I discovered this while getting Noda Time’s unit tests to pass on Mono. Unfortunately there are some I’ve had to effectively disable at the moment, due to deficiencies in Mono (some of which are being fixed, of course).

Here’s a program which builds a clone of the invariant culture, changes the clone’s genitive month names, and then prints out the first non-genitive name from the plain invariant culture:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {        
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        // Note: not even deliberately changing MonthNames for this culture!
        clone.DateTimeFormat.MonthGenitiveNames[0] = "Changed";
        
        // Prints Changed
        Console.WriteLine(CultureInfo.InvariantCulture.DateTimeFormat.MonthNames[0]);
    }
}

I believe this bug is really due to the lack of support for genitive month names in Mono at the moment – the MonthGenitiveNames property always just returns a reference to the month names for the invariant culture – without taking a copy first. (It always returns the invariant culture’s month names, even if you’re using a different culture entirely.) The code above shows an "innocent" attempt to change a mutable clone – but in reality we could have used any culture (including an immutable one) to make the change.

Note that in the .NET implementation, the change would only have been to a copy of the underlying data, so even the clone wouldn’t have reflected change anywhere.

Second bug: ReadOnly losing changes

The second bug is the one I found this morning. It looks like it’s fixed in .NET 4, but it’s present in .NET 3.5, which is where it bit me this morning. When you try to make a read-only wrapper around a mutated culture, some of the properties are preserved… but some aren’t:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        clone.DateTimeFormat.AMDesignator = "ChangedAm";

        // The array is recreated on each call to MonthNames, so changing the
        // value within the array itself doesn’t work :(
        string[] months = (string[]) clone.DateTimeFormat.MonthNames;
        months[0] = "ChangedMonth";
        clone.DateTimeFormat.MonthNames = months;
        
        CultureInfo readOnlyCopy = CultureInfo.ReadOnly(clone);
        Console.WriteLine(clone.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(clone.DateTimeFormat.MonthNames[0]); // ChangedMonth
                
        Console.WriteLine(readOnlyCopy.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(readOnlyCopy.DateTimeFormat.MonthNames[0]); // January (!)
    }
}

I don’t know what’s going on here. In the end I just left the test code using the mutable clone, having added a comment explaining why it wasn’t created a read-only wrapper.

I’ve experimented with a few different options here, including setting the MonthNames property on the clone after creating the wrapper. No joy – I simply can’t make the new month names stick in the read-only copy. <sigh>

Conclusion

I’ve been frustrated by the approach we’ve taken to cultures in Noda Time for a while. I haven’t worked out exactly what we should do about it yet, so it’s likely to stay somewhat annoying for v1, but I may well revisit it significantly for v2. Unfortunately, there’s nothing I can do about CultureInfo itself.

What I would have preferred in all of this is the builder pattern: make CultureInfo, DateTimeFormatInfo etc all immutable, but give each of them mutable builder types, with the ability to create a mutable builder based on an existing immutable instance, and obviously to create a new immutable instance from a builder. That would make all kinds of things simpler – including caching.

For the moment though, I hope we can all learn lessons from this – or have old lessons reinforced, at least:

  • Making a single type behave in different ways based on different "modes" makes it hard to use correctly. (Yes, this is the same first conclusion as with DateTime in the previous post. Funny, that.)
  • Immutability has to be deep to be meaningful: it’s not much use having a supposedly read-only object which composes a StringBuilder…
  • Arrays should be considered somewhat harmful. If you’re going to return an array from a method, make sure you document whether this is a copy of the underlying data, or a "live" reference. (The latter should be very rare, particularly for a public API.) The exception here is if you return an empty array – that’s effectively immutable, so you can always return it with no problems.
  • The builder pattern rocks – use it!

In my next post I’ll try to get back to the TimeZoneInfo oddities I mentioned last time.

6 thoughts on “The perils of conditional mutability”

  1. +1 for immutable + builder! I like how it works with Uri and UriBuilder especially (although IMHO UriBuilder should have made it easier to interact with query params as NameValueCollection, but I digress). Builders for strings (normal StringBuilder, connection strings, etc) are great, but it’s not as obvious (IMHO) that the real pattern is immutable + builder since strings are already immutable

    Like

  2. @James: It depends on what the expectation is, and whether you ever mutate the array internally. (And whether the element type is mutable.)

    I think it’s good to be “all or nothing” – I don’t like the idea of returning a ReadOnlyCollection which the caller can’t modify but which you’re then going to change underneath them, nor (generally) a read-only collection of elements which can themselves mutate, e.g. StringBuilders. But if you’ve got (say) an int[] which you’re never going to change yourself (and which nothing else has a reference to) then returning a read-only wrapper is a good option, yes.

    Like

  3. Arrays are evil. I generally prefer to use my own ImmutableArray implementation in my personal projects.

    Regarding builders, they can be made to look quite nice if you try making the code look like this:

    CultureInfo info = new CultureInfoBuilder( {
    AMDesignator = “ChangedAM”,
    MonthNames = { { 0, “ChangedJanuary” } }
    };
    CultureInfo info2 = new CultureInfoBuilder(info) {
    AMDesignator = “ChangedAgain”
    };

    It’s not hard to do, and it ends up being even easier to use than a mutable type.

    Like

  4. @configurator: That’s what we do in Protocol Buffers, except using nested types and having an explicit Build() method – I don’t like implicit conversions in general.

    Like

Leave a comment