Category Archives: Noda Time

Versioning conundrum for Noda Time – help requested

March 28, 2016 jonskeet 42 Comments

Obviously I’d normally ask developer questions on Stack Overflow but in this case, it feels like the answers may be at least somewhat opinion-based. If it turns out that it’s sufficiently straightforward that a Stack Overflow question and answer would be useful, I can always repost it there later.

The Facts

Noda Time 1.x exists “in production”, and the latest version is 1.3.1. This targets .NET 3.5 Client profile, .NET 4.0, and PCL Profile 328 (in a directory of lib\portable-net4+sl5+netcore45+wpa81+wp8+MonoAndroid1+MonoTouch1+XamariniOS1)

Noda Time currently includes the IANA time zone data (“TZDB”) – each released version of Noda Time contains the TZDB version that was “most recent” at the time that the Noda Time release was built. This gets out of date quite quickly, as there are multiple releases of TZDB every year. Those releases are named 2016a, 2016b etc. Noda Time also provides the ability to read .nzd files (Noda Zone Data – a custom format) and every time there’s a new release of TZDB, I build a .nzd file and upload it to nodatime.org, updating http://nodatime.org/tzdb/latest.txt to point to the latest version.

Noda Time 2.0 has not been released yet. When I do release it, I expect to target .NET 4.5 and netstandard1.0.

Each Noda Time 1.x release has an AssemblyVersion just based on major/minor, i.e. 1.0, 1.1, 1.2 etc. Based on this blog post, this may have been a mistake – it should quite possibly have been 1.0 for all versions. Obviously I can’t fix that now, but I can make the 2.x releases use 2.0 everywhere.

When 2.0 is “pretty much ready” we’re going to cut a 1.4 release which deprecates things that are removed in 2.0 and provides the new approaches as far as possible. For example, the IClock.Now property from 1.x is removed in 2.0, and replaced by IClock.GetCurrentInstant(). We’ll deprecate the Now property and introduce a GetCurrentInstant() extension method which delegates to it. This shouldn’t break any 1.x users, but should allow them to move over to the new API as far as possible before upgrading to 2.0. The intention is that users wouldn’t stay on 1.4 for very long. (Obviously they could do so, but there’s not a lot of benefit. 1.4 won’t have new features – it’s really just a transition version.)

So far, that’s just the way of the world. Now I want to make it easier for users to stay up-to-date with TZDB – including if nodatime.org goes down. (That’s considerably more likely than nuget.org going down, for example.)

The plan is to introduce a new nearly-data-only assembly, packaged as NodaTime.Tzdb. The aim is to allow users to update their data dependency at build time, in a controlled fashion. If you only want to specify an exact version to depend on, you can do so. If you want to pick up the latest version every time you build, that should be possible too.

The tricky bits come in terms of the versioning.

Some options

Firstly, the versioning scheme for the package ignoring everything else. I plan to use something like this:

2016a => 1.2016.1
2016b => 1.2016.2
2016c => 1.2016.3
…
2017a => 1.2017.1

This should make it reasonably easy to tell the TZDB version just from the package version.

However, I’m considering a few options within this idea:

I could create a single package per TZDB release, targeting .NET 3.5 client profile, .NET 4.0, the Profile 328 PCL, .NET 4.5, and .NET Standard 1.0. The first four of these could depend on Noda Time 1.1, and the last one would have to depend on Noda Time 2.0.
I could do the above, but depend on 1.3.1 instead of 1.1.
I could create one package with two versions per TZDB release – a 1.x depending on Noda Time 1.1, and a 2.x depending on Noda Time 2.0. For example, when TZDB 2016d is released, I could create 1.2016.4 and 2.2016.4.
I could create one package version depending on 1.1, one depending on 1.2, one depending on 1.3, one depending on 1.4 (when that exists) and one depending on 2.0.
I could create two separate packages, i.e. include the Noda Time major version number in the package name. I don’t like this idea, but it’s on the table.

Some concerns and questions

There are various aspects to this which cause me a few worries. I’m not sure how well I can really structure or segregate those, so I’ll just list them.

Can a non-prerelease package depend on a prerelease package for some frameworks? If not, that possibly blows the “single version” idea out of the water, as I can’t depend on NodaTime v2.0 yet – it’s not out.
Even if that’s feasible, is it sane to depend on different major versions of the NodaTime package from within a single version of the NodaTime.Tzdb package, or is that going to cause massive confusion?
Should I depend on NodaTime v1.1 or v1.3.1? They have different AssemblyVersion numbers, which I believe means an assembly binding redirect will be required if I depend on 1.1 but users depend on 1.3.1. To be clear, I don’t expect many users to still be on versions older than 1.3.1.
Likewise, is it going to cause issues for .NET 4.5 users who use NodaTime 2.0 (eventually) if they depend on a version of NodaTime.Tzdb that depends on NodaTime 1.3.1? Again, presumably assembly binding redirects are needed.
If I go with the “two-version” scheme (i.e. 1.2016.4 and 2.2016.4 etc) how careful would NodaTime 1.3.1 users have to be? I wouldn’t want them to accidentally get upgraded to NodaTime 2.0 when that’s released, by accidentally taking the 2.x line of NodaTime.Tzdb.
Does dotnet cli support the nuget allowedVersions feature at all? I haven’t found any support for it in DNX, but really it’s vital for this scheme to work at all – basically I’d expect a NodaTime 1.3.1 user to specify an allowed version range for NodaTime.Tzdb of [1,2)
Is my scheme of 1.2016.4 (etc) sensible? It’s somewhat abusing major/minor/patch, in that there’s no real difference in meaning between a minor version bump (“it’s the new year”) and a patch bump (“there’s been another release in the same year”). Neither kind of change will be breaking (unless you depend on specific time zones behaving in specific ways, of course), and it’s handy to be able to give a simple mapping between TZDB version and package version, but there may be consequences I’m unaware of.

Please feel free to ask clarifying questions in comments. Will look forward to getting some answers :)

General, Noda Time, Stack Overflow

Common mistakes in date/time formatting and parsing

May 5, 2015 jonskeet 14 Comments

There are many, many questions on Stack Overflow about both parsing and formatting date/time values. (I use the term “date/time” to mean pretty much “any type of chronlogical information” – dates, times of day, instants in time etc.) Given how often the same kinds of mistakes are made, I thought it would be handy to have a blog post to refer to.

This post assumes you already know the basic operations of formatting and parsing, in terms of the appropriate types to use:

Java pre-Java-8: SimpleDateFormat (or just DateFormat)
Java in java.time: DateTimeFormatter
Joda Time: DateTimeFormatter (created via DateTimeFormat)
.NET: The Parse/ParseExact/TryParse/TryParseExact/ToString methods on the appropriate type (usually DateTime and DateTimeOffset)
Noda Time: The *Pattern class corresponding to the type you’re working with, e.g. InstantPattern, LocalDateTimePattern.

Pattern woes

There are three broad classes of issue here – one of which is “just” a matter of carelessness, usually, and the other which still surprises me in terms of sheer wrongness.

Pattern capitalization issues

This is an insidious problem, because in some cases you may get the right values, but not all of the time. I suspect it usually comes up again due to copy and paste, but often from specifications rather than other code – in a specification, it’s pretty clear what "YYYY-MM-DD HH:MM:SS" means as a date/time format, but that doesn’t mean it’s the right pattern to put in code.

The main thing to do is read the documentation carefully. Of course, some platforms have clearer documentation than others, but most are at least “good enough”. For the Java APIs, the pattern specifiers are generally documented with the formatting classes themselves; for .NET’s built-in classes you want the custom date and time format strings and standard date and time format strings MSDN pages, and for Noda Time follow the various options from the text handling part of the user guide. (For other platforms, use your common sense. :)

The most common mistakes here are:

Using mm for months or MM for minutes, rather than vice versa. I’ve seen this mistake both ways round.
Using hh for “hour of day” when HH is intended. H is in the range 0-23; h is in the range 1-12. h is usually used singly (rather than requiring exactly two digits), and almost always in conjunction with an AM/PM specifier – as otherwise it’s ambiguous. H is usually used as HH, so that 5am is represented as “05” for example.
Using YYYY for year – in Java and Noda Time, Y is used for week-year rather than normal calendar year; it’s usually used in conjunction with “week of year” and “day of week”, but it’s much less common than yyyy.
Using DD for “day of month” when in Java it actually means “day of year”.

Broad pattern incompatibilities

I’m surprised by how often I see code like this:

var text = "Tue, 5 May 2015 3:15pm";
var dateTime = DateTime.ParseExact(
    text,
    "yyyy-MM-dd'T'HH:mm:ss");

Here the pattern and the actual data are entirely different, and I get the impression that the author has copied the pattern from another piece of code without any thought about what the magic string "yyyy-MM-dd'T'HH:mm:ss" is there for.

I suspect it goes without saying for most readers, but you should never copy code from elsewhere into your own code without understanding how it works, or which parts you may potentially need to modify.

The result of this sort of error is usually a complete failure to parse, which is at least simpler to find than the “plausible but not quite correct” pattern issue.

Pattern incompatibility issues

Some developers assume that a pattern which works in Java will work in Python, or the equivalent for any other pair of platforms. Don’t make this assumption. Always read the documentation – and if you’re porting code from one platform to another, you’ll need to “decode” the pattern with one set of documentation, then “encode” it with the other.

Time zone issues

Understanding time zones

There are two common issues when understanding what a time zone is to start with.

The first is to assume that a UTC offset (e.g. “+8 hours”) is the same as a time zone. This is an understandable mistake, given that a lot of documentation (from organizations which really should know better) misuse the terminology. The UTC offset is the difference between UTC and local time at a particular instant – so for example, while I’m writing this, I’m in the UK time zone which is currently at UTC+1. However, in the winter (in the same time zone) it will be at UTC+0. So if you have a value of (say) “2015-05-10T16:43:00+0100” that only tells you the UTC offset – it doesn’t tell you the time zone. There may well be multiple time zones with the same offset at that particular time, but which will have different offsets at differ times.

The second mistake is to think that an abbreviation such as “EST” or “GMT” identifies a time zone. It doesn’t, in two ways:

A single time zone often uses multiple abbreviations over time. For example, “Pacific Time” varies between PST (Pacific Standard Time) and PDT (Pacific Daylight Time). It’s unfortunate that some people use the abbreviation for standard time even when they mean the general time zone – so even though currently (at the time of writing) Pacific Time is in PDT (UTC-7), some people would write the local time with “PST” at the end. Grr. Avoid abbrevations if you possibly can.
The same abbreviation may be used in multiple time zones, or even at different points in time to mean different things within the same time zone. For example, “BST” can mean British Summer Time in Europe/London (standard time of UTC+0, plus 1 hour of daylight saving time), British Standard Time in Europe/London (standard time of UTC+1, with no daylight saving time, around 1970 only) and Bougainville Standard Time in Pacific/Bougainville (UTC+11). Avoid abbreviations if you possibly can.

Using time zones in text formatting/parsing

First, you need to understand exactly what the library you’re using does with time zones, and what the types you’re using represent. One of the most common misconceptions here is with java.util.Date – this is just an instant in time, with no concept of a time zone or calendar system. The fact that the string returned from Date.toString always uses the system default time zone is unfortunately misleading in this respect, and causes developers to ask how to “convert” a Date from one time zone to another.

Next, you need to understand exactly what your data represents. In my experience, most textual data either specifies a date and/or time without a given time zone or it specifies a date and time with a UTC offset. When no time zone information is present, you may know the time zone it’s meant to refer to, or you may not. If you’re using a library which has multiple different types to represent different kinds of information (e.g. Joda Time, java.time or Noda Time) I personally find it clearest to parse to a type that closest represents the information actually stated in the text, and then convert it to something else where appropriate.

You definitely need to be aware when the parsing operation is going to impose any sort of time zone understanding on your data. This is the case with SimpleDateFormat in Java and with DateTime.ParseExact and friends in .NET. For SimpleDateFormat, unless you explicitly set a time zone (or the pattern includes a UTC offset), the system default time zone is used – this is usually not what you want. Parsing in .NET allows you to specify how you want the text to be understood, but you need to be careful. (The fact that DateTime sometimes represents a value in the system default time zone, sometimes a value in UTC, and sometimes a value with no associated time zone makes this all tricky.)

Locale / culture issues

Most libraries allow you to specify which culture to use when parsing (or formatting) data. This is a two-edged sword:

If you’re formatting a value to be displayed directly to an end user, that’s great: they can see the month name in their own language, etc. In this situation, you’ll typically use a “standard” format (e.g. “the short date/time format”)
If you’re formatting or parsing a value which is designed to be machine-readable (e.g. passed to a web service) then you almost certainly want the invariant culture instead of a user-specific culture. In this situation, you’ll typically use a “custom” format (e.g. “yyyy-MM-dd’T’HH:mm:ss”) or a specific culture-invariant format.

Culture can affect several aspects of handling conversions:

The calendar system used (e.g. the Gregorian calendar vs an Islamic calendar)
The “standard” formats used (e.g. month/day/year vs day/month/year)
The separators used (e.g. - vs / for date separators)
The month and day names used
The number system used

Converting unnecessarily

As a final common problem, you may be performing more conversions than you should be. For example, if you’ve got a DateTime field in the database but you’re passing a value as a string in your SQL parameter (you are using parameterized SQL, right?) then you probably shouldn’t be. Most platforms allow parameters to be specified as the value in a “native” representation. Likewise when you fetch a value, don’t just call toString on it and then parse the result – if the value is a date/time value, it should already be in a native representation; a simple cast (or call to the type-specific method) should be enough.

Conclusion

Date/time text handling is fraught with problems, as a simple look at Stack Overflow shows. Be careful, make sure you know exactly what you’re converting from and to, and check exactly what you’re specifying vs what you’re leaving implicit.

General, Noda Time

Backward compatibility pain

March 2, 2015 jonskeet 24 Comments

I’ve been getting a bit cross about backward compatibility recently. This post contains two examples of backward incompatibilities in .NET 4.6, and one example of broken code which isn’t being fixed due for backward compatibility reasons.

Let me start off by saying this post is not meant to be seen as an attack on Microsoft. I suspect that some people will read that first sentence as “This post is an attack on Microsoft, but I don’t want to say so.” It really isn’t. With my naturally positive disposition, I’m going to assume that the people behind the code and decisions in the examples below are smart people who have more information than I do. Their decisions may prove annoying to me personally, but that doesn’t mean those decisions are bad ones for the world at large.

The purpose of this post is partly just because I think readers will find it interesting, and partly to show how there are different things to consider when it comes to backward compatibility in APIs.

Example 1: Enumerable.OrderByDescending is broken

OrderByDescending is broken when three facts are combined unfortunately:

IComparer.Compare is allowed to return any integer value, including int.MinValue. The return value is effectively meant to represent one of three results:
- the first argument is “earlier than” the second (return a negative integer)
- the two arguments are equal in terms of this comparison (return 0)
- the first argument is “later than” the second (return a positive integer)
-int.MinValue (the unary negation of int.MinValue) is still int.MinValue, because the “natural” result would be outside the range of int. (Think about sbyte as being in the range -128 to 127 inclusive… what would -(-128) in sbyte arithmetic return?)
OrderByDescending uses unary negation to attempt to reverse the order returned by an “ascending” comparer.

I’ve blogged about this before, but for the sake of completeness, here’s an example showing that it’s broken. We use a custom comparer which delegates to a normal string comparer – but only ever returns int.MinValue, 0 or int.MaxValue. Just to reiterate, this is an entirely legitimate comparer.

using System;
using System.Collections.Generic;
using System.Linq;

class OrderByDescendingBug
{
    static void Main()
    {
        var comparer = new MaximalComparer<string>(Comparer<string>.Default);
        string[] input = { "apples", "carrots", "dougnuts", "bananas" };

        var sorted = input.OrderByDescending(x => x, comparer);
        Console.WriteLine(string.Join(", ", sorted));
    }
}

class MaximalComparer<T> : IComparer<T>
{
    private readonly IComparer<T> original;

    public MaximalComparer(IComparer<T> original)
    {
        this.original = original;
    }

    public int Compare(T first, T second)
    {
        int originalResult = original.Compare(first, second);
        return originalResult == 0 ? 0
            : originalResult < 0 ? int.MinValue
            : int.MaxValue;
    }
}

We would like the result of this program to be “doughnuts, carrots, bananas, apples” but on my machine (using .NET 4.6 from VS2015 CTP6) it’s “carrots, dougnuts, apples, bananas”.

Naturally, when I first discovered this bug, I filed it in Connect. Unfortunately, the bug has been marked as closed. This comment was logged in 2011:

Swapping arguments in the call to comparer.Compare as you point out would be the most straightforward and general way to support this. However, while well-behaved implementations of comparer.Compare should handle this fine, there may be some implementations out there with subtle bugs that are not expecting us to reverse the order in which we supply these arguments for a given list. Given our focus on runtime compatibility for the next release, we won’t be able to fix this bug in the next version of Visual Studio, though we’ll definitely keep this in mind for a future release!

Fast, backward compatible, correct – pick any two

The clean solution here – reversing the order of the arguments – isn’t the only way of correcting it. We could use:

return -Math.Sign(original.Compare(x, y));

This still uses unary negation, but now it’s okay, because Math.Sign will only return -1, 0 or 1. It’s very slightly more expensive than the clean solution, of course – there’s the call to Math.Sign and the unary negation. Still, at least it works.

What I object to here is the pandering to incorrect code (implementations of IComparer which don’t obey its contract, by making assumptions about the order in which values will be passed) at the expense of correct code (such as the example above; the use of int.MinValue is forced here, but it can crop up naturally too – in a far harder-to-reproduce way, of course). While I can (unfortunately) believe that there are implementations which really are that broken, I don’t think the rest of us should have to suffer for it. I don’t think we should have to suffer at all, but I’d rather suffer a slight piece of inefficiency (the additional Math.Sign call (which may well be JIT-compiled into a single machine instruction – I haven’t checked) than suffer the current correctness issue.

Example 2: `TimeZoneInfo` becomes smarter in .NET 4.6

A long time ago, Windows time zones had no historical information – they were just a single pair of rules about when the zone started and stopped observing daylight saving time (assuming it did at all).

That improved over time, so that a time zone had a set of adjustment rules, each of which would be in force for a certain portion of history. This made it possible to represent the results of the Energy Policy Act of 2005 for example. These are represented in .NET using TimeZoneInfo.AdjustmentRule, which is slightly under-documented and has suffered from some implementation issues in the past. (There’s also the matter of the data used, but I treat that as a different issue.)

Bugs aside, the properties of TimeZoneInfo and its adjustment rules allowed an interested developer (one wanting to expose the same information in a different form for a better date/time API, as one entirely arbitrary example) to predict the results of the calculations within TimeZoneInfo itself – so the value returned by a call to TimeZoneInfo.GetUtcOffset(DateTime) could be predicted by looking at the standard UTC offset of the time zone, working out which rule was in effect for the specified DateTime, working out if that rule means that DST was being observed at the time, and adjusting the result accordingly.

As of .NET 4.6, it appears this will no longer be the case – not in a straightforward manner. One aspect of inflexibility in TimeZoneInfo is being eliminated: the inability to change standard offset. In the past, if a time zone changed its understanding of “standard time” (as the UK did between 1968 and 1971, for example), that couldn’t be cleanly represented in the TimeZoneInfo data model, leading to some very odd data with “backward” DST offsets to model the situation as nearly as possible.

Now, it seems that each adjustment rule also “knows” the difference between its standard offset and that of the zone as a whole. For the most part, this is a good thing. However, it’s a pain for anyone who works with TimeZoneInfo.AdjustmentRule directly, as the information simply isn’t available on the rule. (This is only a CTP of .NET 4.6, of course – it might become available before the final release.)

Fortunately, one can infer the information by asking the zone itself for the UTC offset of one arbitrary point in the adjustment rule, and then compare that with what you’d predict using just the properties of TimeZoneInfo and AdjustmentRule (taking into account the fact that the start of the rule may already be in DST). So long as the rule performs its internal calculations correctly (and from what I’ve seen, it’s now a lot better than it used to be, though not quite perfect yet) we can predict the result of GetUtcOffset for all other DateTime values.

It’s not clear to me why the information isn’t just exposed with a new property on the rule, however. It’s a change in terms of what’s available, sure – but anyone using the new implementation directly would have to know about the change anyway, as the results of using the exposed data no longer match the results of GetUtcOffset.

Example 3: `PersianCalendar` and leap years

If you thought the previous two examples were obscure, you’ll love this one.

PersianCalendar is changing in .NET 4.6 to use a more complicated leap year formula. The currently documented formula is:

A leap year is a year that, when divided by 33, has a remainder of 1, 5, 9, 13, 17, 22, 26, or 30.

So new PersianCalendar().IsLeapYear(1) has always returned true – until now. It turns out that Windows 10 is going to support the Persian Calendar (also known as the Solar Hijri calendar) in certain locales, and it’s going to do so “properly” – by which I mean, with a more complex leap year computation. This is what’s known as the “astronomical” Persian calendar and it follows the algorithm described in Calendrical Calculations. The BCL implementation is going to be consistent with that Windows calendar, which makes sense.

It’s worth noting that this calendar has the same leap year pattern as the “simple” one for over 320 years around the modern era (Gregorian 1800 to Gregorian 2123) so it’s really only people dealing with long-past dates in the Persian calendar who will notice the difference. Of course, I noticed because I have a unit test checking that my implementation of the Persian calendar in Noda Time is equivalent to the BCL’s implementation. It was fine until I installed Visual Studio 2015 CTP6…

As it happens, there’s another Persian calendar to consider – the “arithmetic” or “algorithmic” Persian calendar proposed by Ahmad Birashk. This consists of three hierarchical cycles of years (either cycles, subcycles and sub-subcycles or cycles, grand cycles and great grand cycles depending on whether you start with the biggest kind and work in, or start at the smallest and work out).

For Noda Time 2.0, I’m now going to support all three forms: simple (for those who’d like compatibility with the “old” BCL implementation), astronomical and arithmetic.

Conclusion

Backwards compatibility is hard. In all of these cases there are reasons for the brokenness, whether that’s just brokenness against correctness as in the first case, or a change in behaviour as in the other two. I think for localization-related code, there’s probably a greater tolerance of change – or there should be, as localization data changes reasonably frequently.

For the second and third cases, I think it’s reasonable to say that compatibility has been broken in a reasonable cause – particular for the second case, where the time zone data can be much more sensible with the additional flexibility of changing the UTC offset of standard time over history. It’s just a shame there’s fall-out.

The changes I would make if I were the only developer in the world would be:

Fix the first case either by ignoring broken comparer implementations, or by taking the hit of calling Math.Sign.
Improve the second case by adding a new property to AdjustmentRule and publicising its existence in large, friendly letters.
Introduce a new class for the third case instead of modifying the behaviour of the existing class. That would certainly be best for me – but for most users, that would probably introduce more problems than it solved. (I suspect that most users of the Persian calendar won’t go outside the “safe” range where the simple and astronomical calendars are the same anyway.)

One of the joys of working on Noda Time 2.0 at the moment is that it’s a new major version and I am willing to break 1.x code… not gratuitously, of course, but where there’s good reason. Even so, there are painful choices to be made in some cases, where there’s a balance between a slight ongoing smell or a clean break that might cause issues for some users if they’re not careful. I can only imagine what the pain is like when maintaining a large and mature codebase like the BCL – or the Windows API itself.

C# 6, Noda Time

C# 6 in action

December 8, 2014 jonskeet 44 Comments

Now that the Visual Studio 2015 Preview is available and the C# 6 feature set is a bit more stable, I figured it was time to start updating the Noda Time 2.0 source code to C# 6. The target framework is still .NET 3.5 (although that might change; I gather very few developers are actually going to be hampered by a change to target 4.0 if that would make things easier) but we can still take advantage of all the goodies C# 6 has in store.

I’ve checked all the changes into a dedicated branch which will only contain changes relevant to C# 6 (although a couple of tiny other changes have snuck in). When I’ve got round to updating my continuous integration server, I’ll merge onto the default branch, but I’m in no rush. (I’ll need to work out what to do about Mono at that point, too – there are various options there.)

In this post, I’ll go through the various C# 6 features, and show how useful (or otherwise) they are in Noda Time.

Read-only automatically implemented properties (“autoprops”)

Finally! I’ve been waiting for these for ages. You can specify a property with just a blank getter, and then assign it a value from either the declaration statement, or within a constructor/static constructor.

So for example, in DateTimeZone, this:

private static readonly DateTimeZone UtcZone = new FixedDateTimeZone(Offset.Zero);
public static DateTimeZone Utc { get { return UtcZone; } }

becomes

public static DateTimeZone Utc { get; } = new FixedDateTimeZone(Offset.Zero);

and

private readonly string id;
public string Id { get { return id; } }
protected DateTimeZone(string id, ...)
{
    this.id = id;
    ...
}

becomes

public string Id { get; }
protected DateTimeZone(string id, ...)
{
    this.Id = id;
    ...
}

As I mentioned before, I’ve been asking for this feature for a very long time – so I’m slightly surprised to find myself not entirely positive about the results. The problem it introduces isn’t really new – it’s just one that I’m not used to, as I haven’t used automatically-implemented properties much in a large code base. The issue is consistency.

With separate fields and properties, if you knew you didn’t need any special behaviour due to the properties when you accessed the value within the same type, you could always use the fields. With automatically-implemented properties, the incidental fact that a field is also exposed as a property changes the code – because now the whole class refers to it as a property instead of as a field.

I’m sure I’ll get used to this – it’s just a matter of time.

Initial values for automatically-implemented properties

The ability to specify an initial value for automatically-implemented properties applies to writable properties as well. I haven’t used that in the main body of Noda Time (almost all Noda Time types are immutable), but here’s an example from PatternTestData, which is used to provide data for text parsing/formatting tests. The code before:

internal CultureInfo Culture { get; set; }

public PatternTestData(...)
{
    ...
    Culture = CultureInfo.InvariantCulture;
}

And after:

internal CultureInfo Culture { get; set; } = CultureInfo.InvariantCulture;

public PatternTestData(...)
{
    ...
}

It’s worth noting that just like with a field initializer, you can’t call instance members within the same class when initializing an automatically-implemented property.

Expression-bodied members

I’d expected to like this feature… but I hadn’t expected to fall in love with it quite as much as I’ve done. It’s really simple to describe – if you’ve got a read-only property, or a method or operator, and the body is just a single return statement (or any other simple statement for a void method), you can use => to express that instead of putting the body in braces. It’s worth noting that this is not a lambda expression, nor is the compiler converting anything to delegates – it’s just a different way of expressing the same thing. Three examples of this from LocalDateTime (one property, one operator, one method – they’re not next to each other in the original source code, but it makes it simpler for this post):

public int Year { get { return date.Year; } }

public static LocalDateTime operator +(LocalDateTime localDateTime, Period period)
{
    return localDateTime.Plus(period);
}

public static LocalDateTime Add(LocalDateTime localDateTime, Period period)
{
    return localDateTime.Plus(period);
}

becomes

public int Year => date.Year;

public static LocalDateTime operator +(LocalDateTime localDateTime, Period period) =>
    localDateTime.Plus(period);

public static LocalDateTime Add(LocalDateTime localDateTime, Period period) =>
    localDateTime.Plus(period);

In my actual code, the operator and method each only take up a single (pretty long) line. For some other methods – particularly ones where the body has a comment – I’ve split it into multiple lines. How you format your code is up to you, of course :)

So what’s the benefit of this? Why do I like it? It makes the code feel more functional. It makes it really clear which methods are just shorthand for some other expression, and which really do involve a series of steps. It’s far too early to claim that this improves the quality of the code or the API, but it definitely feels nice. One interesting data point – using this has removed about half of the return statements across the whole of the NodaTime production assembly. Yup, we’ve got a lot of properties which just delegate to something else – particularly in the core types like LocalDate and LocalTime.

The nameof operator

This was a no-brainer in Noda Time. We have a lot of code like this:

public void Foo([NotNull] string x)
{
    Preconditions.CheckNotNull(x, "x");
    ...
}

This trivially becomes:

public void Foo([NotNull] string x)
{
    Preconditions.CheckNotNull(x, nameof(x));
    ...
}

Checking that every call to Preconditions.CheckNotNull (and CheckArgument etc) uses nameof and that the name is indeed the name of a parameter is going to be one of the first code diagnostics I write in Roslyn, when I finally get round to it. (That will hopefully be very soon – I’m talking about it at CodeMash in a month!)

Dictionary initializers

We’ve been able to use collection initializers with dictionaries since C# 3, using the Add method. C# 6 adds the ability to use the indexer too, which leads to code which looks more like normal dictionary access. As an example, I’ve changed a “field enum value to delegate” dictionary in TzdbStreamData from

private static readonly Dictionary<TzdbStreamFieldId, Action<Builder, TzdbStreamField>> FieldHanders =
    new Dictionary<TzdbStreamFieldId, Action<Builder, TzdbStreamField>>
{
   { TzdbStreamFieldId.StringPool, (builder, field) => builder.HandleStringPoolField(field) },
   { TzdbStreamFieldId.TimeZone, (builder, field) => builder.HandleZoneField(field) },
   { TzdbStreamFieldId.TzdbIdMap, (builder, field) => builder.HandleTzdbIdMapField(field) },
   ...
}

to:

private static readonly Dictionary<TzdbStreamFieldId, Action<Builder, TzdbStreamField>> FieldHanders =
    new Dictionary<TzdbStreamFieldId, Action<Builder, TzdbStreamField>>
{
    [TzdbStreamFieldId.StringPool] = (builder, field) => builder.HandleStringPoolField(field),
    [TzdbStreamFieldId.TimeZone] = (builder, field) => builder.HandleZoneField(field),
    [TzdbStreamFieldId.TzdbIdMap] = (builder, field) => builder.HandleTzdbIdMapField(field),
...
}

One downside of this is that the initializer will now not throw an exception if the same key is specified twice. So whereas the bug in this code is obvious immediately:

var dictionary = new Dictionary<string, string>
{
    { "a", "b" },
    { "a", "c" }
};

if you convert it to similar code using the indexer:

var dictionary = new Dictionary<string, string
{
    ["a"] = "b",
    ["a"] = "c",
};

… you end up with a dictionary which only has a single value.

To be honest, I’m now pretty used to the syntax which uses Add – so even though there are some other dictionaries initialized with collection initializers in Noda Time, I don’t think I’ll be changing them.

Using static members

For a while I didn’t think I was going to use this much – and then I remembered NodaConstants. The majority of the constants here are things like MillisecondsPerHour, and they’re used a lot in some of the core types like Duration. The ability to add a using directive for a whole type, which imports all the members of that type, allows code like this:

public int Seconds =>
    unchecked((int) ((NanosecondOfDay / NodaConstants.NanosecondsPerSecond) % NodaConstants.SecondsPerMinute));

to become:

using NodaTime.NodaConstants;
...
public int Seconds =>
    unchecked((int) ((NanosecondOfDay / NanosecondsPerSecond) % SecondsPerMinute));

Expect to see this to be used a lot in trigonometry code, making all those calls to Math.Cos, Math.Sin etc a lot more readable.

Another benefit of this syntax is to allow extension methods to be imported just from an individual type instead of from a whole namespace. In Noda Time 2.0, I’m introducing a NodaTime.Extensions namespace with extensions to various BCL types (to allow more fluent conversions such as DateTimeOffset.ToOffsetDateTime()) – I suspect that not everyone will want all of these extensions to be available all the time, so the ability to import selectively is very welcome.

String interpolation

We don’t use the system default culture much in Noda Time, which the string interpolation feature always does, so string interpolation isn’t terribly useful – but there are a few cases where it’s handy.

For example, consider this code:

throw new KeyNotFoundException(
    string.Format("No calendar system for ID {0} exists", id));

With C# 6 in the VS2015 preview, this has become

throw new KeyNotFoundException("No calendar system for ID \{id} exists");

Note that the syntax of this feature is not finalized yet – I expect to have to change this for the final release to:

throw new KeyNotFoundException($"No calendar system for ID {id} exists");

It’s always worth considering places where a feature could be used, but probably shouldn’t. ZoneInterval is one such place. Its ToString() feature looks like this:

public override string ToString() =>
    string.Format("{0}: [{1}, {2}) {3} ({4})",
        Name,
        HasStart ? Start.ToString() : "StartOfTime",
        HasEnd ? End.ToString() : "EndOfTime",
        WallOffset,
        Savings);

I tried using string interpolation here, but it ended up being pretty horrible:

String literals within string literals look very odd
An interpolated string literal has to be on a single line, which ended up being very long
The fact that two of the arguments use the conditional operator makes them harder to read as part of interpolation

Basically, I can see string interpolation being great for “simple” interpolation with no significant logic, but less helpful for code like this.

Null propagation

Amazingly, I’ve only found a single place to use null propagation in Noda Time so far. As a lot of the types are value types, we don’t do a lot of null checking – and when we do, it’s typically to throw an exception if the value is null. However, this is the one place I’ve found so far, in BclDateTimeZone.ForSystemDefault. The original code is:

if (currentSystemDefault == null || currentSystemDefault.OriginalZone != local)

With null propagation we can handle “we don’t have a cached system default” and “the cached system default is for the wrong time zone” with a single expression:

if (currentSystemDefault?.OriginalZone != local)

(Note that local will never be null, or this transformation wouldn’t be valid.)

There may well be a few other places this could be used, and I’m certain it’ll be really useful in a lot of other code – it’s just that the Noda Time codebase doesn’t contain much of the kind of code this feature is targeted at.

Conclusion

What a lot of features! C# 6 is definitely a “lots of little features” release rather than the “single big feature” releases we’ve seen with C# 4 (dynamic) and C# 5 (async). Even C# 3 had a lot of little features which all served the common purpose of LINQ. If you had to put a single theme around C# 6, it would probably be making existing code more concise – it’s the sort of feature set that really lends itself to this “refactor the whole codebase to tidy it up” approach.

I’ve been very pleasantly surprised by how much I like expression-bodied members, and read-only automatically implemented properties are a win too (even though I need to get used to it a bit). Other features such as static imports are definitely welcome to remove some of the drudgery of constants and provide finer-grained extension method discovery.

Altogether, I’m really pleased with how C# 6 has come together – I’m looking forward to merging the C# 6 branch into the main Noda Time code base as soon as I’ve got my continuous integration server ready for it…

Noda Time

The mysteries of BCL time zone data

September 30, 2014 jonskeet 10 Comments

Note: I’ve now identified two bugs in TimeZoneInfo… details later in the post.

Background

Early on Friday morning (UTC), IANA released version 2014h of the time zone database. As a dutiful Noda Time maintainer, I fetched it, converted it into our native format, and ran the unit tests prior to pushing the new version.

Unfortunately, some tests failed. These weren’t the tests of the IANA data at all – they were tests of the BCL time zone data, which we access via TimeZoneInfo. I knew our code hadn’t changed, so I temporarily set those tests to be ignored, pushed the update, and filed a bug so I’d remember to fix it. On Friday evening, I tried to work out what had happened – strongly suspecting Windows Update had given us some “interesting” time zone data. Matt Johnson helpfully pointed me to a hotfix which I suspect rolled out via Windows Update, causing me to notice the issue. (As an aside, this is an argument in favour of regular CI builds even when no code has been pushed, if your code uses data which is updated automatically.) Digging into the time zone data for the Russian zones which were updating in the hotfix, I’m very confused.

This post was prompted by Noda Time, but actually it doesn’t involve Noda Time any further. I simply wanted to give some examples of the odd data I’m trying to understand.

Windows adjustment rules

TimeZoneInfo reveals its daylight saving rules via the GetAdjustmentRules() method, which returns an array of TimeZoneInfo.AdjustmentRule. An adjustment rule consists of:

The delta between daylight saving time and the zone’s standard time. (Windows time zones can only have a single standard offset; the data model doesn’t really handle the situation where the standard offset changes over time.)
A range of dates for which the rule is applicable
Transition times for the start and end of daylight savings

The transition times (TimeZoneInfo.TransitionTime) determine at which point within any given year the time zone starts or stops observing daylight saving time. Each time can either have a fixed date (e.g. “March 5th”) or a week-based rule (“The 3rd Sunday in October”). Additionally, it has a time of day associated with it.

So, in order to work out the offset from UTC for a specific moment in time, you should work out which rule it’s part of, and then check whether or not it’s in the daylight saving portion of the rule, right? That certainly sounds reasonable, but it’s slightly trickier in practice.

Awful GUI tooling ahoy!

I got fed up with poking around in the debugger and writing down all the adjustment rules, then testing what it actually did – so I wrote a very small WinForms app to experiment with. The source is in my GitHub repo, and may improve over time. (I’m really not a UX designer. I make sure that a UI does what I need it to, then run away before it breaks. Apologies to anyone who feels offended by the awfulness of this.)

The tool (“TimeZoneInfo explorer”) allows you to select a time zone, at which point the adjustment rules will be displayed. You can then also select two date/time values, and it will show how the UTC offset changes between the two, sampling it once per hour. Here’s an example, showing the UK time zone around the DST transition in October 2014:

UK time zone

Now the UK is particularly simple – especially as the Windows time zone database doesn’t know anything about the British Standard Time experiment, for example. Other time zones are more complicated, and that’s where things get tricky. Let’s look at a few and see why things aren’t as straightforward as they appear.

Awkward cases

Here’s one of the examples which caused my Noda Time failure last Friday: Kaliningrad.

Kaliningrad time zone

There are a few things to note here:

The rules are from the beginning of time to the end of 2010, then for the whole of 2011, then for 2014… it’s not obvious what should happen in 2012 or 2013, or after 2014.
The DST end transition for 2011 is at the start of the year
The DST start transition for 2014 is at the start of the year
The UTC offset changes at midnight UTC at the start of 2014

Now, the documentation for the time of day of a transition is actually fairly clear:

For transitions from standard time to daylight saving time, the TimeOfDay value represents the time of the transition in the time zone’s standard time. For transitions from daylight saving time to standard time, it represents the time of the transition in the time zone’s daylight saving time.

In other words, it’s always the time of day that would have occurred locally if there wasn’t a transition – in IANA time zone language, this is a “wall mode” transition, as it tells you the time you’d see on a wall clock exactly when you need to adjust it.

Great. Now what about the start and end date of a rule? For example, which rule – if any – is in force in Kaliningrad at 2013-12-31T23:00:00Z? The UTC year is 2013, but the local year would be 2014. Does it fall in the gap, or does it use the 2014 rule?

There are three options for interpreting the start/end dates:

They’re in UTC
They’re in standard time (i.e. using the standard UTC offset, and ignoring the possibility that the new year occurs when daylight savings are being observed
They’re in local time

The documentation for AdjustmentRule.DateStart and AdjustmentRule.DateEnd don’t specify how the value should be interpreted, although they do have this warning:

Unless there is a compelling reason to do otherwise, you should define the adjustment rule’s start date to occur within the time interval during which the time zone observes standard time. Unless there is a compelling reason to do so, you should not define the adjustment rule’s start date to occur within the time interval during which the time zone observes daylight saving time. For example, if a time zone’s transition from daylight saving time occurs on the third Sunday of March and its transition to daylight saving time occurs on the first Sunday of October, the effective start date of the adjustment rule should not be January 1 of a particular year, since that date occurs within the period of daylight saving time.

However, I think that every system-provided adjustment rule I’ve seen starts on January 1st and ends on December 31st.

Let’s defer judgement on what this all means until we’ve seen a couple more examples. Next up, Perth. If you enjoy adjusting your clocks, then Windows thinks that Perth is a great place to live, at least at the end of 2008:

Perth time zone

At midnight local time, the offset is adjusted from +9 to +8… and then at 8am local time (as it would have been) it’s adjusted back again, making it 9am.

Finally, here’s a similar example for Tripoli:

Tripoli time zone

This is similar to the Perth case, changing the clocks twice in quick succession – although the transition at the start of 2012 is similar to the Kaliningrad case, occurring at midnight UTC instead of midnight local time. This was actually the first case I noticed, causing issue 220 in Noda Time, and filed as a Connect bug.

So what’s the answer?

~~I haven’t yet come up with a perfect way of understanding the Windows adjustment rules.~~ I’ve now identified two bugs, , thanks to looking at the reference source. Let’s focus on Tripoli, and try to work out some way of explaining the two transitions around the start of 2012, effectively tracing the call to TimeZoneInfo.GetUtcOffset, passing in two different values:

2012-12-31T22:00:00Z
2013-01-01T00:00:00Z.

Just to recap, look again at the final picture above. The first adjustment rule (extended in both directions) would propose the following transitions (amongst many others):

January 1st 2012, midnight local time, +02 -> +03
November 10th 2012, 2am local time, +03 -> +02
January 6th 2013, midnight local time, +02 -> +03

The second adjustment rule would propose the following transitions:

January 3rd 2012, midnight local time, +03 -> +02
March 30th 2012, 1am local time, +02 -> +03
January 1st 2013, midnight local time (2012-12-31T21:00:00Z), +03 -> +02
March 29th 2013, 1am local time, +02 -> +03

Note that at 2012-12-31T22:00:00Z, neither rule would suggest that it’s in DST – and yet the offset is +03. At 2013-01-01T00:00:00Z, it still shouldn’t be in DST, and indeed the offset is +02… but it’s not obvious why there would be a transition at this point.

As noted in a comment, the reference source for TimeZoneInfo holds the key.

First bug: 2012-12-31T22:00:00Z

The code uses the standard offset to determine which adjustment rule to use for the year (so that’s answered that question) and then determines when DST starts and ends in local time. This is where the first bug lies… while the rule to use is identified in GetAdjustmentRuleForTime using the standard time version of the input, the year for which to determine the transitions is the UTC year of the input, due to this statement:

isDaylightSavings = GetIsDaylightSavingsFromUtc(time, year, zone.m_baseUtcOffset, rule, out isAmbiguousLocalDst);

Here time is the UTC version of the original input, and year is assigned a few lines earlier as:

year = time.Year;

So even though we’re using the 2013 rule, we’re finding out what DST transitions it would have used in 2012! Those transitions are:

January 3rd 2012, midnight local time (2012-01-02T21:00:00Z), +03 -> +02
March 30th 2012, 1am local time (2012-03-29T23:00:00Z), +02 -> +03

The transition times are converted from local time into UTC, and then passed to CheckIsDst to determine whether the tuple of (entering DST, time to check, exiting DST) means that the time to check is in DST. The call is effectively:

CheckIsDst(2012-03-29T23:00:00Z, // startTime
           2012-12-31T22:00:00Z, // time
           2012-01-02T21:00:00Z) // endTime

That looks like it’s in DST… so we end up with an offset of +03.

So the bug here is that the rule for 2013 is being asked for its 2012 transitions, despite the first 2013 transition actually coming earlier than the point in time we’re asking about.

Second bug: 2013-01-01T00:00:00Z

Back to the start, and again the code picks the 2013 rule (we’re now two hours into 2013, according to standard time). This time, because the UTC year is also UTC, the code asks the rule for the transitions during 2013. As shown above, these are:

DST ends: January 1st 2013, midnight local time (2012-12-31T21:00:00Z), +03 -> +02
DST starts: March 29th 2013, 1am local time (2012-03-28T23:00:00Z), +02 -> +03

This time our call to CheckIsDst is:

CheckIsDst(2013-03-28T23:00:00Z, // startTime
           2013-01-01T00:00:00Z, // time
           2012-12-31T21:00:00Z) // endTime

So far so good. This looks like we should not be in DST. But now we come to the body of CheckIsDst, which starts off like this:

static private Boolean CheckIsDst(DateTime startTime,
                                  DateTime time,
                                  DateTime endTime) {
    Boolean isDst;
    int startTimeYear = startTime.Year;
    int endTimeYear = endTime.Year;

    if (startTimeYear != endTimeYear) {
        endTime = endTime.AddYears(startTimeYear - endTimeYear);
    }

The years of startTime and endTime aren’t the same, so a year is arbitrarily added to endTime. In other words, the code is effectively assuming that the transition is on the same date every year. It does the same for time as well, if necessary – which it isn’t in our case. So after this change, we have:

startTime = 2013-03-28T23:00:00Z
time = 2013-01-01T00:00:00Z
endTime = 2013-12-31T21:00:00Z

This time we get the right answer – but we’re looking at an “end transition” which isn’t logically predicted by the rule. (The next “UTC end” is in January 2014, because the local date of the change would be Tuesday January 7th.)

The operation here is simply illogical. To my mind, the only justifiable way of determining whether or not a particular time falls in the DST or standard part of a recurrent rule is to predict the transition on/before it and the transition after it.

Conclusion

Of course, none of this reflects what happens in real life – the Windows time zone data is simply inaccurate here, or at best a poor facsimile of a complex situation, limited by the representation available. Still, it would be nice to be able to understand how the BCL is interpreting the data, in order to replicate it in Noda Time.

Now that I understand the two BCL bugs – or at least two of the BCL bugs – I’m minded to suppress the tests and add some warning documentation. This isn’t a matter of “the documentation isn’t terribly clear about what the rules mean” – it’s fundamentally broken behaviour. Ick.

Noda Time

Noda Time v1.0 released

November 7, 2012 jonskeet 13 Comments

Go get Noda Time 1.0!

Today is the end of the longest release cycle I’ve been personally involved in. On November 5th 2009, I announced my intention to write a port of Joda Time for .NET. The next day, Noda Time was born – with a lofty (foolhardy) set of targets.

Near the end of a talk *about* Noda Time this evening, I released Noda Time 1.0.0.

It’s taken three years, but I’m immensely proud of what we’ve managed to achieve. We’re far from "done" but I believe we’re already significantly ahead of most other date/time APIs I’ve seen in terms of providing a clean API which reduces *incidental* complexity while highlighting the *inherent* complexity of the domain. (This is a theme I’m becoming dogmatic about on various fronts.)

There’s more to do – I can’t see myself considering Noda Time to be "done" any time soon – but hopefully now we’ve got a stable release, we can start to build user momentum.

One point I raised at the DotNetDevNet presentation tonight was that there’s a definite benefit (in my very biased view) in just *looking into* Noda Time:

If you can’t use it in your production code, use it when prototyping
If you can’t use it in your prototype code, play with it in personal projects
If you can’t use it in personal projects, read the user guide to understand the concepts

I hope that simply looking at the various types that Noda Time providers will give you more insight into how you should be thinking about date and time handling in your code. While the BCL API has a lot of flaws, you can work around most of them if you make it crystal clear what your data means at every step. The type system will leave that largely ambiguous, but there’s nothing to stop you from naming your variables descriptively, and adding appropriate
comments.

Of course, I would far prefer it if you’d start using Noda Time and raising issues on how to make it better. Spread the word.

Oh, and if anyone from the BCL team is reading this and would like to include something like Noda Time into .NET 5 as a "next generation" date/time, I’d be *really* interested in talking to you :)

C#, Design, Noda Time

The perils of conditional mutability

May 6, 2012 jonskeet 6 Comments

This morning I was wrestling with trying to make some Noda Time unit tests faster. For some reason, the continuous integration host we’re using is really slow at loading resources under .NET 4. The unit tests which run in 10 seconds on my home laptop take over three hours on the continuous integration system. Taking stack traces at regular intervals showed the problem was with the NodaFormatInfo constructor, which reads some resources.

I may look into streamlining the resource access later, but before we get to that point, I wanted to try to reduce the number of times we call that constructor in the first place. NodaFormatInfo is meant to be cached, so I wouldn’t have expected thousands of instances to be created – but it’s only cached when the System.Globalization.CultureInfo it’s based on is read-only. This is where the problems start…

CultureInfo is conditionally mutable (not an official term, just one I’ve coined for the purposes of this post). You can ask whether or not it’s read-only with the IsReadOnly property, and obviously if it’s read-only you can’t change it. Additionally, CultureInfo is composed of other conditionally mutable objects – DateTimeFormatInfo, NumberFormatInfo etc. There’s a static ReadOnly method on CultureInfo to create a read-only wrapper around a mutable CultureInfo. It’s not clearly documented whether that’s expected to take a deep copy (so that callers can really rely on it not changing) or whether it’s expected to reflect any further changes made to the culture info it’s based on. To go in the other direction, you can call Clone on a CultureInfo to create a mutable copy of any existing culture.

Further complications are introduced by the properties on the composite objects – we have properties such as DateTimeFormatInfo.MonthNames which returns a string array. Remember, arrays are always mutable. So it’s really important to know whether the array reference returned from the property refers to a copy of the underlying data, or whether it refers to the array that’s used internally by the type. Obviously for read-only DateTimeFormatInfo objects, I’d expect a copy to be returned – but for a mutable DateTimeFormatInfo, it would potentially make sense to return the underlying array reference. Unfortunately, the documentation doesn’t make this clear – but in practice, it always returns a copy. If you need to change the month names, you need to clone the array, mutate the clone, and then set the MonthNames property.

All of this makes CultureInfo hard to work with. The caching decision earlier on only really works if a "read-only" culture genuinely won’t change behind the scenes. The type system gives you no help to catch subtle bugs at compile-time. Making any of this robust but efficient (in terms of taking minimal copies) is tricky to say the least.

Not only does it make it hard to work with from a client’s point of view, but apparently it’s hard to implement correctly too…

First bug: Mono’s invariant culture isn’t terribly invariant…

(Broken in 2.10.8; apparently fixed later.)

I discovered this while getting Noda Time’s unit tests to pass on Mono. Unfortunately there are some I’ve had to effectively disable at the moment, due to deficiencies in Mono (some of which are being fixed, of course).

Here’s a program which builds a clone of the invariant culture, changes the clone’s genitive month names, and then prints out the first non-genitive name from the plain invariant culture:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        // Note: not even deliberately changing MonthNames for this culture!
        clone.DateTimeFormat.MonthGenitiveNames[0] = "Changed";

        // Prints Changed
        Console.WriteLine(CultureInfo.InvariantCulture.DateTimeFormat.MonthNames[0]);
    }
}

I believe this bug is really due to the lack of support for genitive month names in Mono at the moment – the MonthGenitiveNames property always just returns a reference to the month names for the invariant culture – without taking a copy first. (It always returns the invariant culture’s month names, even if you’re using a different culture entirely.) The code above shows an "innocent" attempt to change a mutable clone – but in reality we could have used any culture (including an immutable one) to make the change.

Note that in the .NET implementation, the change would only have been to a copy of the underlying data, so even the clone wouldn’t have reflected change anywhere.

Second bug: ReadOnly losing changes

The second bug is the one I found this morning. It looks like it’s fixed in .NET 4, but it’s present in .NET 3.5, which is where it bit me this morning. When you try to make a read-only wrapper around a mutated culture, some of the properties are preserved… but some aren’t:

using System;
using System.Globalization;

class Test
{
    static void Main()
    {
        CultureInfo clone = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        clone.DateTimeFormat.AMDesignator = "ChangedAm";

        // The array is recreated on each call to MonthNames, so changing the
        // value within the array itself doesn’t work :(
        string[] months = (string[]) clone.DateTimeFormat.MonthNames;
        months[0] = "ChangedMonth";
        clone.DateTimeFormat.MonthNames = months;

        CultureInfo readOnlyCopy = CultureInfo.ReadOnly(clone);
        Console.WriteLine(clone.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(clone.DateTimeFormat.MonthNames[0]); // ChangedMonth

        Console.WriteLine(readOnlyCopy.DateTimeFormat.AMDesignator); // ChangedAm
        Console.WriteLine(readOnlyCopy.DateTimeFormat.MonthNames[0]); // January (!)
    }
}

I don’t know what’s going on here. In the end I just left the test code using the mutable clone, having added a comment explaining why it wasn’t created a read-only wrapper.

I’ve experimented with a few different options here, including setting the MonthNames property on the clone after creating the wrapper. No joy – I simply can’t make the new month names stick in the read-only copy. <sigh>

Conclusion

I’ve been frustrated by the approach we’ve taken to cultures in Noda Time for a while. I haven’t worked out exactly what we should do about it yet, so it’s likely to stay somewhat annoying for v1, but I may well revisit it significantly for v2. Unfortunately, there’s nothing I can do about CultureInfo itself.

What I would have preferred in all of this is the builder pattern: make CultureInfo, DateTimeFormatInfo etc all immutable, but give each of them mutable builder types, with the ability to create a mutable builder based on an existing immutable instance, and obviously to create a new immutable instance from a builder. That would make all kinds of things simpler – including caching.

For the moment though, I hope we can all learn lessons from this – or have old lessons reinforced, at least:

Making a single type behave in different ways based on different "modes" makes it hard to use correctly. (Yes, this is the same first conclusion as with DateTime in the previous post. Funny, that.)
Immutability has to be deep to be meaningful: it’s not much use having a supposedly read-only object which composes a StringBuilder…
Arrays should be considered somewhat harmful. If you’re going to return an array from a method, make sure you document whether this is a copy of the underlying data, or a "live" reference. (The latter should be very rare, particularly for a public API.) The exception here is if you return an empty array – that’s effectively immutable, so you can always return it with no problems.
The builder pattern rocks – use it!

In my next post I’ll try to get back to the TimeZoneInfo oddities I mentioned last time.

C#, Evil code, Noda Time

More fun with DateTime

May 2, 2012 jonskeet 20 Comments

(Note that this is deliberately not posted in the Noda Time blog. I reckon it’s of wider interest from a design perspective, and I won’t be posting any of the equivalent Noda Time code. I’ll just say now that we don’t have this sort of craziness in Noda Time, and leave it at that…)

A few weeks ago, I was answering a Stack Overflow question when I noticed an operation around dates and times which should have been losing information apparently not doing so. I investigated further, and discovered some "interesting" aspects of both DateTime and TimeZoneInfo. In an effort to keep this post down to a readable length (at least for most readers; certain WebDriver developers who shall remain nameless have probably given up by now already) I’ll save the TimeZoneInfo bits for another post.

Background: daylight saving transitions and ambiguous times

There’s one piece of inherent date/time complexity you’ll need to understand for this post to make sense: sometimes, a local date/time occurs twice. For the purposes of this post, I’m going to assume you’re in the UK time zone. On October 28th 2012, at 2am local time (1am UTC), UK clocks will go back to 1am local time. So 1:20am local time occurs twice – once at 12:20am UTC (in daylight saving time, BST), and once at 1:20am UTC (in standard time, GMT).

If you want to run any of the code in this post and you’re not in the UK, please adjust the dates and times used to a similar ambiguity for when your clocks go back. If you happen to be in a time zone which doesn’t observe daylight savings, I’m afraid you’ll have to adjust your system time zone in order to see the effect for yourself.

DateTime.Kind and conversions

As you may already know, as of .NET 2.0, DateTime has a Kind property, of type DateTimeKind – an enum with the following values:

Local: The DateTime is considered to be in the system time zone. Not an arbitrary "local time in some time zone", but in the specific current system time zone.
Utc: The DateTime is considered to be in UTC (corollary: it always unambiguously represents an instant in time)
Unspecified: This means different things in different contexts, but it’s a sort of "don’t know" kind; this is closer to "local time in some time zone" which is represented as LocalDateTime in Noda Time.

DateTime provides three methods to convert between the kinds:

ToUniversalTime: if the original kind is Local or Unspecified, convert it from local time to universal time in the system time zone. If the original kind is Utc, this is a no-op.
ToLocalTime: if the original kind is Utc or Unspecified, convert it from UTC to local time. If the original kind is Local, this is a no-op.
SpecifyKind: keep the existing date/time, but just change the kind. (So 7am stays as 7am, but it changes the meaning of that 7am effectively.)

(Prior to .NET 2.0, ToUniversalTime and ToLocalTime were already present, but always assumed the original value needed conversion – so if you called x.ToLocalTime().ToLocalTime().ToLocalTime() the result would probably end up with the appropriate offset from UTC being applied three times!)

Of course, none of these methods change the existing value – DateTime is immutable, and a value type – instead, they return a new value.

DateTime’s Deep Dark Secret

(The code in this section is presented in several chunks, but it forms a single complete piece of code – later chunks refer to variables in earlier chunks. Put it all together in a Main method to run it.)

Armed with the information in the previous sections, we should be able to make DateTime lose data. If we start with 12:20am UTC and 1:20am UTC on October 28th as DateTimes with a kind of Utc, when we convert them to local time (on a system in the UK time zone) we should get 1:20am in both cases due to the daylight saving transition. Indeed, that works:

// Start with different UTC values around a DST transition
var original1 = new DateTime(2012, 10, 28, 0, 20, 0, DateTimeKind.Utc);
var original2 = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Utc);

// Convert to local time
var local1 = original1.ToLocalTime();
var local2 = original2.ToLocalTime();

// Result is the same for both values. Information loss?
var expected = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Local);
Console.WriteLine(local1 == expected); // True
Console.WriteLine(local2 == expected); // True
Console.WriteLine(local1 == local2); // True

If we’ve started with two different values, applied the same operation to both, and ended up with equal values, then we must have lost information, right? That doesn’t mean that operation is "bad" any more than "dividing by 2" is bad. You ought to be aware of that information loss, that’s all.

So, we ought to be able to demonstrate that information loss further by converting back from local time to universal time. Here we have the opposite problem: from our local time of 1:20am, we have two valid universal times we could convert to – either 12:20am UTC or 1:20am UTC. Both answers would be correct – they are universal times at which the local time would be 1:20am. So which one will get picked? Well… here’s the surprising bit:

// Convert back to UTC
var roundTrip1 = local1.ToUniversalTime();
var roundTrip2 = local2.ToUniversalTime();

// Values round-trip correctly! Information has been recovered…
Console.WriteLine(roundTrip1 == original1); // True
Console.WriteLine(roundTrip2 == original2); // True
Console.WriteLine(roundTrip1 == roundTrip2); // False

Somehow, each of the local values knows which universal value it came from. The The information has been recovered, so the reverse conversion round-trips each value back to its original one. How is that possible?

It turns out that DateTime actually has four potential kinds: Local, Utc, Unspecified, and "local but treat it as the earlier option when resolving ambiguity". A DateTime is really just a 64-bit number of ticks, but because the range of DateTime is only January 1st 0001 to December 31st 9999. That range can be represented in 62 bits, leaving 2 bits "spare" to represent the kind. 2 bits gives 4 possible values… the three documented ones and the shadowy extra one.

Through experimentation, I’ve discovered that the kind is preserved if you perform arithmetic on the value, too… so if you go to another "fall back" DST transition such as October 30th 2011, the ambiguity resolution works the same way as before:

var local3 = local1.AddYears(-1).AddDays(2);
var local4 = local2.AddYears(-1).AddDays(2);
Console.WriteLine(local3.ToUniversalTime().Hour); // 0
Console.WriteLine(local4.ToUniversalTime().Hour); // 1

If you use DateTime.SpecifyKind with DateTimeKind.Local, however, it goes back to the "normal" kind, even though it looks like it should be a no-op:

// Should be a no-op?
var local5 = DateTime.SpecifyKind(local1, local1.Kind);
Console.WriteLine(local5.ToUniversalTime().Hour); // 1

Is this correct behaviour? Or should it be a no-op, just like calling ToLocalTime on a "local" DateTime is? (Yes, I’ve checked – that doesn’t lose the information.) It’s hard to say, really, as this whole business appears to be undocumented… at least, I haven’t seen anything in MSDN about it. (Please add a link in the comments if you find something. The behaviour actually goes against what’s documented, as far as I can tell.)

I haven’t looked into whether various forms of serialization preserve values like this faithfully, by the way – but you’d have to work hard to reproduce it in non-framework code. You can’t explicitly construct a DateTime with the "extra" kind; the only ways I know of to create such a value are via a conversion to local time or through arithmetic on a value which already has the kind. (Admittedly if you’re serializing a DateTime with a Kind of Local, you’re already on potentially shaky ground, given that you could be deserializing it on a machine with a different system time zone.)

Unkind comparisons

I’ve misled you a little, I have to admit. In the code above, when I compared the "expected" value with the results of the first conversions, I deliberately specified DateTimeKind.Local in the constructor call. After all, that’s the kind we do expect. Well, yes – but I then printed the result of comparing this value with local1 and local2… and those comparisons would have been the same regardless of the kind I’d specified in the constructor.

All comparisons between DateTimes ignore the Kind property. It’s not just restricted to equality. So for example, consider this comparison:

// In June: Local time is UTC+1, so 8am UTC is 9am local
var dt1 = new DateTime(2012, 6, 1, 8, 0, 0, DateTimeKind.Utc);
var dt2 = new DateTime(2012, 6, 1, 8, 30, 0, DateTimeKind.Local);
Console.WriteLine(dt1 < dt2); // True

When viewed in terms of "what instants in time do these both represent?" the answer here is wrong – when you convert both values into the same time zone (in either direction), dt1 occurs after dt2. But a simple look at the properties tells a different story. In practice, I suspect that most comparisons between DateTime values of different kinds involve code which is at best sloppy and is quite possibly broken in a meaningful way.

Of course, if you bring Kind=Unspecified into the picture, it becomes impossible to compare meaningfully in a kind-sensitive way. Is 12am UTC before or after 1am Unspecified? It depends what time zone you later use.

To be clear, it is a hard-to-resolve issue, and one that we don’t do terribly well at in Noda Time at the moment for ZonedDateTime. (And even with just LocalDateTime you’ve got issues between calendars.) This is a situation where providing separate Comparer<T> implementations works nicely – so you can explicitly say what kind of comparison you want.

Conclusions

There’s more fun to be had with a similar situation when we look at TimeZoneInfo, but for now, a few lessons:

Giving a type different "modes" which make it mean fairly significantly different things is likely to cause headaches
Keeping one of those modes secret (and preventing users from even constructing a value in that mode directly) leads to even more fun and games
If two instances of your type are considered "equal" but behave differently, you should at least consider whether there’s something smelly going on
There’s always more fun to be had with DateTime…

C#, Evil code, Noda Time

Type initializer circular dependencies

April 7, 2012 jonskeet 28 Comments

To some readers, the title of this post may induce nightmarish recollections of late-night debugging sessions. To others it may be simply the epitome of jargon. Just to break the jargon down a bit:

Type initializer: the code executed to initialize the static variables of a class, and the static constructor
Circular dependency: two bits of code which depend on each other – in this case, two classes whose type initializers each require that the other class is initialized

A quick example of the kind of problem I’m talking about would be helpful here. What would you expect this code to print?

using System;

class Test
{
    static void Main()
    {
        Console.WriteLine(First.Beta);
    }
}

class First
{
public static readonly int Alpha = 5;
public static readonly int Beta = Second.Gamma;
}

class Second
{
public static readonly int Gamma = First.Alpha;
}

Of course, without even glancing at the specification, any expectations are pretty irrelevant. Here’s what the spec (section 10.5.5.1 of the C# 4 version):

The static field variable initializers of a class correspond to a sequence of assignments that are executed in the textual order in which they appear in the class declaration. If a static constructor (§10.12) exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class.

In addition to the language specification, the CLI specification gives more details about type initialization in the face of circular dependencies and multiple threads. I won’t post the details here, but the gist of it is:

Type initialization acts like a lock, to prevent more than one thread from initializing a type
If the CLI notices that type A needs to be initialized in order to make progress, but it’s already in the process of initializing type A in the same thread, it continues as if the type were already initialized.

So here’s what you might expect to happen:

Initialize Test: no further action required
Start running Main
Start initializing First (as we need First.Beta)
Set First.Alpha to 5
Start initializing Second (as we need Second.Gamma)
Set Second.Gamma to First.Alpha (5)
End initializing Second
Set First.Beta to Second.Gamma (5)
End initializing First
Print 5

Here’s what actually happens – on my box, running .NET 4.5 beta. (I know that type initialization changed for .NET 4, for example. I don’t know of any changes for .NET 4.5, but I’m not going to claim it’s impossible.)

Initialize Test: no further action required
Start running Main
Start initializing First (as we need First.Beta)
Start initializing Second (we will need Second.Gamma)
Set Second.Gamma to First.Alpha (0)
End initializing Second
Set First.Alpha to 5
Set First.Beta to Second.Gamma (0)
End initializing First
Print 0

Step 5 is the interesting one here. We know that we need First to be initialized, in order to get First.Alpha, but this thread is already initializing First (we started in step 3) so we just access First.Alpha and hope that it’s got the right value. As it happens, that variable initializer hasn’t been executed yet. Oops.

(One subtlety here is that I could have declared all these variables as constants instead using "const" which would have avoided all these problems.)

Back in the real world…

Hopefully that example makes it clear why circular dependencies in type initializers are nasty. They’re hard to spot, hard to debug, and hard to test. Pretty much your classic Heisenbug, really. It’s important to note that if the program above had happened to initialize Second first (to access a different variable, for example) we could have ended up with a different result. In particular, it’s easy to get into a situation where running all your unit tests can cause a failure – but if you run just the failing test, it passes.

One way of avoiding all of this is never to use any type initializers for anything, of course. In many cases that’s exactly the right solution – but often there are natural uses, particularly for well-known values such as Encoding.Utf8, TimeZoneInfo.Utc and the like. Note that in both of those cases they are static properties, but I would expect them to be backed by static fields. I’m somewhat ambivalent between using public static readonly fields and public static get-only properties – but as we’ll see later, there’s a definite advantage to using properties.

Noda Time has quite a few values like this – partly because so many of its types are immutable. It makes sense to create a single UTC time zone, a single ISO calendar system, a single "pattern" (text formatter/parser) for each of a variety of common cases. In addition to the publicly visible values, there are various static variables used internally, mostly for caching purposes. All of this definitely adds complexity – and makes it harder to test – but the performance benefits can be significant.

Unfortunately, a lot of these values end up with fairly natural circular dependencies – as I discovered just recently, where adding a new static field caused all kinds of breakage. I was able to fix the immediate cause, but it left me concerned about the integrity of the code. I’d fixed the one failure I knew about – but what about any others?

Testing type initialization

One of the biggest issues with type initialization is the order-sensitivity – combined with the way that once a type has been initialized once, that’s it for that AppDomain. As I showed earlier, it’s possible that initializing types in one particular order causes a problem, but a different order won’t.

I’ve decided that for Noda Time at least, I want to be reasonably sure that type initialization circularity isn’t going to bite me. So I want to validate that no type initializers form cycles, whatever order the types are initialized in. Logically if we can detect a cycle starting with one type, we ought to be able to detect it starting with any of the other types in that cycle – but I’m sufficiently concerned about weird corner cases that I’d rather just take a brute force approach.

So, as a rough plan:

Start with an empty set of dependencies
For each type in the target assembly:
- Create a new AppDomain
- Load the target assembly into it
- Force the type to be initialized
- Take a stack trace at the start of each type initializer and record any dependencies
Look for cycles in the complete set of dependencies

Note that we’ll never spot a cycle within any single AppDomain, due to the way that type initialization works. We have to put together the results for multiple initialization sequences to try to find a cycle.

A description of the code would probably be harder to follow than the code itself, but the code is relatively long – I’ve included it at the end of this post to avoid intefering with the narrative flow. For more up-to-date versions in the future, look at the Noda Time repository.

This isn’t a terribly nice solution, for various reasons:

Creating a new AppDomain and loading assemblies into it from a unit test runner isn’t as simple as it might be. My code doesn’t currently work with NCrunch; I don’t know how it finds its assemblies yet. When I’ve fixed that, I’m sure other test runners would still be broken. Likewise I’ve had to explicitly filter types to get TeamCity (the continuous integration system Noda Time uses) to work properly. Currently, you’d need to edit the test code to change the filters. (It’s possible that there are better ways of filtering, of course.)
It relies on each type within the production code which has an "interesting" type initializer to have a line like this:
private static readonly int TypeInitializationChecking = NodaTime.Utility.TypeInitializationChecker.RecordInitializationStart();
Not only does the previous line need to be added to the production code – it clearly gets executed each time, and takes up heap space per type. It’s only 4 bytes for each type involved, and it does no real work when we’re not testing, but it’s a nuisance anyway. I could use preprocessor directives to remove the code from non-debug or non-test-targeted builds, but that would look even messier.
It only picks up cycles which occur when running the version of .NET the tests happen to execute on. Given that there are ordering changes for different versions, I wouldn’t like to claim this is 100% bullet-proof. Likewise if there are only cycles when you’re running in some specific cultures (or other environmental features), it won’t necessarily pick those up.
I’ve deliberately not tried to make the testing code thread-safe. That’s fine in Noda Time – I don’t have any asynchronous operations or new threads in Noda Time at all – but other code may need to make this more robust.

So with all these caveats, is it still worth it? Absolutely: it’s already found bugs which have now been fixed.

In fact, the test didn’t get as far as reporting cycles to start with – it turned out that if you initialized one particular type first, the type initializer would fail with a NullReferenceException. Ouch! Once I’d fixed that, there were still quite a few problems to fix. Somewhat coincidentally, fixing them improved the design too – although the user-visible API didn’t change at all.

Fixing type initializer cycles

In the past, I’ve occasionally "fixed" type initialization ordering problems by simply moving fields around. The cycles still existed, but I figured out how to make them harmless. I can say now that this approach does not scale, and is more effort than it’s worth. The code ends up being brittle, hard to think about, and once you’ve got more than a couple of types involved it’s really error-prone, at least for my brain. It’s much better to break the cycle completely. To this end, I’ve ended up using a fairly simple technique to defer initialization of static variables. It’s a poor-man’s Lazy<T>, to some extent – but I’d rather not have to write Lazy<T> myself, and we’re currently targeting .NET 3.5…

Basically, instead of exposing a public static readonly field which creates the cycle, you expose a public static readonly property – which returns an internal static readonly field in a nested, private static class. We still get the nice thread-safe once-only initialization of a type initializer, but the nested type won’t be initialized until it needs to be. (In theory it could be initialized earlier, but a static constructor would ensure it isn’t.) So long as nothing within the rest of the type initializer for the containing class uses that property, we avoid the cycle.

So instead of this:

// Requires Bar to be initialized – if Bar also requires Foo to be
// initialized, we have a problem…
public static readonly Foo SimpleFoo = new Foo(Bar.Zero);

We might have:

public static readonly Foo SimpleFoo { get { return Constants.SimpleFoo; } }

private static class Constants
{
private static readonly int TypeInitializationChecking = NodaTime.Utility.TypeInitializationChecker.RecordInitializationStart();

    // This requires both Foo and Bar to be initialized, but that’s okay
    // so long as neither of them require Foo.Constants to be initialized.
    // (The unit test would spot that.)
    internal static readonly Foo SimpleFoo = new Foo(Bar.Zero);
}

I’m currently undecided about whether to include static constructors in these classes to ensure lazy initialization. If the type initializer for Foo triggered the initializer of Foo.Constants, we’d be back to square one… but adding static constructors into each of these nested classes sounds like a bit of a pain. The nested classes should call into the type initialization checking as well, to validate they don’t cause any problems themselves.

Conclusion

I have to say, part of me really doesn’t like either the testing code or the workaround. Both smack of being clever, which is never a good thing. It’s definitely worth considering whether you could actually just get rid of the type initializer (or part of it) entirely, avoiding maintaining so much static state. It would be nice to be able to detect these type initializer cycles without running anything, simply using static analysis – I’m going to see whether NDepend could do that when I get a chance. The workaround doesn’t feel as neat as Lazy<T>, which is really what’s called for here – but I don’t trust myself to implement it correctly and efficiently myself.

So while both are somewhat hacky, they’re better than the alternative: buggy code. That’s what I’m ashamed to say I had in Noda Time, and I don’t think I’d ever have spotted all the cycles by inspection. It’s worth a try on your own code – see whether you’ve got problems lurking…

Appendix: Testing code

As promised earlier, here’s the code for the production and test classes.

TypeInitializationChecker

This is in NodaTime.dll itself.

internal sealed class TypeInitializationChecker : MarshalByRefObject
{
private static List<Dependency> dependencies = null;

private static readonly MethodInfo EntryMethod = typeof(TypeInitializationChecker).GetMethod("FindDependencies");

    internal static int RecordInitializationStart()
    {
        if (dependencies == null)
        {
            return 0;
        }
        Type previousType = null;
        foreach (var frame in new StackTrace().GetFrames())
        {
            var method = frame.GetMethod();
            if (method == EntryMethod)
            {
                break;
            }
            var declaringType = method.DeclaringType;
            if (method == declaringType.TypeInitializer)
            {
                if (previousType != null)
                {
                    dependencies.Add(new Dependency(declaringType, previousType));
                }
                previousType = declaringType;
            }
        }
        return 0;
    }

    /// <summary>
    /// Invoked from the unit tests, this finds the dependency chain for a single type
    /// by invoking its type initializer.
    /// </summary>
    public Dependency[] FindDependencies(string name)
    {
        dependencies = new List<Dependency>();
        Type type = typeof(TypeInitializationChecker).Assembly.GetType(name, true);
        RuntimeHelpers.RunClassConstructor(type.TypeHandle);
        return dependencies.ToArray();
    }

    /// <summary>
    /// A simple from/to tuple, which can be marshaled across AppDomains.
    /// </summary>
    internal sealed class Dependency : MarshalByRefObject
    {
        public string From { get; private set; }
        public string To { get; private set; }
        internal Dependency(Type from, Type to)
        {
            From = from.FullName;
            To = to.FullName;
        }
    }
}

TypeInitializationTest

This is within NodaTime.Test:

[TestFixture]
public class TypeInitializationTest
{
    [Test]
    public void BuildInitializerLoops()
    {
        Assembly assembly = typeof(TypeInitializationChecker).Assembly;
        var dependencies = new List<TypeInitializationChecker.Dependency>();
        // Test each type in a new AppDomain – we want to see what happens where each type is initialized first.
        // Note: Namespace prefix check is present to get this to survive in test runners which
        // inject extra types. (Seen with JetBrains.Profiler.Core.Instrumentation.DataOnStack.)
        foreach (var type in assembly.GetTypes().Where(t => t.FullName.StartsWith("NodaTime")))
        {
            // Note: this won’t be enough to load the assembly in all test runners. In particular, it fails in
            // NCrunch at the moment.
            AppDomainSetup setup = new AppDomainSetup { ApplicationBase = AppDomain.CurrentDomain.BaseDirectory };
            AppDomain domain = AppDomain.CreateDomain("InitializationTest" + type.Name, AppDomain.CurrentDomain.Evidence, setup);
            var helper = (TypeInitializationChecker)domain.CreateInstanceAndUnwrap(assembly.FullName,
                typeof(TypeInitializationChecker).FullName);
            dependencies.AddRange(helper.FindDependencies(type.FullName));
        }
        var lookup = dependencies.ToLookup(d => d.From, d => d.To);
        // This is less efficient than it might be, but I’m aiming for simplicity: starting at each type
        // which has a dependency, can we make a cycle?
        // See Tarjan’s Algorithm in Wikipedia for ways this could be made more efficient.
        // http://en.wikipedia.org/wiki/Tarjan’s_strongly_connected_components_algorithm
        foreach (var group in lookup)
        {
            Stack<string> path = new Stack<string>();
            CheckForCycles(group.Key, path, lookup);
        }
    }

    private static void CheckForCycles(string next, Stack<string> path, ILookup<string, string> dependencyLookup)
    {
        if (path.Contains(next))
        {
            Assert.Fail("Type initializer cycle: {0}-{1}", string.Join("-", path.Reverse().ToArray()), next);
        }
        path.Push(next);
        foreach (var candidate in dependencyLookup[next].Distinct())
        {
            CheckForCycles(candidate, path, dependencyLookup);
        }
        path.Pop();
    }
}

C#, Design, Noda Time

Subtleties in API design – member placement

February 29, 2012 jonskeet 38 Comments

Noda Time is nearing v1.0, which means I’m spending more time writing documentation than code. It also means reviewing the APIs we’ve got with a critical eye – whether that’s removing extraneous members, adding useful ones, or moving things around. (In particular, writing documentation often suggests where a change would make calling code read more naturally.)

This post is about one particular section of the API, and the choices available. Although I do go into some detail around the specific calls involved, that’s just for context… the underlying choices are ones which could be faced when designing any API. I’ve rarely spent as much time thinking about API decisions as I have with Noda Time, so hopefully this will prove interesting to you even if you really don’t care about Noda Time itself as a project.

Introduction: time zones, local date/times and zoned date/times – oh my!

(Okay, so that’s not quite as snappy as the Judy Garland version, but hey…)

The area of API we’re going to focus on is time zones, and converting between "local" date/time values and "zoned" ones. The three types involved are:

LocalDateTime: a "local" date and time, with no specific time zone. So, something like "7:30 in the evening on February 27th 2012". This means different instants in time in different time zones, of course: if you’re arranging a meeting, it’s good enough when the attendees are in the same time zone, but not good enough if you’re meeting with someone on the other side of the world. (A LocalDateTime also has an associated calendar system, but I’m not going to talk about that much in this post.)
DateTimeZone: a time zone. At its core, this maps any "instant" in time to an offset – the difference between UTC and local time in that time zone. The offset changes over time, typically (but not always) due to daylight saving changes.
ZonedDateTime: a date and time in a particular time zone, with an offset from UTC to avoid ambiguity in some cases (and for efficiency). This identifies a specific instant in time (simply by subtracting the offset from the local date/time). Conceptually this is equivalent to just maintaining the "instant" value, the time zone, and the calendar system – but it’s generally cleaner to think of it as a "local" value with an offset from UTC.

If those brief descriptions don’t make sense for you at the moment (this sort of thing is quite hard to describe concisely and precisely) you may want to see whether the Noda Time user guide "concepts" page helps.

The API task: mapping from { LocalDateTime, DateTimeZone } to ZonedDateTime

It’s easy to get from a ZonedDateTime to a LocalDateTime – you can just use the LocalDateTime property. The difficult bit is the other way round. We obviously want to be able to create a ZonedDateTime from the combination of a LocalDateTime and a DateTimeZone, but the question is where to put this functionality. Three options suggest themselves:

A static method (or constructor) in ZonedDateTime which takes both the time zone and the local date/time as arguments
An instance method on LocalDateTime which just takes the time zone as an argument
An instance method on DateTimeZone which just takes the local date/time as an argument

It gets more complicated though – we’re not really talking about one operation here, but potentially several. Although the mapping from instant to offset is unambiguous in DateTimeZone, the mapping from LocalDateTime to offset is not straightforward. There can be 0, 1 or 2 possible results. For example, in the America/Los_Angeles time zone the clocks go forward from 2am to 3am on Sunday March 11th 2012, and go back from 2am to 1am on Sunday 4th November 2012. That means:

The mapping from local date/time to offset at 7.30pm on February 27th 2012 is unambiguous: it’s definitely -8 hours (L.A. is 8 hours behind UTC).
The mapping at 2.30am on March 11th 2012 is impossible: at 2am the clocks were put forward to 3am, so 2.30am simply never occurs.
The mapping at 2.30am on November 4th 2012 is ambiguous: it happens once before the clocks go back at 3am, and once afterwards. The offset is either -7 or -8 hours, depending on which occurrence you mean.

When mapping a local time to "global" time, this is something you should really think about. Most APIs obscure the issue, but one of the purposes of Noda Time is to force developers to think about issues which they should be aware of. This one is particularly insidious in that it’s the kind of problem which is much more likely to arise when you’re asleep than during daylight hours – so it’s unlikely to be found during manual testing. (Ditto the day of week – most time zones have daylight saving transitions early on a Sunday morning.)

So, Noda Time exposes four ways of mapping a LocalDateTime and DateTimeZone to a ZonedDateTime:

Exact: if there’s a single mapping, return it. Otherwise, throw an exception.
Earlier: if there’s a single mapping, return it. If there’s more than one, return the earlier one. If the time is skipped, throw an exception.
Later: if there’s a single mapping, return it. If there’s more than one, return the later one. If the time is skipped, throw an exception.
All information: find out all the information relevant to mapping the given local value – how many matches there are, what they would be, what the time zone information is for each mapping, etc. The caller can then do what they want.

Options available

The question is how we expose these operations. Let’s look at some options, then discuss the pros and cons.

Option 1: methods on LocalDateTime

A lot of Noda Time is designed to be "fluent" so it makes a certain amount of sense to be able to take a LocalDateTime, perform some arithmetic on it, then convert it to a ZonedDateTime, then (say) format it. So we could have something like:

var zoned = local.InZone(zone); // implicitly exact
var zoned = local.InZoneOrEarlier(zone);
var zoned = local.InZoneOrLater(zone);
var mapping = local.MapToZone(zone);

Option 2: methods on DateTimeZone

All the calculations involved are really about the time zone – the local date/time value is just a simple value as far as most of this is concerned. So we can put the methods on DateTimeZone instead:

var zoned = zone.AtExactly(local);
var zoned = zone.AtEarlier(local);
var zoned = zone.AtLater(local);
var mapping = zone.MapLocalDateTime(local);

Option 3: methods (or constructors) on ZonedDateTime

Maybe we consider the two inputs to be fairly equal, but the result type is more important:

var zoned = ZonedDateTime.FromLocal(zone, local);
var zoned = ZonedDateTime.FromLocalOrEarlier(zone, local);
var zoned = ZonedDateTime.FromLocalOrLater(zone, local);
var mapping = ZoneLocalMapping.FromLocal(local)

(I’m not terribly happy about the names here; there could be better ones of course.)

Variation a: any of the above options, but with an enum for ambiguity resolution

We don’t really need four methods on any of these APIs; the first three only differ by how they handle ambiguity (the situation where a particular local date/time occurs twice). We could use an enum to represent that choice instead:

var zoned = local.InZone(zone, ZoneAmbiguityResolver.Error);
var zoned = local.InZone(zone, ZoneAmbiguityResolver.Earlier);
var zoned = local.InZone(zone, ZoneAmbiguityResolver.Later);

(Or a "smart enum" with behaviour, if we wanted. A normal class type with methods etc, but only a restricted set of valid values.)

Variation b: always go via the mapping result

Given that we already have the idea of getting the full mapping results, we can reduce the API to just one method to return the mapping information, and then put extra methods on that:

var zoned = local.MapInZone(zone).SingleMatch;
var zoned = local.MapInZone(zone).SingleOrEarlier;
var zoned = local.MapInZone(zone).SingleOrLater;

(Again, the names aren’t fixed in stone, and the second part could be methods instead of properties if we wanted.)

Variation c: return a sequence of results

If we return a sequence with 0, 1 or 2 ZonedDateTime values, the user can just use LINQ to get the one they want. Again, this can apply wherever we decide to put the method:

var zoned = zone.At(local).Single();
var zoned = zone.At(local).First();
var zoned = zone.At(local).Last();

So, it looks like we effectively have two mostly-orthogonal decisions here:

Where to "start" the conversion – the target type for the method call
How to represent the multiple options

We’ll consider them separately.

Regarding the "source" type

To start with, I’ll reveal my bias: the existing implementation is option 2 (four methods on DateTimeZone). This was after a small amount of debate on the Noda Time mailing list, and this was the most relevant bit of the discussion:

Me (before going with the current implementation):

It feels a little odd to me to use the zone as the principal class here – just in terms of usability. It makes total sense in terms of the logic, but I tend to think of having a LocalDateTime first, and then converting that to use a particular zone – it’s not an operation which feels like it acts on the zone itself.

David N:

I actually feel the opposite: asking a DateTimeZone how a particular LocalDateTime would be represented in that zone feels natural, while asking the LocalDateTime how it would be represented in a zone feels odd. The zone is a first-class entity, with identity and behavior; the LocalDateTime is just a set of values. Why would the LocalDateTime be expected to know how it is represented in a particular zone?

Even though I replied to that in a "maybe" kind of tone, the argument basically convinced me. The trouble is, a colleague was then surprised when he read the documentation around calendar arithmetic and conversions. Surprising users is pretty much a cardinal sin when it comes to API design – and although in this case it was the relatively harmless sort of surprise ("I can’t find the member I want in A; oh, it turns out it’s in B") rather than a behavioural surprise ("I thought it would do X, but instead it does Y") it’s still bad news. I should reveal my colleague’s bias too – he has experience of Joda Time, where the relevant call is LocalDateTime.toDateTime(DateTimeZone). (There are calls in DateTimeZone itself, but they’re lower-level.)

We’ve discussed this a fair amount (leading to this blog post) and so far we’ve concluded that it really depends on how you think about time zones. As a Noda Time user, would you consider them to be rich objects with complex behaviour, or would you think of them as mere "tokens" to be passed around and used without much direct interaction? The two ways of viewing the type aren’t necessarily in conflict – I’ve deliberately designed CalendarSystem to hide its (fairly ugly) innards. There are a few public instance members, but most are internal. But what about time zones?

There’s an argument to be made for educating Noda Time users to think about time zones as more complex beasts than just tokens, and I’m happy to do that in other areas (such as choosing which type to use in the first place) but here it feels like it’s one step too far. On the other hand, I don’t want to stifle users who are thinking of DateTimeZone in that way. In the mailing list thread, David also expressed a dislike for the approach of including functionality in multiple places – and to a certain extent I agree (one of the things I dislike about its API is that it lets you do just about anything with anything)… but in this case it feels like it’s justified.

Regardless of how you’re thinking about DateTimeZone, it’s more likely that you’re going to want to use a LocalDateTime value which is the result of some other expression, and then apply some "constant" zone to it, then potentially keep going. If you think about a LINQ-style pipeline of operations, the part that varies in the conversion is much more likely to be the LocalDateTime than the time zone. As such, a method on LocalDateTime allows for a more fluent calling style:

var zoned = parseResult.Value
.PlusMonths(1)
.InZone(LondonTimeZone);

versus:

var zoned = LondonTimeZone.AtExactly(parseResult.Value.PlusMonths(1));

Or to keep the code order the same as the execution order:

var local = parseResult.Value.PlusMonths(1);
var zoned = LondonTimeZone.AtExactly(local);

Obviously the effects become more noticeable the more operations you perform. Personally I’m happy with either the first or third approach – although it’s worth being aware that either of the first two have the advantage of being one expression, and therefore easy to use when assigning a static readonly field or something similar.

I’m reasonably happy with having one method on each type, or possibly two (MapLocalDateTime and At*) on DateTimeZone and one (just InZone) on LocalDateTime. I really don’t like the idea of having four methods on DateTimeZone and three methods on LocalDateTime. So, let’s consider the different variations which cut down the number of methods required.

Expressing "exactly," "earlier," and "later" in one method

This is essentially a discussion of the "variations" above. Just to recap, the possibilities we’ve come up with are:

Add another parameter to the method to indicate how to handle ambiguities (or impossibilities) – just return a ZonedDateTime
Return a value of a different type (e.g. ZoneLocalMapping) which can be used to get at all the information you could want
Return a sequence of possible ZonedDateTime values, expecting the caller to probably use LINQ’s First/Last/Single/FirstOrDefault etc to get at the value they want

The last of these is the only one which gives an easy way of handling the extreme corner case of a local time occurring more than twice – for example, a time zone which goes back one hour at 2am (to 1am) and then goes back another two hours at 3am. I think it’s reasonable to dismiss this corner case; however mad time zones can be, I haven’t seen anything quite that crazy yet.

At the time of the original discussion, Noda Time was targeting .NET 2.0, which was one reason for not going with the final option here – we couldn’t guarantee that LINQ would be available. Now, Noda Time is targeting .NET 3.5 in order to use TimeZoneInfo, but it still doesn’t feel like an ideal fit:

Returning a sequence doesn’t give information about (say) the two zone intervals "surrounding" a gap
A sequence may be surprising to users who expect just a single value
The exceptions thrown by First, Single etc when their expectations aren’t met are very domain-neutral; we can give more information
FirstOrDefault will return the default value for ZonedDateTime in the case of ambiguity. That would be unfortunate, as ZonedDateTime is a value type, and its default value is actually somewhat unhelpful. (It has a null calendar system, effectively. There’s not a lot we can do about this, but that’s a post for another day.) We could make it a sequence of Nullable<ZonedDateTime> and guarantee that any values in it are actually non-null, but that’s really straining things.

Putting these together, there are enough negative points to this idea that I’m happy to rule it out. But what about the first two?

The first has the advantage that the caller only needs to make a single method call, effectively passing in a "magic token" (the ambiguity resolver) which they don’t really need to understand. On the other hand, if they want more information, they’ll have to call a different method – and I’m not really sure we want to encourage too much of this "magic token" behaviour.

The second has three disadvantages, all fairly slight:

The user may initially expect the result of a method mapping a LocalDateTime to a ZonedDateTime to be a ZonedDateTime… what’s this extra intermediate result doing? This is "only" a matter of user education, and it’s pretty discoverable. It’s an extra concept the user needs to understand, but it’s a good concept to understand.
Calling two methods or a method and a property (e.g. zone.MapLocalDateTime(localDateTime).Earlier) may end up being slightly more long-winded than a single method call. I can’t get excited about this.
We have to allocate an extra object for the mapping, even when we know it’s unique. Usually, this object will become eligible for garbage collection immediately. We could make it a struct, but I don’t think it’s a natural value type – I’d rather trust that allocating objects in gen0 is pretty cheap.

With the second method, we can replace all the existing methods in DateTimeZone with a single one (or rather, just remove the AtXyz methods, leaving MapLocalDateTime). We can then create pleasantly-named methods on ZoneLocalMapping (which isn’t quite right for this purpose at the moment).

Conclusion

This has been an interesting thought experiment for me, and it’s suggested some changes I will be applying before v1. We’ll see how they pan out. If you want to follow them, look for relevant source code changes.

The important points I’ve been thinking about are:

What would a new user expect to be available? If they haven’t read any documentation, what are they likely to try?
What should the user know about? Where there are important decisions to make, how can we provide guidance?
What would an experienced user (who is already thinking about the Noda Time concepts in the way that we want to encourage) expect to be available?
Where does the balance lie between providing a "too crowded" API (with lots of different ways of achieving the same thing) and a "sparse" API (where there’s always one way of achieving a goal, but it may not be the most convenient one for your situation)
How does our choice fit in with other technologies? For example, the final "variation" seems like it plays nicely with LINQ at first – but a few subtleties make it a worse fit than it might seem.
How does this affect performance? (Performance is important in Noda Time – but there would have to be a significant performance problem for me to deviate from an elegant solution.)

So, any other thoughts? Did we miss some options? What other factors should we have taken into consideration?

The Facts

Some options

Some concerns and questions

Pattern woes

Pattern capitalization issues

Broad pattern incompatibilities

Pattern incompatibility issues

Time zone issues

Understanding time zones

Using time zones in text formatting/parsing

Locale / culture issues

Converting unnecessarily

Conclusion

Example 1: Enumerable.OrderByDescending is broken

Fast, backward compatible, correct – pick any two

Example 2: TimeZoneInfo becomes smarter in .NET 4.6

Example 3: PersianCalendar and leap years

Conclusion

Read-only automatically implemented properties (“autoprops”)

Initial values for automatically-implemented properties

Expression-bodied members

The nameof operator

Dictionary initializers

Using static members

String interpolation

Null propagation

Conclusion

Background

Windows adjustment rules

Awful GUI tooling ahoy!

Awkward cases

So what’s the answer?

First bug: 2012-12-31T22:00:00Z

Second bug: 2013-01-01T00:00:00Z

Conclusion

First bug: Mono’s invariant culture isn’t terribly invariant…

Second bug: ReadOnly losing changes

Conclusion

Background: daylight saving transitions and ambiguous times

DateTime.Kind and conversions

DateTime’s Deep Dark Secret

Unkind comparisons

Conclusions

Back in the real world…

Testing type initialization

Fixing type initializer cycles

Conclusion

Appendix: Testing code

TypeInitializationChecker

TypeInitializationTest

Introduction: time zones, local date/times and zoned date/times – oh my!

The API task: mapping from { LocalDateTime, DateTimeZone } to ZonedDateTime

Options available

Option 1: methods on LocalDateTime

Option 2: methods on DateTimeZone

Option 3: methods (or constructors) on ZonedDateTime

Variation a: any of the above options, but with an enum for ambiguity resolution

Variation b: always go via the mapping result

Variation c: return a sequence of results

Regarding the "source" type

Expressing "exactly," "earlier," and "later" in one method

Conclusion

Example 2: `TimeZoneInfo` becomes smarter in .NET 4.6

Example 3: `PersianCalendar` and leap years