Subtleties in API design – member placement

Noda Time is nearing v1.0, which means I’m spending more time writing documentation than code. It also means reviewing the APIs we’ve got with a critical eye – whether that’s removing extraneous members, adding useful ones, or moving things around. (In particular, writing documentation often suggests where a change would make calling code read more naturally.)

This post is about one particular section of the API, and the choices available. Although I do go into some detail around the specific calls involved, that’s just for context… the underlying choices are ones which could be faced when designing any API. I’ve rarely spent as much time thinking about API decisions as I have with Noda Time, so hopefully this will prove interesting to you even if you really don’t care about Noda Time itself as a project.

Introduction: time zones, local date/times and zoned date/times – oh my!

(Okay, so that’s not quite as snappy as the Judy Garland version, but hey…)

The area of API we’re going to focus on is time zones, and converting between "local" date/time values and "zoned" ones. The three types involved are:

  • LocalDateTime: a "local" date and time, with no specific time zone. So, something like "7:30 in the evening on February 27th 2012". This means different instants in time in different time zones, of course: if you’re arranging a meeting, it’s good enough when the attendees are in the same time zone, but not good enough if you’re meeting with someone on the other side of the world. (A LocalDateTime also has an associated calendar system, but I’m not going to talk about that much in this post.)
  • DateTimeZone: a time zone. At its core, this maps any "instant" in time to an offset – the difference between UTC and local time in that time zone. The offset changes over time, typically (but not always) due to daylight saving changes.
  • ZonedDateTime: a date and time in a particular time zone, with an offset from UTC to avoid ambiguity in some cases (and for efficiency). This identifies a specific instant in time (simply by subtracting the offset from the local date/time). Conceptually this is equivalent to just maintaining the "instant" value, the time zone, and the calendar system – but it’s generally cleaner to think of it as a "local" value with an offset from UTC.

If those brief descriptions don’t make sense for you at the moment (this sort of thing is quite hard to describe concisely and precisely) you may want to see whether the Noda Time user guide "concepts" page helps.

The API task: mapping from { LocalDateTime, DateTimeZone } to ZonedDateTime

It’s easy to get from a ZonedDateTime to a LocalDateTime – you can just use the LocalDateTime property. The difficult bit is the other way round. We obviously want to be able to create a ZonedDateTime from the combination of a LocalDateTime and a DateTimeZone, but the question is where to put this functionality. Three options suggest themselves:

  • A static method (or constructor) in ZonedDateTime which takes both the time zone and the local date/time as arguments
  • An instance method on LocalDateTime which just takes the time zone as an argument
  • An instance method on DateTimeZone which just takes the local date/time as an argument

It gets more complicated though – we’re not really talking about one operation here, but potentially several. Although the mapping from instant to offset is unambiguous in DateTimeZone, the mapping from LocalDateTime to offset is not straightforward. There can be 0, 1 or 2 possible results. For example, in the America/Los_Angeles time zone the clocks go forward from 2am to 3am on Sunday March 11th 2012, and go back from 2am to 1am on Sunday 4th November 2012. That means:

  • The mapping from local date/time to offset at 7.30pm on February 27th 2012 is unambiguous: it’s definitely -8 hours (L.A. is 8 hours behind UTC).
  • The mapping at 2.30am on March 11th 2012 is impossible: at 2am the clocks were put forward to 3am, so 2.30am simply never occurs.
  • The mapping at 2.30am on November 4th 2012 is ambiguous: it happens once before the clocks go back at 3am, and once afterwards. The offset is either -7 or -8 hours, depending on which occurrence you mean.

When mapping a local time to "global" time, this is something you should really think about. Most APIs obscure the issue, but one of the purposes of Noda Time is to force developers to think about issues which they should be aware of. This one is particularly insidious in that it’s the kind of problem which is much more likely to arise when you’re asleep than during daylight hours – so it’s unlikely to be found during manual testing. (Ditto the day of week – most time zones have daylight saving transitions early on a Sunday morning.)

So, Noda Time exposes four ways of mapping a LocalDateTime and DateTimeZone to a ZonedDateTime:

  • Exact: if there’s a single mapping, return it. Otherwise, throw an exception.
  • Earlier: if there’s a single mapping, return it. If there’s more than one, return the earlier one. If the time is skipped, throw an exception.
  • Later: if there’s a single mapping, return it. If there’s more than one, return the later one. If the time is skipped, throw an exception.
  • All information: find out all the information relevant to mapping the given local value – how many matches there are, what they would be, what the time zone information is for each mapping, etc. The caller can then do what they want.

Options available

The question is how we expose these operations. Let’s look at some options, then discuss the pros and cons.

Option 1: methods on LocalDateTime

A lot of Noda Time is designed to be "fluent" so it makes a certain amount of sense to be able to take a LocalDateTime, perform some arithmetic on it, then convert it to a ZonedDateTime, then (say) format it. So we could have something like:

  • var zoned = local.InZone(zone); // implicitly exact
  • var zoned = local.InZoneOrEarlier(zone);
  • var zoned = local.InZoneOrLater(zone);
  • var mapping = local.MapToZone(zone);

Option 2: methods on DateTimeZone

All the calculations involved are really about the time zone – the local date/time value is just a simple value as far as most of this is concerned. So we can put the methods on DateTimeZone instead:

  • var zoned = zone.AtExactly(local);
  • var zoned = zone.AtEarlier(local);
  • var zoned = zone.AtLater(local);
  • var mapping = zone.MapLocalDateTime(local);

Option 3: methods (or constructors) on ZonedDateTime

Maybe we consider the two inputs to be fairly equal, but the result type is more important:

  • var zoned = ZonedDateTime.FromLocal(zone, local);
  • var zoned = ZonedDateTime.FromLocalOrEarlier(zone, local);
  • var zoned = ZonedDateTime.FromLocalOrLater(zone, local);
  • var mapping = ZoneLocalMapping.FromLocal(local)

(I’m not terribly happy about the names here; there could be better ones of course.)

Variation a: any of the above options, but with an enum for ambiguity resolution

We don’t really need four methods on any of these APIs; the first three only differ by how they handle ambiguity (the situation where a particular local date/time occurs twice). We could use an enum to represent that choice instead:

  • var zoned = local.InZone(zone, ZoneAmbiguityResolver.Error);
  • var zoned = local.InZone(zone, ZoneAmbiguityResolver.Earlier);
  • var zoned = local.InZone(zone, ZoneAmbiguityResolver.Later);

(Or a "smart enum" with behaviour, if we wanted. A normal class type with methods etc, but only a restricted set of valid values.)

Variation b: always go via the mapping result

Given that we already have the idea of getting the full mapping results, we can reduce the API to just one method to return the mapping information, and then put extra methods on that:

  • var zoned = local.MapInZone(zone).SingleMatch;
  • var zoned = local.MapInZone(zone).SingleOrEarlier;
  • var zoned = local.MapInZone(zone).SingleOrLater;

(Again, the names aren’t fixed in stone, and the second part could be methods instead of properties if we wanted.)

Variation c: return a sequence of results

If we return a sequence with 0, 1 or 2 ZonedDateTime values, the user can just use LINQ to get the one they want. Again, this can apply wherever we decide to put the method:

  • var zoned = zone.At(local).Single();
  • var zoned = zone.At(local).First();
  • var zoned = zone.At(local).Last();

So, it looks like we effectively have two mostly-orthogonal decisions here:

  • Where to "start" the conversion – the target type for the method call
  • How to represent the multiple options

We’ll consider them separately.

Regarding the "source" type

To start with, I’ll reveal my bias: the existing implementation is option 2 (four methods on DateTimeZone). This was after a small amount of debate on the Noda Time mailing list, and this was the most relevant bit of the discussion:

Me (before going with the current implementation):

It feels a little odd to me to use the zone as the principal class here – just in terms of usability. It makes total sense in terms of the logic, but I tend to think of having a LocalDateTime first, and then converting that to use a particular zone – it’s not an operation which feels like it acts on the zone itself.

David N:

I actually feel the opposite: asking a DateTimeZone how a particular LocalDateTime would be represented in that zone feels natural, while asking the LocalDateTime how it would be represented in a zone feels odd. The zone is a first-class entity, with identity and behavior; the LocalDateTime is just a set of values. Why would the LocalDateTime be expected to know how it is represented in a particular zone?

Even though I replied to that in a "maybe" kind of tone, the argument basically convinced me. The trouble is, a colleague was then surprised when he read the documentation around calendar arithmetic and conversions. Surprising users is pretty much a cardinal sin when it comes to API design – and although in this case it was the relatively harmless sort of surprise ("I can’t find the member I want in A; oh, it turns out it’s in B") rather than a behavioural surprise ("I thought it would do X, but instead it does Y") it’s still bad news. I should reveal my colleague’s bias too – he has experience of Joda Time, where the relevant call is LocalDateTime.toDateTime(DateTimeZone). (There are calls in DateTimeZone itself, but they’re lower-level.)

We’ve discussed this a fair amount (leading to this blog post) and so far we’ve concluded that it really depends on how you think about time zones. As a Noda Time user, would you consider them to be rich objects with complex behaviour, or would you think of them as mere "tokens" to be passed around and used without much direct interaction? The two ways of viewing the type aren’t necessarily in conflict – I’ve deliberately designed CalendarSystem to hide its (fairly ugly) innards. There are a few public instance members, but most are internal. But what about time zones?

There’s an argument to be made for educating Noda Time users to think about time zones as more complex beasts than just tokens, and I’m happy to do that in other areas (such as choosing which type to use in the first place) but here it feels like it’s one step too far. On the other hand, I don’t want to stifle users who are thinking of DateTimeZone in that way. In the mailing list thread, David also expressed a dislike for the approach of including functionality in multiple places – and to a certain extent I agree (one of the things I dislike about its API is that it lets you do just about anything with anything)… but in this case it feels like it’s justified.

Regardless of how you’re thinking about DateTimeZone, it’s more likely that you’re going to want to use a LocalDateTime value which is the result of some other expression, and then apply some "constant" zone to it, then potentially keep going. If you think about a LINQ-style pipeline of operations, the part that varies in the conversion is much more likely to be the LocalDateTime than the time zone. As such, a method on LocalDateTime allows for a more fluent calling style:

var zoned = parseResult.Value
                       .PlusMonths(1)
                       .InZone(LondonTimeZone);

versus:

var zoned = LondonTimeZone.AtExactly(parseResult.Value.PlusMonths(1));

Or to keep the code order the same as the execution order:

var local = parseResult.Value.PlusMonths(1);
var zoned = LondonTimeZone.AtExactly(local);

Obviously the effects become more noticeable the more operations you perform. Personally I’m happy with either the first or third approach – although it’s worth being aware that either of the first two have the advantage of being one expression, and therefore easy to use when assigning a static readonly field or something similar.

I’m reasonably happy with having one method on each type, or possibly two (MapLocalDateTime and At*) on DateTimeZone and one (just InZone) on LocalDateTime. I really don’t like the idea of having four methods on DateTimeZone and three methods on LocalDateTime. So, let’s consider the different variations which cut down the number of methods required.

 

Expressing "exactly," "earlier," and "later" in one method

This is essentially a discussion of the "variations" above. Just to recap, the possibilities we’ve come up with are:

  • Add another parameter to the method to indicate how to handle ambiguities (or impossibilities) – just return a ZonedDateTime
  • Return a value of a different type (e.g. ZoneLocalMapping) which can be used to get at all the information you could want
  • Return a sequence of possible ZonedDateTime values, expecting the caller to probably use LINQ’s First/Last/Single/FirstOrDefault etc to get at the value they want

The last of these is the only one which gives an easy way of handling the extreme corner case of a local time occurring more than twice – for example, a time zone which goes back one hour at 2am (to 1am) and then goes back another two hours at 3am. I think it’s reasonable to dismiss this corner case; however mad time zones can be, I haven’t seen anything quite that crazy yet.

At the time of the original discussion, Noda Time was targeting .NET 2.0, which was one reason for not going with the final option here – we couldn’t guarantee that LINQ would be available. Now, Noda Time is targeting .NET 3.5 in order to use TimeZoneInfo, but it still doesn’t feel like an ideal fit:

  • Returning a sequence doesn’t give information about (say) the two zone intervals "surrounding" a gap
  • A sequence may be surprising to users who expect just a single value
  • The exceptions thrown by First, Single etc when their expectations aren’t met are very domain-neutral; we can give more information
  • FirstOrDefault will return the default value for ZonedDateTime in the case of ambiguity. That would be unfortunate, as ZonedDateTime is a value type, and its default value is actually somewhat unhelpful. (It has a null calendar system, effectively. There’s not a lot we can do about this, but that’s a post for another day.) We could make it a sequence of Nullable<ZonedDateTime> and guarantee that any values in it are actually non-null, but that’s really straining things.

Putting these together, there are enough negative points to this idea that I’m happy to rule it out. But what about the first two?

The first has the advantage that the caller only needs to make a single method call, effectively passing in a "magic token" (the ambiguity resolver) which they don’t really need to understand. On the other hand, if they want more information, they’ll have to call a different method – and I’m not really sure we want to encourage too much of this "magic token" behaviour.

The second has three disadvantages, all fairly slight:

  • The user may initially expect the result of a method mapping a LocalDateTime to a ZonedDateTime to be a ZonedDateTime… what’s this extra intermediate result doing? This is "only" a matter of user education, and it’s pretty discoverable. It’s an extra concept the user needs to understand, but it’s a good concept to understand.
  • Calling two methods or a method and a property (e.g. zone.MapLocalDateTime(localDateTime).Earlier) may end up being slightly more long-winded than a single method call. I can’t get excited about this.
  • We have to allocate an extra object for the mapping, even when we know it’s unique. Usually, this object will become eligible for garbage collection immediately. We could make it a struct, but I don’t think it’s a natural value type – I’d rather trust that allocating objects in gen0 is pretty cheap.

With the second method, we can replace all the existing methods in DateTimeZone with a single one (or rather, just remove the AtXyz methods, leaving MapLocalDateTime). We can then create pleasantly-named methods on ZoneLocalMapping (which isn’t quite right for this purpose at the moment).

Conclusion

This has been an interesting thought experiment for me, and it’s suggested some changes I will be applying before v1. We’ll see how they pan out. If you want to follow them, look for relevant source code changes.

The important points I’ve been thinking about are:

  • What would a new user expect to be available? If they haven’t read any documentation, what are they likely to try?
  • What should the user know about? Where there are important decisions to make, how can we provide guidance?
  • What would an experienced user (who is already thinking about the Noda Time concepts in the way that we want to encourage) expect to be available?
  • Where does the balance lie between providing a "too crowded" API (with lots of different ways of achieving the same thing) and a "sparse" API (where there’s always one way of achieving a goal, but it may not be the most convenient one for your situation)
  • How does our choice fit in with other technologies? For example, the final "variation" seems like it plays nicely with LINQ at first – but a few subtleties make it a worse fit than it might seem.
  • How does this affect performance? (Performance is important in Noda Time – but there would have to be a significant performance problem for me to deviate from an elegant solution.)

So, any other thoughts? Did we miss some options? What other factors should we have taken into consideration?