The joys of date/time arithmetic

(Cross-posted to my main blog and the Noda Time blog, in the hope that the overall topic is still of interest to those who aren’t terribly interested in Noda Time per se.)

I’ve been looking at the "period" part of Noda Time recently, trying to redesign the API to simplify it somewhat. This part of the API is what we use to answer questions such as:

  • What will the date be in 14 days?
  • How many hours are there between now and my next birthday?
  • How many years, months and days have I been alive for?

I’ve been taking a while to get round to this because there are some tricky choices to make. Date and time arithmetic is non-trivial – not because of complicated rules which you may be unaware of, but simply because of the way calendaring systems work. As ever, time zones make life harder too. This post won’t talk very much about the Noda Time API details, but will give the results of various operations as I currently expect to implement them.

The simple case: arithmetic on the instant time line

One of the key concepts to understand when working with time is that the usual human "view" on time isn’t the only possible one. We don’t have to break time up into months, days, hours and so on. It’s entirely reasonable (in many cases, at least) to consider time as just a number which progresses linearly. In the case of Noda Time, it’s the number of ticks (there are 10 ticks in a microsecond, 10,000 ticks in a millisecond, and 10 million ticks in a second) since midnight on January 1st 1970 UTC.

Leaving relativity aside, everyone around the world can agree on an instant, even if they disagree about everything else. If you’re talking over the phone (using a magic zero-latency connection) you may think you’re in different years, using different calendar systems, in different time zones – but still both think of "now" as "634266985845407773 ticks".

That makes arithmetic really easy – but also very limited. You can only add or subtract numbers of ticks, effectively. Of course you can derive those ticks from some larger units which have a fixed duration – for example, you could convert "3 hours" into ticks – but some other concepts don’t really apply. How would you add a month? The instant time line has no concept of months, and in most calendars different months have different durations (28-31 days in the ISO calendar, for example). Even the idea of a day is somewhat dubious – it’s convenient to treat a day as 24 hours, but you need to at least be aware that when you translate an instant into a calendar that a real person would use, days don’t always last for 24 hours due to daylight savings.

Anyway, the basic message is that it’s easy to do arithmetic like this. In Noda Time we have the Instant structure for the position on the time line, and the Duration structure as a number of ticks which can be added to an Instant. This is the most appropriate pair of concepts to use to measure how much time has passed, without worrying about daylight savings and so on: ideal for things like timeouts, cache purging and so on.
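
To make the tick arithmetic concrete, here is a minimal sketch in Python rather than Noda Time code. The names `duration_to_ticks` and `to_ticks` are made up for illustration; the only facts taken from the text above are the tick size and the 1970 UTC epoch.

```python
from datetime import datetime, timedelta, timezone

# From the text: 10 ticks per microsecond, 10 million ticks per second,
# counted from midnight on January 1st 1970 UTC.
TICKS_PER_MICROSECOND = 10
TICKS_PER_SECOND = 10_000_000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def duration_to_ticks(duration: timedelta) -> int:
    """Convert a fixed-length duration to ticks using integer arithmetic only."""
    whole_seconds = duration.days * 86_400 + duration.seconds
    return whole_seconds * TICKS_PER_SECOND + duration.microseconds * TICKS_PER_MICROSECOND


def to_ticks(moment: datetime) -> int:
    """Position of a timezone-aware datetime on the instant time line."""
    return duration_to_ticks(moment - UNIX_EPOCH)


start = to_ticks(datetime(2010, 11, 7, 7, 30, tzinfo=timezone.utc))
three_hours = duration_to_ticks(timedelta(hours=3))  # 108000000000 ticks
later = start + three_hours  # plain integer addition, no calendar involved
```

Note that nothing here can express "one month": there is no fixed number of ticks to convert it to.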

Things start to get messy: local dates, times and date/times

The second type of arithmetic is what humans tend to actually think in. We talk about having a meeting in a month’s time, or how many days it is until Christmas (certainly my boys do, anyway). We don’t tend to consciously bring time zones into the equation – which is a good job, as we’ll see later.

Now just to make things clear, I’m not planning on talking about recurrent events – things like "the second Tuesday and the last Wednesday of every month". I’m not planning on supporting recurrences in Noda Time, and having worked on the calendar part of Google Mobile Sync for quite a while, I can tell you that they’re not fun. But even without recurrences, life is tricky.

Introducing periods and period arithmetic

The problem is that our units are inconsistent. I mentioned before that "a month" is an ambiguous length of time… but it doesn’t just change by the month, but potentially by the year as well: February is either 28 or 29 days long depending on the year. (I’m only considering the ISO calendar for the moment; that gives enough challenges to start with.)

If we have inconsistent units, we need to keep track of those units during arithmetic, and even request that the arithmetic be performed using specific units. So, it doesn’t really make sense to ask "how long is the period between June 10th 2010 and October 13th 2010" but it does make sense to ask "how many days are there between June 10th 2010 and October 13th 2010" or "how many years, months and days are there between June 10th 2010 and October 13th 2010".

Once you’ve got a period – which I’ll describe as a collection of unit/value pairs, e.g. "0 years, 4 months and 3 days" (for the last example above) – you can still get unexpected behaviour. If you add that period to your original start date, you should get the original end date… but if you advance the start date by one day, you may not advance the end date by one day. It depends on how you handle things like "one month after January 30th 2010" – some valid options are:

  • Round down to the end of the month: February 28th
  • Round up to the start of the next month: March 1st
  • Work out how far we’ve overshot, and apply that to the next month: March 2nd
  • Throw an exception

All of these are justifiable. Currently, Noda Time will always take the first approach. I believe that JSR-310 (the successor to Joda Time) will allow the behaviour to be resolved according to a strategy provided by the user… it’s unclear to me at the moment whether we’ll want to go that far in Noda Time.
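
The four options can be sketched as follows. This is an illustration in Python, not Noda Time or JSR-310 code: `add_months` and its strategy names are hypothetical, chosen only to mirror the list above.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int, strategy: str = "round_down") -> date:
    """Add months to a date, resolving a non-existent day with the given strategy."""
    year, month0 = divmod(start.year * 12 + start.month - 1 + months, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    if start.day <= last_day:
        return date(year, month, start.day)  # the day exists, nothing to resolve
    month_end = date(year, month, last_day)
    if strategy == "round_down":
        return month_end
    if strategy == "round_up":
        return month_end + timedelta(days=1)
    if strategy == "overshoot":
        # Carry the excess days over into the next month.
        return month_end + timedelta(days=start.day - last_day)
    if strategy == "error":
        raise ValueError(f"day {start.day} does not exist in {year}-{month:02d}")
    raise ValueError(f"unknown strategy: {strategy}")


jan30 = date(2010, 1, 30)
print(add_months(jan30, 1, "round_down"))  # 2010-02-28
print(add_months(jan30, 1, "round_up"))    # 2010-03-01
print(add_months(jan30, 1, "overshoot"))   # 2010-03-02
```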

Arithmetic in Noda Time is easily described, but the consequences can be subtle. When adding or subtracting a period from something like a LocalDate, we simply iterate over all of the field/value pairs in the period, starting with the most significant, and add each one in turn. When finding the difference between two LocalDate values with a given set of field types (e.g. "months and days") we get as close as we can without overshooting using the most significant field, then the next field etc.

The "without overshooting" part means that if you add the result to the original start value, the result will always either be the target end value (if sufficiently fine-grained fields are available) or somewhere between the original start and the target end value. So "June 2nd 2010 to October 1st 2010 in months" gives a result of "3 months" even though if we chose "4 months" we’d only overshoot by a tiny amount.
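
Here is one way to express "as close as we can without overshooting" in code. It is a sketch under stated assumptions: the function names are mine, month addition rounds down to the end of the month (the first option in the list above), and only the forward case (start on or before end) is handled.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add months, rounding a non-existent day down to the end of the month."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def months_and_days_between(start: date, end: date) -> tuple:
    """Greedy difference for start <= end: whole months first, then days."""
    months = 0
    while add_months(start, months + 1) <= end:
        months += 1  # one more month still doesn't overshoot
    days = (end - add_months(start, months)).days
    return months, days


print(months_and_days_between(date(2010, 6, 10), date(2010, 10, 13)))  # (4, 3)
# Months only: June 2nd to October 1st is 3 months, because a 4th month
# would land on October 2nd, one day past the end.
print(months_and_days_between(date(2010, 6, 2), date(2010, 10, 1))[0])  # 3
```

Adding the 3 months back to June 2nd gives September 2nd, which sits between the start and the target end, as promised.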

Now we know what approach we’re taking, let’s look at some consequences.

Asymmetry and other oddities

It’s trivial to show some asymmetry just using a period of a single month. For example:

  • January 28th 2010 + 1 month = February 28th 2010
  • January 29th 2010 + 1 month = February 28th 2010
  • January 30th 2010 + 1 month = February 28th 2010
  • February 28th 2010 – 1 month = January 28th 2010

It gets even more confusing when we add days into the mix:

  • January 28th 2010 + 1 month + 1 day = March 1st 2010
  • January 29th 2010 + 1 month + 1 day = March 1st 2010
  • March 1st 2010 – 1 month – 1 day = January 31st 2010

And leap years:

  • March 30th 2013 – 1 year – 1 month – 10 days = February 19th 2012 (as "February 30th 2012" is truncated to February 29th 2012)
  • March 30th 2012 – 1 year – 1 month – 10 days = February 18th 2011 (as "February 30th 2011" is truncated to February 28th 2011)
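
These results are easy to reproduce by hand, or with a few lines of Python. The sketch below is not Noda Time code: `add_period` is a hypothetical helper that applies years, then months, then days, rounding down to the end of the month after each of the first two steps (treating a year as 12 months is my assumption).

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Add months, rounding a non-existent day down to the end of the month."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def add_period(d: date, years: int = 0, months: int = 0, days: int = 0) -> date:
    """Apply the fields most significant first: years, then months, then days."""
    d = add_months(d, 12 * years)
    d = add_months(d, months)
    return d + timedelta(days=days)


# A single month: three different starts, one result, and no way back.
assert add_period(date(2010, 1, 28), months=1) == date(2010, 2, 28)
assert add_period(date(2010, 1, 30), months=1) == date(2010, 2, 28)
assert add_period(date(2010, 2, 28), months=-1) == date(2010, 1, 28)

# Months and days together.
assert add_period(date(2010, 1, 29), months=1, days=1) == date(2010, 3, 1)
assert add_period(date(2010, 3, 1), months=-1, days=-1) == date(2010, 1, 31)

# Leap years: the same period lands on a different day of the month.
assert add_period(date(2013, 3, 30), years=-1, months=-1, days=-10) == date(2012, 2, 19)
assert add_period(date(2012, 3, 30), years=-1, months=-1, days=-10) == date(2011, 2, 18)
```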

Then we need to consider how rounding works when finding the difference between days… (forgive the pseudocode):

  • Between(January 31st 2010, February 28th 2010, Months & Days) = ?
  • Between(February 28th 2010, January 31st 2010, Months & Days) = -28 days

The latter case is relatively obvious – because if you take a whole month off February 28th 2010 you end up with January 28th 2010, which is an overshoot… but what about the first case?

Should we determine the number of months as "the largest number such that start + period <= end"? If so, we get a result of "1 month" – which makes sense given the first set of results in this section.

What worries me most about this situation is that I honestly don’t know offhand what the current implementation will do. I think it would be best to return "28 days" as there isn’t genuinely a complete month between the two… <tappety tappety>

Since writing the previous paragraph, I’ve tested it, and it returns 1 month and 0 days. I don’t know how hard it would be to change this behaviour or whether we want to. Whatever we do, however, we need to document it.
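
For anyone who wants to check both directions by hand, here is a small Python sketch of the "largest number that doesn’t overshoot" rule. It is my own illustration, not the Noda Time implementation, and `between` is a hypothetical name standing in for the pseudocode above.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add months, rounding a non-existent day down to the end of the month."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def between(start: date, end: date) -> tuple:
    """Months then days from start to end, never stepping past end."""
    step = 1 if end >= start else -1
    months = 0
    while True:
        candidate = add_months(start, months + step)
        overshoots = candidate > end if step == 1 else candidate < end
        if overshoots:
            break
        months += step
    days = (end - add_months(start, months)).days
    return months, days


print(between(date(2010, 1, 31), date(2010, 2, 28)))  # (1, 0)
print(between(date(2010, 2, 28), date(2010, 1, 31)))  # (0, -28)
```

The forward case reports a whole month because January 31st + 1 month rounds down to February 28th, which lands exactly on the end date.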

That’s really at the heart of this: we must make Noda Time predictable. Where there are multiple feasible results, there should be a simple way of doing the arithmetic by hand and getting the same results as Noda Time. Of course, picking the best option out of the ones available would be good – but I’d rather be consistent and predictable than "usually right" but unpredictably so.

Think it’s bad so far? It gets worse…

ZonedDateTime: send in the time zones… (well maybe next year?)

I’ve described the "instant time line" and its simplicity.

I’ve described the local date/time complexities, where there’s a calendar but there’s no time zone.

So far, the two worlds have been separate: you can’t add a Duration to a LocalDateTime (etc), and you can’t add a Period to an Instant. Unfortunately, sooner or later many applications will need ZonedDateTime.

Now, you can think of ZonedDateTime in two different ways:

  • It’s an Instant which knows about a calendar and a time zone
  • It’s a LocalDateTime which knows about a time zone and the offset from UTC

The "offset from UTC" part sounds redundant at first – but during daylight saving transitions the same LocalDateTime occurs at two different instants; the time zone is the same in both cases, but the offset is different.

The latter way of thinking is how we actually represent a ZonedDateTime internally, but it’s important to know that a ZonedDateTime still unambiguously maps to an Instant.

So, what should we be able to do with a ZonedDateTime in terms of arithmetic? I think the answer is that we should be able to add both Periods and Durations to a ZonedDateTime – but expect them to give different results.

When we add a Duration, that should work out the Instant represented by the current ZonedDateTime, advance it by the given duration, and return a new ZonedDateTime based on that result with the same calendar and time zone. In other words, this is saying, "If I were to wait for the given duration, what date/time would I see afterwards?"

When we add a Period, that should add it to the LocalDateTime represented by the ZonedDateTime, and then return a new ZonedDateTime with the result, the original time zone and calendar, and whatever offset is suitable for the new LocalDateTime. (That’s deliberately woolly – I’ll come back to it.) This is the sort of arithmetic a real person would probably perform if you asked them to tell you what time it would be "three hours from now". Most people don’t take time zones into account…

In most cases, where a period can be represented as a duration (for example "three hours") the two forms of addition will give the same result. Around daylight saving transitions, however, they won’t. Let’s consider some calculations on Sunday November 7th 2010 in the "America/Los_Angeles" time zone. It had a daylight saving transition from UTC-7 to UTC-8 at 2am local time. In other words, the clock went 1:58, 1:59, 1:00. Let’s start at 12:30am (local time, offset = -7) and add a few different values:

  • 12:30am + 1 hour duration = 1:30am, offset = -7
  • 12:30am + 2 hours duration = 1:30am, offset = -8
  • 12:30am + 3 hours duration = 2:30am, offset = -8
  • 12:30am + 1 hour period = 1:30am, offset = ???
  • 12:30am + 2 hour period = 2:30am, offset = -8
  • 12:30am + 3 hour period = 3:30am, offset = -8
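
The table above can be reproduced with a toy model of that single transition. This is a Python sketch, not Noda Time code: it hard-codes the one transition instant (2am at UTC-7 is 09:00 UTC) instead of consulting a real time zone database, and all the names are mine.

```python
from datetime import datetime, timedelta

# Toy model of one transition: Los Angeles, November 7th 2010.
# From 09:00 UTC onwards the offset is UTC-8; before that it is UTC-7.
TRANSITION_UTC = datetime(2010, 11, 7, 9, 0)
BEFORE = timedelta(hours=-7)
AFTER = timedelta(hours=-8)


def offset_at(utc: datetime) -> timedelta:
    """Offset in force at a given UTC instant."""
    return BEFORE if utc < TRANSITION_UTC else AFTER


def add_duration(local: datetime, offset: timedelta, duration: timedelta):
    """Go via the instant: convert to UTC, add, convert back to local time."""
    utc = local - offset + duration
    new_offset = offset_at(utc)
    return utc + new_offset, new_offset


def add_period(local: datetime, period: timedelta):
    """Add to the local time, then list every offset that is valid for the result."""
    new_local = local + period
    candidates = [o for o in (BEFORE, AFTER) if offset_at(new_local - o) == o]
    return new_local, candidates


start = datetime(2010, 11, 7, 0, 30)  # 12:30am local time, offset -7
for hours in (1, 2, 3):
    step = timedelta(hours=hours)
    print(hours, add_duration(start, BEFORE, step), add_period(start, step))
```

The one-hour period comes back with two candidate offsets; that is the ??? in the table.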

The ??? value is the most problematic one, because 1:30 occurs twice… when thinking of the time in a calendar-centric way, what should the result be? Options here:

  • Always use the earlier offset
  • Always use the later offset
  • Use the same offset as the start date/time
  • Use the offset in the direction of travel (so adding one hour from 12:30am would give 1:30am with an offset of -7, but subtracting one hour from 2:30am would give 1:30am with an offset of -8)
  • Throw an exception
  • Allow the user to pass in an argument which represents a strategy for resolving this
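
To show how little code separates these options (and so how arbitrary the choice is), here is a hypothetical resolver in Python. Nothing like this exists in Noda Time at the time of writing; the strategy names, the parameters and the choice to raise when the start offset isn’t a candidate are all assumptions made for the sketch.

```python
from datetime import timedelta

MINUS_7 = timedelta(hours=-7)  # offset before the transition
MINUS_8 = timedelta(hours=-8)  # offset after the transition


def resolve_ambiguous(candidates, strategy, start_offset=None, forwards=True):
    """Pick an offset for a local time that occurs twice.

    candidates holds the two valid offsets, ordered by the instant they
    produce (earlier occurrence first).
    """
    first, second = candidates
    if callable(strategy):
        return strategy(candidates)  # user-supplied strategy
    if strategy == "earlier":
        return first
    if strategy == "later":
        return second
    if strategy == "same_as_start":
        if start_offset not in candidates:
            raise ValueError("start offset is not valid for the result")
        return start_offset
    if strategy == "direction_of_travel":
        # Moving forwards we reach the first occurrence first; moving
        # backwards we reach the second occurrence first.
        return first if forwards else second
    raise ValueError("ambiguous local time")


both = (MINUS_7, MINUS_8)
print(resolve_ambiguous(both, "direction_of_travel", forwards=True))   # -7 hours
print(resolve_ambiguous(both, "direction_of_travel", forwards=False))  # -8 hours
```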

This is currently unimplemented in Noda Time, so I could probably choose whatever behaviour I want, but frankly none of them has much appeal.

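At the other kind of daylight saving transition, when the clocks go forward, we have the opposite problem. Take a zone where the clocks jump from 1:00am straight to 2:00am (Europe/London does this; in Los Angeles the skipped hour is 2am to 3am): adding one hour to 12:30am can’t give 1:30am because that time never occurs. Options in this case include: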

  • Return the first valid time after the transition (this has problems if we’re subtracting time, where we’d presumably want to return the latest valid time before the transition… but the transition has an exclusive lower bound, so there’s no such "latest valid time" really)
  • Add the offset difference, so we’d skip to 2:30am
  • Throw an exception
  • Allow the user to pass in a strategy
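
The gap can be modelled the same way as the overlap. The Python sketch below assumes a zone where 1:00am to 1:59am is skipped, modelled on Europe/London on March 28th 2010 (clocks go forward at 01:00 UTC, from UTC+0 to UTC+1). The names and strategy labels are hypothetical.

```python
from datetime import datetime, timedelta

# Toy model of one "spring forward" transition, modelled on London in 2010.
TRANSITION_UTC = datetime(2010, 3, 28, 1, 0)
BEFORE = timedelta(hours=0)
AFTER = timedelta(hours=1)


def offset_at(utc: datetime) -> timedelta:
    """Offset in force at a given UTC instant."""
    return BEFORE if utc < TRANSITION_UTC else AFTER


def valid_offsets(local: datetime) -> list:
    """Every offset under which this local time really occurs."""
    return [o for o in (BEFORE, AFTER) if offset_at(local - o) == o]


def resolve_gap(local: datetime, strategy: str) -> datetime:
    """Map a skipped local time onto one that exists."""
    if valid_offsets(local):
        return local  # not in the gap, nothing to do
    if strategy == "first_valid":
        return TRANSITION_UTC + AFTER  # first local time after the transition
    if strategy == "shift":
        return local + (AFTER - BEFORE)  # add the offset difference
    raise ValueError(f"{local} does not occur in this time zone")


skipped = datetime(2010, 3, 28, 0, 30) + timedelta(hours=1)  # 1:30am
print(valid_offsets(skipped))                # []
print(resolve_gap(skipped, "first_valid"))   # 2010-03-28 02:00:00
print(resolve_gap(skipped, "shift"))         # 2010-03-28 02:30:00
```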

Again, nothing particularly appeals.

All of this is just involved in adding a period to a ZonedDateTime – then the same problems occur all over again when trying to find the period between them. What’s the difference (as a Period rather than a simple Duration) between 1:30am with an offset of -7 and 1:30am with an offset of -8? Nothing, or an hour? Again, at the moment I really don’t know the best course of action.
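
The two candidate answers fall out of two different subtractions, as this small Python sketch shows (plain `datetime` arithmetic with the offsets carried alongside by hand; `instant_of` is a made-up helper, not a Noda Time API):

```python
from datetime import datetime, timedelta


def instant_of(local: datetime, offset: timedelta) -> datetime:
    """UTC instant for a local date/time with a known offset from UTC."""
    return local - offset


first = (datetime(2010, 11, 7, 1, 30), timedelta(hours=-7))
second = (datetime(2010, 11, 7, 1, 30), timedelta(hours=-8))

elapsed = instant_of(*second) - instant_of(*first)  # Duration view: one hour
local_difference = second[0] - first[0]             # calendar view: nothing

print(elapsed, local_difference)
```

Both numbers are defensible; the difficulty is deciding which one a Period-returning method should report.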

Conclusion

This post has ended up being longer than I’d expected, but hopefully you’ve got a flavour of the challenges we’re facing. Even without time zones getting involved, date and time arithmetic is pretty silly – and with time zones, it becomes very hard to reason about. It’s hard even to work out what the "right" result for an API to return should be, let alone to implement it.

Above all, it’s important to me that Noda Time is predictable and clearly documented. Very often, if a library doesn’t behave exactly the way you want it to, but you can tell what it’s going to do, you can work around that – but if you’re having to experiment to guess the behaviour, you’re on a hiding to nothing.

28 thoughts on “The joys of date/time arithmetic”

  1. “February is either 28 or 29 days long depending on the month”: I think you mean “February is either 28 or 29 days long depending on the YEAR”?

  2. “January 29th 2012 + 1 year + 1 month + 1 day = March 29th 2013 (goes via February 28th 2013, because we add the year first and February 29th 2012 doesn’t exist)”
    : “February 29th 2013 doesn’t exist”

  3. @Dougas: Fixed, thanks. Gah, I should really have proof-read this one :( (Especially as I need to apply all fixes to both blogs…)

  4. January 29th 2012 + 1 year + 1 month + 1 day = March 29th 2013 (goes via February 28th 2013, because we add the year first and February 29th 2013 doesn’t exist)

    Is this not March 1st 2013? (2012-Jan-29 + 1 year -> 2013-Jan-29; + 1 month -> 2013-Feb-28 rounded down; + 1 day -> 2013-Mar-01)

    March 29th 2013 – 1 year – 1 month – 1 day = January 28th 2012 (goes via February 29th 2012, which does exist)

    Similarly, should this not be February 28th, 2012? (2013-Mar-29 – 1 year -> 2012-Mar-29; – 1 month -> 2012-Feb-29; – 1 day -> 2012-Feb-28)

  5. To further confuse things (and to follow on from your section on asymmetry), subtraction using different units is not associative.
    i.e. the order in which you subtract can change the result:

    (March 1st 2010 – 1 month) – 1 day = January 31st 2010
    but
    (March 1st 2010 – 1 day) – 1 month = January 28th 2010

    I don’t envy you the task of managing the expectations of the API!

  6. @John: Yes, I’m stupid. Will fix. It makes the difference even more blatant :)

    @Chris: Indeed. I may add a note to that effect…

  7. I agree with John. I would expect the leap year days different.

    Also, there seems to be asymmetry in operation order:

    January 28th 2010 + 1 month + 1 day = March 1st 2010
    January 28th 2010 + 1 day + 1 month = February 28th 2010 ?

    Is this correct?

    And it’s a long post indeed. I still have to read ZonedDateTime, but will do this another time.

  8. @Doeke: Yes, operation order definitely matters. We always apply a single period in “most significant unit first” order, but you can of course add several periods one at a time.

  9. What’s the use case for Periods with values for units less than 1 day? “The point in time when the calendar shows a day 1 greater than now” is “Tomorrow” – a useful and well-defined concept, but “The point in time when the clock shows an hour one greater than now” seems fairly useless, ill-defined, and in any case, not at all what (I expect) a user would think it would do. I’d suggest just dropping < 1 day granularity from Periods.

  10. @Simon: I disagree. I think it’s entirely reasonable to say “in the context of a LocalDateTime, how many seconds are there between this LocalDateTime and that one?”.

    Possibly dropping the ability to add Period to ZonedDateTime would be appropriate, mind you…

  11. And you haven’t considered the hardest case ;-)

    2010-01-31 plus Period(1 month, -1 day)

    Joda-Time got this wrong, and gave 2010-02-27. That’s because it resolved invalid dates after each step (2010-01-31 plus 1 month -> 2010-02-28, minus 1 day -> 2010-02-27).

    The correct answer (when adding a single period object) is 2010-02-28.

    The Javadoc and rules in JSR-310 explain what I think is the best solution:
    https://jsr-310.dev.java.net/nonav/doc-2010-06-22/javax/time/calendar/LocalDate.html#plus%28javax.time.calendar.PeriodProvider%29

  12. @Stephen: Right, that’s what Noda Time would do at the moment too. I wonder how feasible it will be to implement the “right” solution using the Joda Time engine (which is basically what we’ve got)…

  13. I love your blog, Jon, but perhaps you could split posts like this into parts, much like Eric Lippert’s series on C#5 and async. If you’d made the ZonedDateTime part a second blog entry (even if it was just for tomorrow), I think my feeble brain could’ve taken this in a bit more.

    By the way, loving C# in Depth, 2nd Ed. Just finished the bits on C#2 and came away learning some stuff I thought I already knew. Great style at a helter-skelter pace. Love it.

  14. @skeet: Perhaps this is my interpretation of Period, but “the amount of seconds” sound like a Duration to me. Of course, now your algebra is all screwy….

    Perhaps Period is the difference type for recurring dates’ absolute types?

  15. @Simon: Time-based periods have direct mappings to durations, certainly… but what would you do if you wanted “days and hours” together? I prefer to keep Duration for instant-based types, and Period for calendar-based types – just taking advantage of the fact that time-field-only periods are relatively straightforward :)

  16. @skeet: yes, this is simpler in the algebra, I just don’t know what a 3 hr Duration *means*, and I believe that to be why the OP ambiguities arise.
    If I had to represent “days and hours”, I personally would simply carry around a date-only Period and a hour-only Duration, which would make the behavior clearer. Not to say there isn’t one but I can’t think of a situation where a date and time Period would make sense, the closest I can get is “3pm tomorrow”.
    If you can’t make a pit of success, at least fill in the pit of failure, if you will. Of course, I could stop arm-chair architecting and try some actual design work, but I have 3 other projects looking at me here :)

  17. Jon, do you have performance recommendations for date/time arithmetic in C#? E.g converting to “ticks” before performing operations

    Thanks for this blog entry, Paulus

  18. @Paulus: Not really. I’d definitely favour correctness over performance in most cases, and that may well leave little wiggle room. It will also depend on whether you’re talking about Noda Time or the built-in API.

    Basically it’s highly context-sensitive :)

  19. And just when I thought I was getting a handle on how this all might work, I remembered leap seconds. They don’t even know how many will be added or subtracted on a given year!

  20. @Timothy: Well, we’re not trying to handle leap seconds in Noda Time. I don’t believe most developers want or need that extra detail/complexity.

  21. Alas, it looks like Noda time is going to remain among the large group of systems that do NOT handle local time properly from a historical perspective.

    The offsets are dependent on the year (or at least a range of years). Consider the recent changes to Daylight Saving Time in the USA. If you use nearly any library today to find out what time it was in New York City given a GMT time that is a few years old (and is during the summer, when the offset is in effect) then you may get an incorrect answer as the current rules for when DST is in effect are not the same as the rules were a few years ago.

    Finding an accurate historical list of all timezones and their quirks is nearly impossible.

  22. @David: What makes you think that? Noda Time will use the zoneinfo database for historical time zone information. Note that even the .NET built-in APIs provide that information on recent versions of Windows :)
