`DateTime` arithmetic on `Microsecond` (or smaller) scale

tim.holy · August 21, 2023, 12:56pm

I was asked to weigh in here. I’m not a heavy (or even moderate) user of the Dates library, and I have very little skin in the game despite having been pulled into this discussion. For what it’s worth, with respect to representation and the rules of arithmetic, I’m coming to the conclusion that Dates is a mixed bag, and internally inconsistent enough that something needs to change.

The bigger problem

This discussion has so far not addressed what I view as the biggest problem with the Dates library: there’s almost nothing you can actually do with DateTime objects other than construct them, look at them, order them, and offset them. They’re broken for basically everything else you might want to do with them.

Things like `t = range(now() - Year(5), stop=now(), length=1001)` don’t work. (You can construct the range, but you can’t collect it or otherwise use the values it generates, because the constructor errors whenever you try to construct a time that isn’t an integer number of milliseconds.) If I’m trying to plot the performance of my stock portfolio over the last 5 years, I really don’t care if there’s 1 ms of jitter between adjacent points: I want `range` to work, and collecting it should round to the nearest millisecond. Likewise, if I’m trying to do statistics on the gap between two particular events, it’s really weird that `mean(Δts)` doesn’t work unless the mean happens to be an exact integer number of milliseconds.
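For comparison, here is a minimal sketch of the workaround a user has to write today, assuming only the public Dates API (`Dates.value`, `Millisecond`, `DateTime` subtraction): do the arithmetic on integer millisecond counts and round explicitly. `rounded_timerange` and `mean_period` are hypothetical helper names, not part of Dates.

```julia
using Dates

# Hypothetical helper (not part of Dates): `n` evenly spaced DateTimes from
# `start` to `stop`, each rounded to the nearest millisecond.
function rounded_timerange(start::DateTime, stop::DateTime, n::Integer)
    n >= 2 || throw(ArgumentError("need at least two points"))
    span = Dates.value(stop - start)  # total span as an Int64 count of milliseconds
    return [start + Millisecond(round(Int64, span * k / (n - 1))) for k in 0:(n - 1)]
end

# Hypothetical helper (not part of Dates): mean of Millisecond gaps,
# rounded to the nearest millisecond.
mean_period(Δts) = Millisecond(round(Int64, sum(Dates.value, Δts) / length(Δts)))

t0 = DateTime(2023, 8, 21)
ts = rounded_timerange(t0, t0 + Millisecond(10), 4)  # offsets 0, 3, 7, 10 ms
Δts = diff(ts)                                       # gaps of 3, 4, 3 ms
mean_period(Δts)                                     # 10/3 ms rounds to 3 ms
```

The 5-year case would then be `rounded_timerange(now() - Year(5), now(), 1001)`. The point is that every such operation needs its own hand-written rounding.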

With raw arithmetic, we can let the user choose what should happen: should `t + x` throw an error, or should it be `t + round(Millisecond, x)`? Let the user decide! While this works really well for such low-level operations, it fails as soon as you get “one deep” into external code. It basically requires that you reimplement every operation specifically for `DateTime` objects; you can’t use most of Julia’s packages on objects created by the Dates library. (You probably wouldn’t want to use most of them, of course, but there are clearly some interesting things you’d like to be able to do.)
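To make the “one deep” cost concrete, a sketch under stated assumptions: `midpoint` and `average` stand in for generic numeric code from some external package, and `lift` is a hypothetical adapter (not a Dates function) that maps `DateTime`s to `Float64` milliseconds, calls the numeric code, and rounds the result back to the nearest millisecond. Something like this has to be written for every generic function you want to reuse.

```julia
using Dates

# Stand-ins for "external" generic code written with plain numbers in mind.
midpoint(a, b) = a + (b - a) / 2
average(xs...) = sum(xs) / length(xs)

# Hypothetical adapter (not part of Dates): express each DateTime as Float64
# milliseconds relative to the first argument, run the numeric function `f`,
# then round the result back to the nearest Millisecond.
function lift(f, ts::DateTime...)
    ref = first(ts)
    xs = map(t -> Float64(Dates.value(t - ref)), ts)
    return ref + Millisecond(round(Int64, f(xs...)))
end

a = DateTime(2023, 8, 21)
lift(midpoint, a, a + Millisecond(10))                   # a + 5 ms
lift(midpoint, a, a + Millisecond(3))                    # 1.5 ms rounds to 2 ms (ties to even)
lift(average, a, a + Millisecond(1), a + Millisecond(3)) # 4/3 ms rounds to 1 ms
```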

This, in my eyes, is an enormous failure to support generic programming, and the single biggest thing that needs to change.

Fixing it

There’s an easy fix: make the constructors round to nearest. This treats `DateTime` as representing a continuous segment of the real line. Once you’ve adopted that view, arithmetic should of course round, too (and it basically would do so automatically).
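A minimal sketch of what round-to-nearest construction could look like, written as standalone functions so it runs against today’s Dates. `nearest_ms` is a hypothetical, illustrative name; this is not a proposal for the exact Dates API, only a demonstration that once construction rounds, `DateTime` arithmetic inherits the rounding without further work.

```julia
using Dates

# Hypothetical round-to-nearest constructors (illustrative names, not Dates API).
# Today, Millisecond(1.2) throws because 1.2 is not an integer.
nearest_ms(x::Real) = Millisecond(round(Int64, x))
# Finer periods: integer division rounded to nearest (ties to even).
nearest_ms(x::Microsecond) = Millisecond(div(Dates.value(x), 1_000, RoundNearest))
nearest_ms(x::Nanosecond) = Millisecond(div(Dates.value(x), 1_000_000, RoundNearest))

t = DateTime(2023, 8, 21)
nearest_ms(1.2)                     # Millisecond(1)
t + nearest_ms(Microsecond(1_600))  # t + 2 ms, since 1.6 ms rounds to 2 ms
```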

Breaking changes

Is changing to round-to-nearest breaking? With all due respect to @anowacki, I’m skeptical. If we improve `log(x)` to make it more accurate in the last ulp, that’s not a breaking change; that’s a bugfix. For `DateTime`, the millisecond is the “ulp.” So switching from truncation to rounding is not breaking, and I don’t even think that switching from `Millisecond(1.2)` throwing an error to `Millisecond(1.2) == Millisecond(1)` is breaking (going the other way, however, would be). That said, the question may be somewhat moot if we soon move Dates out as an upgradable stdlib; Dates 2.0 will hopefully arrive long before Julia 2.0.
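The “millisecond is the ulp” argument can be checked with plain integer arithmetic. This sketch uses two hypothetical helpers, `ms_trunc` and `ms_nearest`, to convert a microsecond count to whole milliseconds by truncation and by rounding to nearest; the two never disagree by more than one millisecond.

```julia
# Plain integer arithmetic, no Dates needed: compare two ways of turning a
# count of microseconds into whole milliseconds.
ms_trunc(us::Integer) = div(us, 1_000)                  # toward zero
ms_nearest(us::Integer) = div(us, 1_000, RoundNearest)  # to nearest, ties to even

ms_trunc(1_999), ms_nearest(1_999)   # (1, 2)
# Over a range of inputs the two never disagree by more than one millisecond.
maxdiff = maximum(abs(ms_trunc(us) - ms_nearest(us)) for us in -10_000:10_000)
```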

But this isn’t how integers work!

Dates stdlib objects use integers for their internal representation. I’m not 100% sure I understand why; I can find lots of admonitions not to use floating-point for currency, but I haven’t found good hits on this topic specifically for dates & times. Nevertheless, I presume the reason is that integer arithmetic is associative: `(a + b) - a == b` always holds for integers, whereas for floating-point it holds only under special circumstances. If you don’t want to, say, break ordering relationships when doing arithmetic, then associativity is a really, really important property.
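A quick illustration of that difference in plain Julia (no Dates involved). With `Int64`, `(a + b) - a` recovers `b` exactly, even when `a + b` overflows, because integer arithmetic wraps around modulo 2^64. With `Float64`, a small addend can be absorbed entirely.

```julia
# Integers: (a + b) - a recovers b exactly.
a, b = 10^16, 1
(a + b) - a == b                           # true
(typemax(Int64) + 1) - typemax(Int64)      # 1, despite the intermediate overflow

# Float64: near 1e16 adjacent doubles are 2 apart, so adding 1.0 is lost.
x, y = 1e16, 1.0
(x + y) - x                                # 0.0, not 1.0
```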

But are date/time objects integers? No: `convert(Int, Millisecond(5))` throws an error. If it didn’t, you could do this:

```julia
convert(Nanosecond, convert(Int, Millisecond(5)))
```

and come to the conclusion that 5 ns ≈ 5 ms, which is obviously completely wrong. Just because a struct represents something internally using an integer does not make it an integer; in this case, I’m guessing the fundamental reason for that choice is to make arithmetic associative. Plain integers work with operations like `range` and `mean` only because those operations promote to `Float64`, but we don’t have a `Float64` variant of `DateTime`, so the only option is to round.
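A short check, assuming current Julia 1.x Dates behavior, that the stored integer and the quantity it represents are different things: `Dates.value` exposes the raw integer, while comparison and `convert` between fixed periods account for the unit.

```julia
using Dates

# The stored integers are identical...
Dates.value(Millisecond(5)) == Dates.value(Nanosecond(5))  # true
# ...but the quantities are not: comparisons take the unit into account.
Millisecond(5) == Nanosecond(5)          # false
Millisecond(5) == Nanosecond(5_000_000)  # true
convert(Nanosecond, Millisecond(5))      # exact conversion: 5_000_000 ns

# Plain integers, by contrast, are promoted to Float64 by `/`.
(1 + 2) / 2                              # 1.5
```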

So the fact that they use integers internally does not mean they must inherit all of their numeric traits from integers.

Why does this mean we have to support `+(::Millisecond, ::Microsecond)` and `+(::DateTime, ::Microsecond)`?

Microsecond and millisecond are just units of time. We expect to be able to add feet and meters, for example, but not meters and seconds. Concerns that someone might lose some precision when working with microseconds should not overwhelm that basic mental model.

We provide the tools to do as well as one can: if you have a lot of microseconds that you want to do arithmetic on, keep them separate and stir them into `DateTime` at the last possible moment. We’d likewise advise people to sum the list `[1e20, 1, -1e20]` in a custom manner. Arithmetic precision issues are not unique to `Microsecond`; they occur whenever you represent pure mathematics on a computer.
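A sketch of the “stir them in at the last possible moment” advice, using a hypothetical `to_nearest_ms` helper (not part of Dates) that rounds a `Microsecond` to the nearest `Millisecond`. Rounding each 0.6 ms step separately accumulates error, while summing in `Microsecond` first and converting once gives the exact total. The same block shows the float analogue with `[1e20, 1, -1e20]`, using `BigFloat` as one possible “custom manner” of summing.

```julia
using Dates

# Hypothetical helper (not part of Dates): nearest whole millisecond.
to_nearest_ms(x::Microsecond) = Millisecond(div(Dates.value(x), 1_000, RoundNearest))

# Mixing units is fine once they are converted exactly: 1 ms + 1 μs.
convert(Microsecond, Millisecond(1)) + Microsecond(1)  # 1001 microseconds

t0 = DateTime(2023, 8, 21)
steps = fill(Microsecond(600), 1000)  # 1000 steps of 0.6 ms = 600 ms in total

# Converting each step before adding: every 0.6 ms step rounds up to 1 ms.
t_early = foldl((t, s) -> t + to_nearest_ms(s), steps; init = t0)
# Accumulating in Microsecond and converting once at the end.
t_late = t0 + to_nearest_ms(sum(steps))

t_early - t0  # 1000 milliseconds
t_late - t0   # 600 milliseconds

# The floating-point analogue: a naive left-to-right sum loses the 1.0.
xs = [1e20, 1.0, -1e20]
foldl(+, xs)            # 0.0
Float64(sum(big.(xs)))  # 1.0, summing in extended-precision BigFloat
```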

So what about Microsecond and Nanosecond?

Perhaps we should delete these two types in Dates 2.0. But until then we’re stuck with them.
