`DateTime` arithmetic on `Microsecond` (or smaller) scale

There is currently a PR on the repo with very lively discussion about what the following operation should do:

using Dates

a = now() 
b = a + Nanosecond(1)

(the situation is analogous with Microsecond)

Currently, the result is always exactly equal to a, because the DateTime arithmetic implemented in Dates simply truncates the non-DateTime argument to milliseconds (as that is the supported precision of DateTime), and the truncated result is then added to the DateTime:

julia> using Dates

julia> a = now()
2023-08-18T14:35:44.462

julia> a === a + Nanosecond(1)
true

julia> a === a + Microsecond(999)
true

julia> a === a + Microsecond(1000) # now we're at the millisecond scale
false

julia> (a + Microsecond(1000)) === (a + Microsecond(1999))
true

This has consequences when trying to accumulate a number of durations into a DateTime, for example to figure out how long some continued measurements took, perhaps from samples of an oscilloscope:

julia> a = now()
2023-08-18T14:21:52.527

julia> reduce(+, fill(Microsecond(1234), 10000), init=a)
2023-08-18T14:22:02.527

julia> a + reduce(+, fill(Microsecond(1234), 10000), init=Nanosecond(0))
2023-08-18T14:22:04.867

This, where two different forms of accumulation lead to different results, happens because in the first form the repeated truncation accumulates error, and thus the result is not as exact as it could be. In essence, DateTime + Nanosecond and DateTime + Microsecond are not associative. This is problematic, because the data loss is silent, and near impossible to catch after the fact - especially if you don't have access to the original data anymore.

There are more or less three ways to resolve this:

  • Keep the current behavior, and simply document that this kind of arithmetic does some form of rounding.
  • Increase the precision of DateTime up to Nanosecond precision (and not support smaller scales at all, effectively pushing the failure case to higher precisions, but not solving it entirely). This would necessitate an increase in the size of DateTime (from currently 64 bits to 128 bits, to accommodate the additional data needed to keep track of everything).
  • Deprecate the existing behavior, and direct users to use round/trunc/ceil/floor on their too-small-to-fit-in-DateTime calculations, to explicitly specify the behavior they want (see the sketch below). Only arithmetic with exact results would be allowed - the methods in question would be removed when (if ever) a breaking release in Julia happens.
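To illustrate option 3: the explicit spelling of the rounding would use the period-rounding methods Dates already ships. This is just a sketch of the intended usage, not the deprecation mechanism itself:

using Dates

a = now()

# Explicitly choose how the sub-millisecond period maps onto DateTime's precision:
a + round(Microsecond(1634), Millisecond)   # adds 2 ms: round to nearest
a + floor(Microsecond(1634), Millisecond)   # adds 1 ms: today's implicit truncation, spelled out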

I'm personally in favor of option 3), because it's my belief that a programming language (and standard library, targeting a very general use case) should have as few footguns as possible, and should give hints & nudges towards usage that is correct for the application, with as few guesses about what the user cares about as possible.

I'd like to hear what the community thinks though, and in particular I'd like to hear from people who often use this kind of arithmetic and what they'd expect to happen in these circumstances.

4 Likes

I personally don't need any finer precision than milliseconds when operating with DateTimes, but I don't like that various operations with Micro- and Nanoseconds are available without throwing a warning. It feels like false advertising. If Julia allows adding periods to dates, the result is expected to be accurate, or the operation should error.
Imagine operations on Strings silently taking away a couple of chars because of limitations in the internal implementation. There would be no discussion that this isn't the right behaviour.

It is also not coherent to allow operations with hidden truncation, given that:

  • Addition of Milliseconds and Nanoseconds results in a CompoundPeriod without losing accuracy:
julia> Millisecond(1) + Nanosecond(1)
1 millisecond, 1 nanosecond
  • Operations on periods alone do throw InexactError:
julia> Millisecond(1) / 2
ERROR: InexactError: Int64(0.5)
3 Likes

I agree with that - the argument brought forth by other people (e.g. @tim.holy, @Oscar_Smith) in favor of truncation is that DateTime is kind of like a fixed-point number, and thus it should not error when performing operations that underflow. I don't agree with that conclusion though, because nothing about fixed-point-ness prescribes that inaccurate results need to be "made to conform", so to speak. It's valid behavior for fixed-point numbers to refuse operations that lose data.

Doesn't the package TimesDates.jl handle this specific problem? See the examples.

Indeed - the argument I'm trying to make is that Base doesn't handle the problem (and related ones that crop up with truncation/rounding) at all, and thus should disallow that kind of arithmetic on DateTime in particular (though not in general for AbstractDateTime).

1 Like

Just wondering: why does DateTime not consist of a Date and a Time? Is there any other concern besides performance?

This is clearly a can of worms in other languages too; I see timespec_getres is new "since C23" (and asctime, but not asctime_s, is deprecated since C23):
https://en.cppreference.com/w/c/chrono/timespec_getres

What is exactly the problem if this did:

julia> a = now(); println(a); a + Microsecond(1)
2023-08-20T21:58:21.211
2023-08-20T21:58:21.211001

It would be an easy change, and seems ok for "communicating parties" (and Julia itself) - and if not, then just don't do the above(?). From ISO 8601 - Wikipedia:

There is no limit on the number of decimal places for the decimal fraction. However, the number of decimal places needs to be agreed to by the communicating parties. For example, in Microsoft SQL Server, the precision of a decimal fraction is 3 for a DATETIME, i.e., "yyyy-mm-ddThh:mm:ss[.mmm]".[28]

What happens if you store e.g. "2023-08-20T21:58:21.211001" to common databases, such as PostgreSQL and SQL Server (or send it to some web API)?

I could see an option to only show the value if it is divisible by 1000 microseconds (then omitting the extra, implied "000"), and otherwise throw - or, if better, truncate, which you could opt into (or should that be the default, and you rather opt into showing in full?).

I see what now() does is this (so it reads the clock at microsecond resolution, but stores with millisecond accuracy, by design):

return DateTime(tm.year + 1900, tm.month + 1, tm.mday, tm.hour, tm.min, tm.sec, div(tv.usec, 1000))

and the constructor:

    ...
    h = adjusthour(h, ampm)
    rata = ms + 1000 * (s + 60mi + 3600h + 86400 * totaldays(y, m, d))
    return DateTime(UTM(rata))


abstract type Instant <: AbstractTime end

"""
    UTInstant{T}

The `UTInstant` represents a machine timeline based on UT time (1 day = one revolution of
the earth). The `T` is a `Period` parameter that indicates the resolution or precision of
the instant.
"""
struct UTInstant{P<:Period} <: Instant
    periods::P
end

# Convenience default constructors
UTM(x) = UTInstant(Millisecond(x))
UTD(x) = UTInstant(Day(x))

I would like us to keep using Int64, not go to Int128 or:

Network File System version 4 has defined its time fields as struct nfstime4 {int64_t seconds; uint32_t nseconds;}

for nanoseconds. For that a + Nanosecond(1) could return a different type, UTN, stored in nanoseconds.
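In code, that could be as small as a third convenience constructor next to the two above (hypothetical - UTN does not exist in Dates today):

using Dates: UTInstant, Nanosecond

# Hypothetical convenience constructor, by analogy with UTM/UTD;
# a DateTime-like type backed by it would have nanosecond resolution.
UTN(x) = UTInstant(Nanosecond(x))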

Option 3 definitely for me. I know to be careful of inexact results when I'm working with floating point numbers. For pretty much everything else I tend to assume (right or wrong) that arithmetic is exact.

So basically for the reasons you said, I don't like option 1. And I think option 2 (128 bits) is a level of precision not needed for the large majority of use cases. Therefore option 2 behaviour is more suitable to a package than Base. Therefore by process of elimination I'm for option 3.

2 Likes

I also find it very confusing that you can add microseconds but they get lost. I know that dates are implemented with integers under the hood (and you notice that because you cannot instantiate Second(1.2), for example). So I would assume that if addition doesn't error, it's exact (modulo Int64 over/underflow behavior, maybe).
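For reference, that constructor behavior (REPL output trimmed to the error line):

julia> using Dates

julia> Second(1.2)
ERROR: InexactError: Int64(1.2)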

1 Like

That also has come up in the linked PR (it's gotten way too long to look through casually though), and on Slack @tim.holy pointed out that at this year's JuliaCon, Stephen Wolfram made fun of Julia having integer wraparound. Surely I'm not arguing for making Int promote to a bigger type, right?

Well, I'm not, and for one very simple reason - Int is emulating hardware integers, where wraparound is a consequence of having to implement the thing in some capacity in hardware. It's an extremely fundamental datatype, and getting the exact semantics the hardware (by necessity!) implements is a big boon. In contrast, DateTime is not fundamental to the same extent - there are lots of ways you can implement a DateTime, Julia just happens to have an implementation that can be packed exactly into an Int64. This doesn't mean that DateTime necessarily must follow Int64 (or Float64 or any number, really) in its semantics though; it's very much its own thing (and wraparound for DateTime has other issues, such as Y2K38, or Y2K, or any of these). Software engineering as a discipline learned long ago to make sure datetime arithmetic is exact, because modeling it as any form of inexact arithmetic leads to so many subtle bugs that really aren't appropriate for this kind of business logic at all.


@Oscar_Smith brought up an argument for why it shouldn't error - namely, imagine using DateTime and its arithmetic as the time variable in an ODE. Doing silent truncation/rounding when the solver tries to add e.g. 1 Day, 3 Hours, 24 Microseconds makes the code "just work". However, my counterpoint to that is that you can then get into a situation where your solver enters an infinite loop - if the variable-time-step solver decides "ok, now let's step Microsecond(123) forward", the arithmetic won't actually end up stepping at all. You end up repeatedly solving the same numbers at the same time points forever, because the solver won't ever reach its termination target, due to the step being taken rounding away completely. Not to mention that I'd very much consider it a bug in the solver if it tries to take smaller steps than eps(T) (which is always Millisecond(1) for DateTime).
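The stall is easy to demonstrate with the current truncating behavior - a sub-millisecond step vanishes entirely, so a solver checking for progress by comparison would spin forever:

using Dates

t = now()
step = Microsecond(123)   # smaller than eps(DateTime) == Millisecond(1)
t + step == t             # true: the whole step truncates away, time never advances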

So, I'd vastly prefer an ODE solver to error here, alerting me to the failure case as soon as I try to make a calculation that just can't be accurately represented.

2 Likes

Agree. Seems like ODE solvers have different needs that might be served better by a specialized type. Generic date computations should not suffer for that special case.

2 Likes

Firstly, my opinion is that any change here apart from documentation is breaking. That includes the proposal in the PR to change the rounding mode from floor to round. I do not believe that should happen without a major version change (presumably if/when Dates becomes an upgradable stdlib), because it would change the behaviour of people's programs.

Secondly, if DateTime remains a type which represents a whole number of fractions of a second (i.e., if it remains resolved to ms, µs, ns, etc.), then silent rounding should not happen when adding to it. Instead, an error should be thrown telling you to appropriately round any Dates time periods to the smallest-resolved time period that DateTime supports. This avoids bugs due to not knowing how the Dates library works in great detail, which you currently need to in order to work with it.

More broadly, I would like to see in the future a stdlib Dates package where the DateTime has ns resolution. @JeffreySarnoff's young but excellent NanoDates.jl, I know, takes the exact approach mentioned by @rafael.guerra and would provide that. But my understanding is that having a time type which interoperates with Dates is a challenge when it is not actually part of it. I don't know whether NanoDates has vastly different performance from DateTime for arithmetic.

Finally, although I think for many people an integral time type is important, it would also be nice if there could be the option of a DateTime-like type which had floating-point seconds. This would allow you to represent time continuously and accept any rounding errors.

1 Like

Would you be alright with a deprecation of the arithmetic between DateTime and Microsecond/Nanosecond? As you say, we can't just make it error, because that breaks working code. A deprecation notice should be ok though.

Yes, a deprecation would be the best non-breaking option I think.

1 Like

Not sure if this exists already, but in many places (UNIX time, for example) time is represented just like that anyway, and we have functions like unix2datetime in Dates to make interoperability work. However, you have to pass in an integer, not a float, currently. Maybe a wrapper type EpochDateTime or so would work, where you can choose the reference time point and unit via type parameters. And you would convert over to DateTime lossily via round(DateTime, e::EpochDateTime), so it would be clear that this is not exact.
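A minimal sketch of that idea, fixing the unit to nanoseconds and the reference point to the Unix epoch for brevity (EpochDateTime and everything about it is hypothetical):

using Dates

# Hypothetical: an exact tick count since the Unix epoch; not a real Dates type.
struct EpochDateTime
    ns::Int128   # nanoseconds since 1970-01-01T00:00:00
end

# Explicitly lossy conversion: round to the nearest millisecond.
function Base.round(::Type{DateTime}, e::EpochDateTime)
    ms = round(Int64, e.ns / 1_000_000)   # nanoseconds -> whole milliseconds
    return unix2datetime(ms / 1000)       # unix2datetime takes seconds
end

round(DateTime, EpochDateTime(1_500_000))   # 1.5 ms rounds to 1970-01-01T00:00:00.002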

I was asked to weigh in here. I'm not a heavy (or even moderate) user of the Dates library, and I have very little skin in the game despite having been pulled into this discussion. For what it's worth, with respect to representation and rules of arithmetic I'm coming to the conclusion that Dates is a mixed bag, and sufficiently internally inconsistent that something needs to change.

The bigger problem

This discussion has so far not addressed what I view as the biggest problem with the Dates library: there's almost nothing you can actually do with DateTime objects other than construct them, look at them, order them, and offset them. They're broken for basically everything else you might want to do with them.

Things like t = range(now() - Year(5), stop=now(), length=1001) don't work. (You can construct it, but you can't collect it or otherwise use values generated by the range, because the constructor errors whenever you try to construct a time that isn't an integer number of milliseconds.) If I'm trying to plot the performance of my stock portfolio over the last 5 years, I really don't care if there's 1ms of jitter between adjacent points: I want range to work, and collecting it should round to the nearest millisecond. Likewise, if I'm trying to do statistics on the gap between two particular events, it's really weird that mean(Δts) doesn't work unless the mean magically works out to be an exact integer number of milliseconds.
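The mean failure is easy to reproduce - it only works when the result happens to land on a whole number of milliseconds (REPL output trimmed):

julia> using Dates, Statistics

julia> mean([Millisecond(1), Millisecond(3)])   # exact result: works
2 milliseconds

julia> mean([Millisecond(1), Millisecond(2)])   # inexact result: errors
ERROR: InexactError: Int64(1.5)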

With raw arithmetic we could allow the user to choose what they want to happen: e.g., should it be t + x and throw an error, or t + round(x, Millisecond)? - let the user decide! While this works really well for such low-level operations, it fails as soon as you get "one deep" into external code. It basically requires that you reimplement every operation specifically for DateTime objects; you can't use most of Julia's packages on objects created by the Dates library. (You probably wouldn't want to use most of them, of course, but there are clearly some interesting things you'd like to be able to do.)

This, in my eyes, is an enormous failure to support generic programming, and the single biggest thing that needs to change.

Fixing it

There's an easy fix: make the constructors round-to-nearest. This views DateTime as representing a continuous segment of the real line. Once you've adopted that view, then of course arithmetic should round, too (it basically would automatically).
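Under that view, the arithmetic would conceptually behave like the following hypothetical helper (not a proposal for the actual method table; the rounding method used is the existing Dates one):

using Dates

# Hypothetical rounding arithmetic: snap the period to DateTime's resolution
# before adding (RoundNearestTiesUp is the default for period rounding).
add_rounded(t::DateTime, p::TimePeriod) = t + round(p, Millisecond)

t = DateTime(2023, 8, 18)
add_rounded(t, Microsecond(499))   # rounds down: t is unchanged
add_rounded(t, Microsecond(500))   # ties round up: t + Millisecond(1)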

Breaking changes

Is changing to round-to-nearest breaking? With all due respect to @anowacki, I'm skeptical. If we improve the precision of log(x), to make it more accurate in the last ulp of precision, that's not a breaking change, that's a bugfix. For DateTime, millisecond is the "ulp." So switching from truncation to rounding is not breaking, and I don't even think that switching from Millisecond(1.2) throwing an error to Millisecond(1.2) == Millisecond(1) is breaking (it would be, however, going the other way). That said, it may be a bit irrelevant if we move Dates out soon as an upgradable stdlib; Dates 2.0 will hopefully arrive long before Julia 2.0.

But this isn't how integers work!

Date stdlib objects use integers for their internal representation. I'm not 100% sure I understand why; I can find lots of admonitions not to use floating-point for currency, but I haven't found good hits on this topic specifically for dates & times. Nevertheless, I presume the reason they use integers is because integer arithmetic is associative: a + b - a == b, whereas that's true only under special circumstances for floating-point. If you don't want to, say, break ordering relationships when doing arithmetic, then associativity is a really, really important property.

But are date/time objects integers? No: convert(Int, Millisecond(5)) throws an error. If it didn't, you could do this:

convert(Nanosecond, convert(Int, Millisecond(5)))

and come to the conclusion that 5ns ≈ 5ms, which is obviously completely wrong. Just because a struct represents something internally using an integer does not make it an integer; in this case I'm guessing the fundamental reason for that choice is to make arithmetic associative. Integers can be used in math only because we promote to Float64 for many operations like range and mean, but we don't have a Float64 variant of DateTime, so the only choice is to do rounding.

So just because they represent things internally using integers does not mean that all their numeric traits must inherit from integer.

Why does this mean we have to support +(::Millisecond, ::Microsecond) and +(::DateTime, ::Microsecond)?

Microsecond and millisecond are just units of time. We expect to be able to add feet and meters, for example, but not meters and seconds. Concerns that someone might lose some precision when working with microseconds should not overwhelm that basic mental model.

We provide the tools to do as well as one can: if you have a lot of microseconds that you want to do arithmetic on, keep them separate and stir them into DateTime at the last possible moment. We'd also advise people to sum the list [1e20, 1, -1e20] in a custom manner, too. Arithmetic precision issues are not unique to Microsecond; they occur in any case where you represent pure mathematics on a computer.
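Concretely, the floating-point analogue of the silent loss - naive left-to-right summation drops the 1 entirely:

julia> sum([1e20, 1.0, -1e20])   # the 1.0 is far below 1e20's ulp and vanishes
0.0

julia> 1e20 + 1.0 == 1e20        # the same effect, one operation at a time
true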

So what about Microsecond and Nanosecond?

Perhaps we should delete these two types in Dates 2.0. But until then we're stuck with them.

6 Likes

Fair enough, but please remember that you pulled yourself into this DateTime discussion by commenting on the PR. None of the other commenters forced you to comment - I only asked you to clarify your position on Discourse as well, because the discussion on the PR was getting too long & there's a wider community of possible responses here.

I have mentioned that in the PR as well, but I have no qualms about range rounding the step internally to ensure everything lands on exactly representable values. It's fine to treat the construction of the range differently from arithmetic in isolation; we're doing the same with Base.TwicePrecision and Float64 after all, when constructing ranges of Float64.

Due to the nature of how "data from the real world" works, wouldn't it be better to have your stock data already be associated with a date & time? How would you be able to ensure that the resulting mapping of a point in time to the stock price was accurate?

As mentioned in the PR, I don't think this is weird at all - mean gives you an estimator of the distribution, not necessarily an element of that distribution. The two types are not necessarily the same - just like you get a Float64 if you call mean([1,2,3]).

I disagree - not everything that could work in a generic setting necessarily should work in a generic setting, if the semantics of the expected behavior of a type and the actual behavior of that type differ too much. Yes, that means less code that "just works", but arguably, that code doesn't "just work" if the results have undesirable semantics.

No, the choice to represent the data internally as an integer is to treat the integer as a "bag of bits" that we can use to store the integer-valued "datetime" as a multiple of a number of Milliseconds.

This is done to use Millisecond as a Rata Die base for the purposes of datetime calculations (as also documented in the manual). In the end, this is simply an efficient packing of the data DateTime needs to store, instead of being wasteful and having e.g. fields of Int8 per precision point (most bits of each field wouldn't be used, due to the constraints the constructor imposes). The fact that this is done in an Int64 is an implementation detail (though the manual sadly mentions it explicitly...).

I agree with that, but that doesn't mean we have to bend the assumptions regular programmers make about a type (that is, datetime arithmetic being exact) just to support a use case that has an imprecise result. Being exact here is safer & less surprising than the alternative, and in the end makes for a better user experience.

I always thought that the fix should be more fundamental: DateTime values and Period values should represent times (in milliseconds) using floating-point values instead of 64-bit integers. These represent integers exactly up to 2^53, so there is no loss of precision except for time periods > 280,000 years - and can anyone measure such periods with millisecond precision? Then you get, for free, much more graceful behavior both for sub-ms intervals and for extremely long time intervals (astronomy?).
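For scale, under a Float64-milliseconds representation the exactly-representable integer range runs out only past 2^53 ms:

limit_ms = 2.0^53                   # largest integer Float64 represents exactly
limit_ms == limit_ms + 1            # true: above 2^53, the spacing exceeds 1 ms
limit_ms / (1000 * 86400 * 365.25)  # ≈ 285,000 years of millisecond ticks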

3 Likes

That would make DateTime much less useful; as of today it can represent a span of ~580 million years with millisecond precision.

The packing of the numbers is not naive; it's very efficient.

How is that useful? In what real application do you have millisecond precision over such a long time period?

1 Like