`DateTime` arithmetic on `Microsecond` (or smaller) scale

My point was more that we can currently (accurately!) represent years much larger than 280,000 with no issues whatsoever. It’d be a shame to throw that away, not to mention that the current typemax(DateTime) already supports 146 million years (though, as the issue I linked shows, we could comfortably support another 146 million years without issues):

julia> typemax(DateTime)
146138512-12-31T23:59:59

Lowering that by this much seems hugely breaking.

Why? Who cares about throwing away functionality no one could possibly need (millisecond precision over million-year intervals) in favor of functionality that lots of people clearly need (sub-ms precision over short intervals)?

And if you are representing such long intervals in your code today, I would say that doing so with a fixed Int64 count is extremely dangerous — if you need 500 million years today, it’s quite likely that you might need a billion years tomorrow, and then you will catastrophically overflow Int64, whereas Float64 scales up much more gracefully.
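(For scale, a back-of-the-envelope check, not a figure quoted from any post above: an Int64 count of milliseconds tops out at roughly 292 million years.)

overflow_years = typemax(Int64) / (365.25 * 86_400 * 1_000)   # ≈ 2.92e8 years of milliseconds fit in an Int64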

5 Likes

You pulled yourself into this DateTime discussion

Definitely, I did and have made all my own choices. Apologies if my comment implied otherwise. I’m just now trying to limit my time commitment to this issue since I don’t have much skin in the game.

using floating-point values instead of using 64-bit integers

Thanks for the link. Indeed I’m not sure we shouldn’t be using floats. If we switched to Float64, our precision would be about 7 µs, better than we have now; it only gets worse than the current 1 ms at around 100,000 years. But we would lose associativity. From the fact that other libraries, like Rust’s, also use integers, I worry that’s more of a problem than I currently appreciate. But my searches have not turned up any horror stories.
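For concreteness, here is the kind of REPL check behind those numbers (my own experiment; the exact eps depends on the date, of course):

julia> using Dates

julia> eps(Float64(Dates.value(now())))               # precision near today, in ms (≈ 7.8 µs)
0.0078125

julia> eps(Float64(Dates.value(DateTime(100_000))))   # precision at year 100000, in ms
0.5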

3 Likes

if timestamps cannot be treated as integral that precludes them from quite a few applications

however, I think it’s very unlikely that anyone with those applications is using DateTime anyway, since it can’t represent microseconds and doesn’t use Unix time

The precision would be much better for short intervals. You would have 15 significant digits of relative precision regardless of the time interval — in what real application can you measure time intervals with more than 15 significant digits?

And for all applications measured in milliseconds that involve intervals < 280,000 years, i.e. virtually everything people are doing now with Dates.jl, the behavior would be identical to the current implementation.
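Where the ≈280,000-year figure comes from (my own quick check, not a number stated in the thread): Float64 represents every integer exactly up to 2^53, and 2^53 milliseconds is about 285,000 years.

exact_ms = maxintfloat(Float64)               # 9.007199254740992e15 == 2^53; every smaller integer is exact in Float64
exact_ms / (365.25 * 86_400 * 1_000)          # ≈ 2.85e5 years of integer millisecond counts with no rounding at all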

2 Likes

I feel quite sure that anybody who needs this much precision is not willing to rely on floating-point rounding behavior to get it

1 Like

I care, because I strongly believe that we shouldn’t break things just because some people think “you couldn’t possibly actually need this, right?”.

Seeing as no one today can actually use that with Dates’ DateTime, because we’re truncating today, I strongly suspect not a lot of people actually need micro- or nanosecond precision in their DateTime type (in fact, people have mentioned in this thread that they don’t need it), or else their code is already buggy, broken and not doing what they expect it to do. Both NanoDates.jl and TimesDates.jl (which support nanosecond precision at the cost of being 128 bits in size) have 0 dependent packages. Hence, I don’t see why we should support this in a general-purpose standard library.

1 Like

I meant

julia> eps(Float64(now().instant.periods.value))
0.0078125

This is because we count milliseconds from year 0 (Gregorian/Rata Die time), not from the Unix epoch, so the stored value is larger than a Unix-based count would be. Even so, that’s about two orders of magnitude better precision than the current 1 ms. That’s not nothing, but it’s not a huge increase. (Indeed the precision would be much better for short intervals.)
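To make the epoch point concrete (rebasing the same instant to 1970 purely for illustration):

julia> using Dates

julia> eps(Float64(Dates.value(now())))                     # ms count since year 0
0.0078125

julia> eps(Float64(Dates.value(now() - DateTime(1970))))    # same instant counted from the Unix epoch
0.000244140625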

1 Like

That’s a generic argument for keeping any behavior indefinitely. But if you can’t make a reasoned argument for the current behavior — if you can’t give any plausible example in which someone needs to measure million-year intervals with millisecond accuracy — I don’t think it is a particularly good argument.

Really? Nearly all of computational science would disagree.

Please be specific. Where would storing milliseconds as Float64, vs. milliseconds as Int64, be a problem?

1 Like

@Oscar_Smith

brought up an argument for why it shouldn’t error - namely, imagine using DateTime and its arithmetic as the time variable in an ODE.

DateTime shouldn’t be used for time intervals (UTM is there for that), nor then, I think, for ODEs.

DateTime is for ISO 8601 timestamps, as documented; I guess specifically for 8601-1 going forward (not yet documented as such, I believe), as opposed to 8601-2:

https://www.isotc154.org/posts/2019-08-27-introduction-to-the-new-8601/

ISO 8601 is now a family of standards:

  • ISO 8601-1:2019 is the direct successor to ISO 8601:2004
  • ISO 8601-2:2019 provides extensions on top of ISO 8601-1:2019

For UX reasons, consider an ODE (e.g. a pandemic model) where the time variable is represented as a datetime. Suppose the ODE solver takes a time step of 1 day, 2 minutes and 300 microseconds. Should that throw an error or just round to 1 day 2 minutes?

I’m still sort of curious: for a pandemic model, wouldn’t “1 day 2 minutes” be enough, or even just “1 day”? If not, you can still go down to millisecond durations (with UTM).

You most often construct a DateTime from text; then there’s no problem and no possibility of rounding (currently). You can construct down to the millisecond the representation supports, and even down to microseconds (with a non-default option); it’s just impossible to go down to nanoseconds without changing the representation.

@JeffreySarnoff, why is eps for DateTime “always Millisecond(1)”? Would you go with changing it to Microsecond(1), since that’s possible? Most users of DateTime are just storing timestamps and not caring too much; they might never do any calculations on them. I’m not even sure what ISO 8601 says about that, since we get wrong durations out:

julia> DateTime("2015-10-01T00:00:01.000") - DateTime("2015-09-30T23:59:59.000")  # ok
2000 milliseconds

julia> DateTime("2015-07-01T00:00:01.000") - DateTime("2015-06-30T23:59:59.000")
2000 milliseconds

Those seem to be the same duration, 2 seconds, but the latter is wrong: a leap second was added at the end of June 30, 2015, so the actual elapsed time was 3 seconds and the reported duration is 33% off. ISO 8601 doesn’t have that concept, since it’s meant for timestamps, and then nobody cares.

For some applications we care about durations, and/or DateTime timestamp accuracy, but for most it seems silly to worry about whether eps is in milliseconds or microseconds when durations can be off by a second or more, since leap seconds add up. Also, while your clock may be synced to within milliseconds with NTP, it may not be (and may be off by a lot), and even if it is, is it actually accurate to microseconds?

From 2035, leap seconds will be abandoned for 100 years or so and will probably never return. It’s time to work out exactly what to do with a problem that has become increasingly urgent, and severe, with the rise of the digital world.

Why do we have leap seconds?

It’s important to be able to do what you state, most importantly ordering them, but were timestamps, and the ISO 8601 standard they’re based on, ever meant to support anything more, such as offsetting?

Why, do you have an appointment after year 9999? :slight_smile: While I can construct dates beyond that (and probably shouldn’t be able to), up to typemax, I can’t construct typemin (nor even just negative years):

julia> DateTime("-146138511-01-01T00:00:00")  # We will never be able to construct the supposed Big Bang (I no longer believe in), with the current Int64.
ERROR: ArgumentError: Invalid DateTime string

[ISO 8601 was originally limited to the Gregorian calendar, i.e. to years from 1582 onward; it no longer is, since it uses (or at least allows) the proleptic Gregorian calendar. This should be the minimum we allow: DateTime("100-01-01T00:00:00"), opening up the possibility of letting YY mean 20YY later.]

1 Like

When building trading systems, sequencing messages correctly is very important, and this includes sequences with respect to simulated fixed offsets of, say, a few microseconds.

3 Likes

I don’t need to find someone who uses both millisecond precision and million-year spans at the same time; I just need to find someone who stores something with more than 280,000 years in a DateTime. Whether they need the millisecond precision is irrelevant - by limiting the years you’re potentially breaking their code, which may expect to work with the existing assumptions of the Dates standard library (in this case, “typemax doesn’t get smaller”).

I’m strongly opposed to any argument of convenience that results in a huge difference in behavior between Julia’s standard DateTime and other standard libraries’ versions of DateTime.

1 Like

Yes, the argument is much stronger for using floating point for Period than for DateTime — a good objection against using floating point for DateTime is that the precision would then be date-dependent, not interval-dependent.

The alternative would be to store a DateTime as an Int64 millisecond value (the current representation) plus a Float64 offset, while using Float64 for Period. This way you could have date-independent precision and still get more graceful rounding behavior for sub-ms intervals.
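A minimal sketch of that layout, assuming hypothetical names (SplitDateTime and add_ms are illustrative only, not an actual Dates API):

struct SplitDateTime
    ms::Int64       # whole milliseconds since the Rata Die epoch (the current DateTime representation)
    frac::Float64   # sub-millisecond offset in milliseconds, kept in [0, 1)
end

# Add a Float64 period given in milliseconds: the integer part stays exact,
# the fractional part is carried separately, so precision is date-independent.
function add_ms(t::SplitDateTime, p::Float64)
    whole = floor(p)
    frac  = t.frac + (p - whole)
    carry = floor(frac)
    return SplitDateTime(t.ms + Int64(whole) + Int64(carry), frac - carry)
end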

2 Likes

To be clear, I’m not concerned with how a Period is represented or used. That can be done as a floating point or fixed point or integer or whatever. I don’t think it’s necessary to use Float64 for that because we already have CompoundPeriod, which doesn’t have any precision issues because it can hold any Period. The issue is just - how should DateTime (which is a timestamp, not a duration) interact with a Period, seeing as DateTime (today at least) can only fit so much precision (Millisecond, at the moment)? My argument is that if you DO need higher precision, DateTime should error, to alert you to use a type that represents a more accurate timestamp (e.g. TimesDates.jl or NanoDates.jl), and not silently swallow (or worse, round) the explicitly wanted precision.
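For reference, mixed-unit periods already work losslessly via CompoundPeriod:

julia> using Dates

julia> Day(1) + Minute(2) + Microsecond(300)   # a CompoundPeriod; nothing is rounded here
1 day, 2 minutes, 300 microseconds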

The alternative is of course to go full support and increase DateTime in size to 128 bits, to represent Nanosecond as well (without losing precision in years). That would be well paired with

but presuming we don’t want that increase in size and precision, for performance reasons (as some people brought up in the PR that spurred this discussion), we don’t really have a lot of options left.

I agree - DateTime is a timestamp, and given the difficulties we as programmers have in dealing with time, we shouldn’t assume more from it than it is. It’s not easily possible to construct a range of DateTimes and have the results be half decent - there are too many external (i.e. human) factors at play to look at timestamps in isolation and decide the semantics based on that alone.

This seems like a good approach to me. I would be very interested to see how using this as a base panned out. Of course, the fractional seconds could be parameterised (default Float64) to allow the precision in seconds to vary while keeping the dynamic range of the date.

By the way, I think it’s great that the fundamental design of Dates is getting some discussion, as with improvements we could have a really nice, performant and useful stdlib implementation.

4 Likes

To keep the existing code working as well as possible, with minimal refactoring, we could also store only the fractional milliseconds as a Float64. That would allow MUCH higher precision, as the fractional part would always stay below 1 (i.e. of the form 0.xxx), leaving the full Float64 resolution for the sub-millisecond part. Of course, that also means increasing the bitsize to 128 bits. IBM does something similar in their SQL implementations, but based on UNIX time (hence fractional seconds). We’re not UNIX-based, so choosing fractional milliseconds over fractional seconds seems preferable to me, in part to keep the existing huge range of supported years the same (thus keeping this change completely nonbreaking).
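(For a sense of scale: a fractional part kept in [0, 1) has Float64 resolution eps(1.0), i.e. about 2.2e-16 ms, far below a femtosecond.)

julia> eps(1.0)    # resolution available to a [0, 1) fractional-millisecond field, in ms
2.220446049250313e-16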

To be clear - I’m not against increasing the bitsize of DateTime. I’m only against keeping DateTime the same as it is now, and having +(::DateTime, ::Nanosecond) silently truncate/round/lose data.


I should also note that I have yet to find any timestamp library other than TimesDates.jl and NanoDates.jl that actually supports timestamps with nanosecond resolution. Most databases seem to stop at either seconds or milliseconds, and none I’ve seen so far has support for smaller-than-supported-precision arithmetic. That is, no code/library I’ve found over the past few days of looking into this has anything close to +(::DateTime, ::Nanosecond).

4 Likes

I don’t have a great grasp of the lower level fundamentals here, but one option could be to implement option 2 from the OP only for higher precision. Essentially, keep the 64 bit representation up to millisecond precision, then if a user tries date + Nanosecond(1) it would work but provide a warning such as Warning: DateTime precisions beyond 1 millisecond require extra memory. Or some such.

This has the advantage of not breaking any current code based on Milliseconds, and other suggestions mentioned in the thread could be added as features/bug fixes to Nanoseconds (e.g. nanoseconds could be represented as floating point). Going from Milliseconds to Nanoseconds would be analogous to going from Int to BigInt. I’m sure I’m oversimplifying this, but it seems like a reasonable option to me.
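A rough sketch of what such a promote-on-demand path could look like (WideDateTime and addns are purely illustrative names, not a proposal for the actual API):

using Dates

struct WideDateTime
    ms::DateTime    # the existing 64-bit, millisecond-precision part
    ns::Int32       # extra sub-millisecond part, in nanoseconds (0..999_999)
end

# Widen only when sub-millisecond precision is actually requested
# (assumes a non-negative period, for brevity).
function addns(dt::DateTime, p::Nanosecond)
    @warn "DateTime precisions beyond 1 millisecond require extra memory" maxlog = 1
    q, r = divrem(Dates.value(p), 1_000_000)   # whole milliseconds and leftover nanoseconds
    return WideDateTime(dt + Millisecond(q), Int32(r))
end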

That is essentially what option 3 is - a deprecation warning for unsupported/truncated results. Option 2 is explicitly about increasing the bitsize, which option 3 does not do.

1 Like

Yeah, it seems like a hybrid of 2 and 3 would be the best way to increase correctness/precision without breaking things (at least from my limited understanding).

1 Like