`DateTime` arithmetic on `Microsecond` (or smaller) scale

Please also have a look at the recommended time code formats of the CCSDS, especially section 3.3 “DAY SEGMENTED TIME CODE (CDS)”:

A high-precision binary code would include the millisecond-accurate code plus an extra field. The day segmentation makes it possible to represent time stamps within leap seconds gracefully (we still compute things like the number of seconds between two time stamps assuming each day has 86400 seconds, but time codes within leap seconds can still be represented and compared).
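To make the day-segmented idea concrete, here is a minimal sketch in Julia (the field names and widths are illustrative, not the actual CCSDS encoding):

# A minimal day-segmented time stamp, loosely modeled on the CCSDS CDS idea.
# Field names and widths are illustrative, not the actual wire format.
struct DaySegmentedTime
    day::Int32        # days since some agreed epoch
    ms_of_day::UInt32 # milliseconds within the day; may run past 86_399_999 during a leap second
    sub_ms::UInt32    # optional extra field, e.g. microseconds within the millisecond
end

# Durations can still be computed assuming 86_400 seconds per day, while time
# stamps taken inside a leap second remain representable and comparable:
Base.isless(a::DaySegmentedTime, b::DaySegmentedTime) =
    (a.day, a.ms_of_day, a.sub_ms) < (b.day, b.ms_of_day, b.sub_ms)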

3 Likes

That’s a very interesting link - as Dates is mostly concerned with human interaction (we already check a lot of things in the constructors), I’d personally prefer a CCSDS approach. About half of that standard also seems concerned with the interchange of data - a problem Dates doesn’t tackle at all, so I think that would be better suited to a community package (which could then convert DateTime/a hypothetical HighPrecisionDateTime to its internal wire format for transmission).

Regarding the original topic, I don’t have a clear opinion on what’s the “correct” behavior for the DateTime + Nanosecond operation; I think there are some good points on both sides, but it’s hard (for me) to reason about the potential use cases where this situation might occur or about the relative primacy of different abstract principles. I do think, nevertheless, that we should be clear about the intended use cases for the stdlib DateTime versus the use cases where a custom format (see TimesDates.jl, AstroTime.jl) might be better suited, since the desired format for each user is different. There is no hypothetical standard DateTime object that is optimal for everyone.

For example, there are several suggestions in this thread to basically double the bit size of the DateTime format so that we gain extra precision and the original concerns are moot. That would not be a problem for most of my use cases, but I worry that we would be increasing the memory footprint and reducing the performance of “data science” use cases, where you might have one or several DateTime fields over millions of rows. In fact, I think those use cases where you don’t have high precision requirements, but you want a baseline DateTime type to facilitate interoperability between table/dataframe packages and analysis packages, should be prioritized in the standard library.

As for “high-precision” (or “scientific”) users at the other extreme, namely space, astronomy and geophysics, it seems to me that it’s definitely better to rely on external packages that provide the domain-specific optimized format. This is not just because of number format issues, but also because at those precision regimes there are far more issues than numeric precision alone. There are different time scales that advance at different rates (UT, TAI, TDB…) and the conversions between them are not trivial: either they require external data files (like the IERS tables) or a computationally heavy operation that might be truncated to different degrees according to performance/precision preferences. Thus, I think it would not be a good idea to burden the standard library with all those domain-specific choices and concerns, particularly when they might be limiting in other aspects (e.g. the ability to represent dates in the past). In that sense, I would be cautious about domain-specific standards as well, unless they work well for other contexts.

There might be some “intermediate precision” use cases as well, requiring e.g. precision at the nanosecond level, where the current type falls short. Perhaps it’s a system reading a sensor at high frequency, or an oscilloscope, or something similar. Maybe it’s worth defining a stdlib HighPrecisionDateTime for that case instead of an external type; I don’t have a strong opinion. It’s still important to highlight here that there might be “externally-sourced” issues if you want to do very precise arithmetic with DateTimes (defined in Universal Time) and real-time periods (which are only “correct” when interpreted as a difference of TAI epochs, not as a difference of UT epochs), or when handling different clock references.
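For a rough sense of scale: a hypothetical nanosecond-resolution type backed by a single Int64 (one possible shape for such a HighPrecisionDateTime) could only span roughly ±292 years around its epoch. A quick back-of-the-envelope check:

# Span of an Int64 nanosecond counter, in years (assuming 365.25-day years)
typemax(Int64) / 1e9 / 86400 / 365.25   # ≈ 292, i.e. roughly ±292 years around the epoch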

Finally, regarding high-frequency trading, I’m not sure what their requirements are: surely you want accuracy, but also optimal performance? Wouldn’t that require yet another custom time format, one where you sacrifice the range of representable DateTimes instead?

5 Likes

I’m not sure what would be best. For example, it is hard to imagine why these two should give different results:

dt = DateTime(2012)
dt + Nanosecond(1e9)
foldl(+, (Nanosecond(1) for _ in 1:1e9), init=dt)

So maybe that is in favor of just disallowing it altogether? This is related to the comment about commutativity I suppose.

2 Likes

Dates and times are typically treated as continuous variables in data analysis, so floating point would be more natural here than integers.

@tim.holy’s point above is well taken, though — for a DateTime (i.e. a timestamp relative to some fixed epoch, not a time interval), you want uniform precision over the whole range of supported dates (e.g. if you have nanosecond precision for 2023 dates, you also want nanosecond precision for dates in BCE 753). That inherently argues for fixed-point arithmetic, not floating-point. This is what we have now; to gain more precision we would have to widen the type (e.g. to Int128).

Time intervals (Period in Julia) are a different matter — for that application, uniform relative precision (i.e. less absolute precision for longer intervals) makes sense, and floating-point arithmetic would be more sensible.
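To put numbers on that: with a Float64 count of seconds since an epoch, the absolute resolution degrades as you move away from the epoch, which is fine for intervals but problematic for absolute timestamps. A quick illustration (the magnitudes are just examples):

# Absolute precision of a Float64 "seconds since epoch" value depends on its magnitude:
eps(1.75e9)   # ≈ 2.4e-7 s, i.e. sub-microsecond about 55 years from the epoch
eps(2.2e11)   # ≈ 3.1e-5 s, i.e. tens of microseconds about 7000 years from the epoch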

9 Likes

Another option for the default DateTime is 64-bit seconds plus a separate 32-bit nanoseconds field. That saves 4 bytes per value relative to Int128, but doesn’t give you the usually excessive attosecond precision. Nanoseconds are something people are using today, though: you might want to do something like send 20 packets on the wire, wait for the replies, and measure the durations and jitter down to tens of nanoseconds, or something like that.

Not with struct padding (since Int64 likes to live on 8-byte boundaries):

julia> struct Foo
          seconds::Int64
          ns::Int32
       end

julia> sizeof(Foo(3,4))
16

julia> sizeof([Foo(1,2), Foo(3,4)])
32
2 Likes

Gotcha, hadn’t thought of that. So might as well go Int128 and attoseconds.
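For comparison, an Int128 attosecond counter has range to spare; a quick back-of-the-envelope check:

# Span of an Int128 attosecond counter, in years (assuming 365.25-day years)
typemax(Int128) / 1e18 / 86400 / 365.25   # ≈ 5.4e12, i.e. trillions of years at attosecond resolution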

This does not change the dilemma we have (which would exist even if the concept of a leap second had never been invented, for any standard):

From 2035, leap seconds will be abandoned for 100 years or so and will probably never return.

It’s possible to estimate the position of the Sun one millennium or one century ago, but the position is not in [Date]Time units, it’s in meters. If you want the position for, say, six months ago, you can subtract the required number of seconds, but if you assume an integer number of seconds (or the microseconds possible in the representation), then you will be off by a tiny bit, whether or not there was a leap second in the meantime. Over longer stretches of time, even subtracting an integer number of nanoseconds may land you on the wrong day, and the error could stretch into multiple days.

Yes, who cares about the exact day back then? But it would change the position of the sun, e.g. whether it’s winter or summer.

I may be exaggerating a bit for effect, but this will be the case with milliseconds, microseconds, or even nanoseconds; it just changes how far back (or forward) in time you have to go.

And I’m not yet taking into account that the speed of the Earth around the Sun is not constant. Just that it’s not an integer (or rational) multiple of anything.

The first use case I can think of where someone might want better precision is the very specific subfield of archaeoastronomy. Examples: trying to find the sky location of a possible supernova reported in A.D. 500 would require back-propagation of both the Earth’s axial precession and the proper motions of stars and other night-sky objects. The same is true if one wants to find out the reason behind the alignment of Stonehenge or other ancient sites that may have a celestial origin.

I’m not an archaeoastronomer, but it seems to be a good assumption that no one is going to run a simulation backwards 7000 years with attosecond precision. And perhaps people would rather use a pure floating-point number (Julian dates). But it seems like those should be user choices, not constraints imposed by the package.

Should Dates be able to handle every use case? Or should people simply use other packages? My examples seem perfect for AstroTime. :person_shrugging:

1 Like

Well, I can well imagine such a simulation as a PhD topic. Actually a great topic: put your predictions a couple of decades in the future (climatology) or a couple of centuries in the past, and you can simulate pretty much anything
/ irony off

To be serious: I am not sure that the orbits of celestial bodies wouldn’t be chaotic to some extent even under the assumption of rigid bodies in the absence of external perturbations. Which they are not, and as for external perturbations: our Sun is not exactly a source of constant radiation.

Sorry for offtopic :slightly_smiling_face:

1 Like

Here is one case I remember specifically from this “forensic astronomer”, who estimated down to the minute when Monet was painting a sunrise in 1835. (I tried to find a link that would work for everyone, apologies if it doesn’t). And things are surely chaotic to a small degree, but it is still possible to estimate the dates/times when, say, the Sun would line up just right at Stonehenge in 4-5000 BC.

Surely astronomy belongs to the so-called exact sciences, which are called so not because they “admit of absolute precision in their results”, as Wikipedia maintains, but, by the definition I like most, because they make it possible to estimate the reliability of their results.

It is probably possible to estimate the Sun’s position on a certain date a few millennia ago (though probably impossible to verify the results). But it is probably not possible to tell, with a resolution of one minute, the duration between noon on January 1st, 5000 BC, and January 1st, 1970 at 00:00:00 UTC.

As for the Monet painting - that was based mainly on meteorological records anyway.


Maybe I missed that part of the discussion, because it’s such an obvious suggestion, but… why not make DateTime parametric? Then you could keep the current behaviour as is by default, while supporting extra precision with e.g. DateTime{Int128}. On our side, doubling the memory requirement of DateTime would be a bit of a bummer…
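A rough sketch of what that might look like (using a standalone type name here rather than touching Dates.DateTime itself; the actual stdlib type stores its instant as a millisecond count):

using Dates

# Hypothetical parametric variant: the storage integer, and therefore the
# range/precision trade-off, becomes a type parameter. Named ParametricDateTime
# here to avoid clashing with the real Dates.DateTime.
struct ParametricDateTime{T<:Integer} <: Dates.AbstractDateTime
    instant::T   # e.g. milliseconds since the epoch for T == Int64,
                 # nanoseconds (or finer) since the epoch for T == Int128
end

# The current behaviour would correspond to the 64-bit case, while the Int128
# parameterization would leave headroom for extra precision.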

Sir Isaac Newton used some of these techniques to date ancient civilizations, once navigation by stars was invented. Not sure if this is a use case for some of these packages. I did not personally try to reproduce the math. If you find problems with the math I can bring it to the editor’s attention.

I wonder if that would be considered a breaking change. Maybe not technically breaking, but it could slow down code that assumes DateTime is a concrete type. For example, this struct would go from having concretely typed fields to having abstractly typed fields:

struct A
    t::DateTime
end
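If DateTime did become parametric, downstream structs could keep their fields concrete by carrying the parameter through, at the cost of some churn; a sketch of that pattern with the types that exist today:

using Dates

# A bare t::DateTime field would become abstract if DateTime were a UnionAll.
# Parameterizing the wrapper keeps the field concrete once the parameter is fixed:
struct B{T<:Dates.AbstractDateTime}
    t::T
end

isconcretetype(fieldtype(B{DateTime}, :t))  # true, and it would stay true for e.g. a hypothetical DateTime{Int64}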
1 Like

I’ll toss in my 2 cents here. One of the threads throughout this conversation has been (both explicitly and implicitly at various points) that maybe more precise DateTimes, wider epochs, etc. should be done in a package for users that need that functionality. I don’t disagree and AstroTime.jl and NanoDates.jl are great packages.

Unfortunately, the structure of Base.Dates makes the implementation of a custom Dates.AbstractDateTime subtype very complicated. In particular, one common use case many users will want is the ability to parse a string. Base.Dates provides a function for this, but it is tailored quite narrowly to only the types in the base package. Extending it leads to various difficult tradeoffs. For more details, check the discussions under closed issues in NanoDates.jl.

This means that, unlike with other Base types such as AbstractArray, AbstractDict, etc., one cannot quickly and easily create a custom concrete type to match one’s needs and/or swap to an alternative implementation when one’s code runs into the limitations of Base.Dates.

So, all in all, a conversation about what Base.Dates should or shouldn’t support involves leaving some functionality up to packages, but that functionality is currently difficult to realize. Therefore, either Dates should expand to cover more use cases, or it should be improved to make the implementation of alternate DateTime types practical.


1 Like

Or make a new struct:

PreciseDateTime{T}

which is parametric, but leave the existing DateTime alone… But make adding a Nanosecond(1) to a DateTime promote to PreciseDateTime{Int128} by default?
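A minimal sketch of that idea (the names, the nanosecond resolution, and the promotion rule are all hypothetical; a package would also have to think twice before committing type piracy on + for the stdlib types):

using Dates

# Hypothetical wider type: nanoseconds since the DateTime epoch, stored in T.
struct PreciseDateTime{T<:Integer} <: Dates.AbstractDateTime
    ns::T
end

# Lossless widening from an existing DateTime (whose internal value is in milliseconds).
PreciseDateTime{T}(dt::DateTime) where {T<:Integer} =
    PreciseDateTime{T}(T(Dates.value(dt)) * T(1_000_000))

# The suggested promotion: DateTime + Nanosecond yields the wider type.
# (Shown only to illustrate the idea; defining this method for the stdlib types
# from a package would be type piracy and would change behaviour globally.)
Base.:+(dt::DateTime, p::Nanosecond) =
    PreciseDateTime{Int128}(PreciseDateTime{Int128}(dt).ns + Dates.value(p))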

1 Like