`DateTime` arithmetic on `Microsecond` (or smaller) scale

I really do think that people looking for a specific use case should use a package built for that specific use case, instead of lumping all that complexity into a standard library. In this case, there even is a package for exactly that use case: AstroTime.jl. According to JuliaHub, they have dependents too! Funnily enough, their docs warn about converting to DateTime:

Please note that the time scale information will be lost in the process.

The prevailing notion I’ve heard about standard libraries so far is that they should be kept small & simple - why should Dates be different?

If we use 128 bits I’d propose we represent time as an Int128 in attoseconds. 127 bits gives us 38 log10 orders of magnitude to work with. Using 18 of those for sub-second behavior would leave 20 orders of magnitude for seconds and beyond, enough to get to several trillion years without overflowing. That seems like sufficient range to me.
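A quick back-of-envelope check of that range (the figures below are computed here, not taken from any library; the year length is the average Gregorian year that C++ chrono also uses):

```python
ATTOSECONDS_PER_SECOND = 10**18
SECONDS_PER_YEAR = 31_556_952  # average Gregorian year, as in std::chrono::years

max_attoseconds = 2**127 - 1   # signed Int128 maximum
max_seconds = max_attoseconds // ATTOSECONDS_PER_SECOND
max_years = max_seconds // SECONDS_PER_YEAR

print(f"{max_seconds:.3e} s ~ {max_years:.3e} years")  # on the order of 10^12 years
```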

5 Likes

This seems to work in Rust:

use time::{Duration, Instant};

fn main() {
    let dt = Instant::now();
    let new_dt = dt + Duration::nanoseconds(1);
    println!("{:?}", dt);
    println!("{:?}", new_dt);
}

// Instant(Instant { tv_sec: 1882501, tv_nsec: 575241267 })
// Instant(Instant { tv_sec: 1882501, tv_nsec: 575241268 })

Haskell UTCTime contains a DiffTime that goes to picoseconds.

Python’s pandas.Timestamp has nanoseconds and can do pd.Timestamp.now() + pd.Timedelta(nanoseconds=2).
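For contrast, Python’s standard-library datetime stops at microseconds, which is exactly why pandas carries its own nanosecond-precision Timestamp. The stdlib limit, and the fact that it rounds rather than truncates sub-resolution values, is easy to check:

```python
from datetime import datetime, timedelta

# The smallest representable step of the stdlib types is one microsecond.
print(timedelta.resolution)          # 0:00:00.000001
print(datetime.resolution)           # 0:00:00.000001

# Sub-microsecond inputs are rounded (half-to-even), not truncated:
print(timedelta(microseconds=0.5))   # 0:00:00
print(timedelta(microseconds=1.5))   # 0:00:00.000002
```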

(This is just what I found from looking around. I don’t have the knowledge to answer any follow-up questions.)

I propose, though, that DateTime("10000-01-01T00:00:00") (or anything higher) should fail by default (typemax and typemin would still be the same, at least for now?), while we could allow DateTime(10000) to still work (maybe with a keyword argument for the former? or not bother). That gets you the exact same value, i.e. down to the millisecond, while only showing microseconds.

C++ has all kinds of (chrono) types: some always integer, some always double (Float64), and some a mix (templated over rationals): std::chrono::duration - cppreference.com

If Rep is floating point, then the duration can represent fractions of ticks. […]

Member type: Definition
rep: Rep, an arithmetic type representing the number of ticks
period: Period (until C++17), typename Period::type (since C++17); a std::ratio representing the tick period (i.e. the number of second’s fractions per tick)

operator++
operator++(int)

operator== (C++11)
operator!= (C++11)(removed in C++20)
operator< (C++11)
[…]
operator<=> (C++20)

[Why remove the inequality test (but not the others)?]

std::chrono::nanoseconds  duration</*signed integer type of at least 64 bits*/, std::nano>
std::chrono::microseconds duration</*signed integer type of at least 55 bits*/, std::micro>
..
std::chrono::milliseconds duration</*signed integer type of at least 45 bits*/, std::milli>
..
std::chrono::days (since C++20) duration</*signed integer type of at least 25 bits*/, std::ratio<86400>>
..
std::chrono::years (since C++20) duration</*signed integer type of at least 17 bits*/, std::ratio<31556952>>

std::chrono::duration not to be confused with std::difftime - cppreference.com which is always a double (Float64).
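The tick-count-times-compile-time-ratio idea behind std::chrono::duration can be mimicked in a few lines. Here is a rough Python sketch (the names `to_seconds`, `nano`, `milli` are made up for illustration), using fractions.Fraction as a stand-in for std::ratio:

```python
from fractions import Fraction

def to_seconds(ticks, period):
    """A duration is an integer tick count times a rational tick period (seconds per tick)."""
    return ticks * period

nano  = Fraction(1, 10**9)   # analogous to std::nano
milli = Fraction(1, 10**3)   # analogous to std::milli

# 1_000_000 nanosecond ticks and 1 millisecond tick denote the same duration:
assert to_seconds(1_000_000, nano) == to_seconds(1, milli) == Fraction(1, 1000)

# Conversion between units is exact when the ratio of periods is integral,
# mirroring chrono's rule that lossy conversions need an explicit duration_cast:
ratio = milli / nano
assert ratio == 10**6 and ratio.denominator == 1
```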

https://www.isotc154.org/posts/2019-08-27-introduction-to-the-new-8601/

Here’s a short history of ISO 8601.

Predecessors:

ISO 2014:1976 (all-numeric dates)
ISO 2015:1976 (week numbering)
ISO 2711:1973 (ordinal date numbering)
ISO 3307:1975 (representations of time of the day)
ISO 4031:1978 (time differentials)

These standards were all superseded by the first ISO 8601, ISO 8601:1988.

I was seemingly wrong about “time differentials” not being in the standard (in ISO 8601, which we support; I’m not clear on what C++ is trying to support). We don’t specifically state which parts of ISO 8601 we do or do NOT support, but even if we don’t support that (or some specific) part, or support it incorrectly, then I suppose we need to support, and change if needed to match, what the standard says on it.

Do we support them correctly (through compatibility with the old ISO 4031:1978)? We never supported most of the above; e.g. I don’t think we currently do “week numbering”, but likely should. Note, the latest ISO 8601 has been 80% rewritten, so I’m not sure what it currently says on this.

[I found it somewhat intriguing that Julia deprecates Libc.TimeVal, but then uses it for `now()`.]

We seemingly need an equivalent of timespec_get (std::timespec - cppreference.com; both “since C++17”).

Possible output:

Current time: 06/24/16 20:07:42.949494132 UTC

std::time_t tv_sec whole seconds – >= 0 [Is that in error? does it allow unsigned as it used to, or now unsigned since 64-bit?]
long tv_nsec nanoseconds – [0, 999999999]

Current time: 04/06/23 12:03:31 (UTC)
Raw timespec.time_t: 1680782611
Raw timespec.tv_nsec: 678437213

So it’s a struct where the latter (tv_nsec) is a long, i.e. at least 32 bits (and in practice can be as wide as a long long), and the former (tv_sec) is an unspecified type, usually a 64-bit integer, giving 128 bits in total.

timespec_get (C++17)

returns the calendar time in seconds and nanoseconds based on a given time base (function)

[You can have only 584 years of nanosecond resolution in an Int64. Since other languages have that resolution, and it seems useful sometimes, why isn’t time limited to that many years? Already we can’t reach the Big Bang, and if I were doing this I would choose that, or from 3000 BC to 2840 at a resolution of 10s of nanoseconds. I’m also partial to a rational form: 63 bits for the numerator in nanoseconds, and, to reach the Big Bang and far into the future, if the last bit is 1 then the denominator is huge, making the resolution 10s of seconds. Who cares about the extremes? Or just go to Float32?]
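The 584-year figure is straightforward to verify (computed here, using the same average Gregorian year as above):

```python
SECONDS_PER_YEAR = 31_556_952       # average Gregorian year

span_ns = 2**64                     # full span of a 64-bit count of nanoseconds
span_years = span_ns / 10**9 / SECONDS_PER_YEAR
print(int(span_years))              # 584, i.e. roughly ±292 years around an epoch
```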

Certainly oddities (that I bolded): ISO week date - Wikipedia

An ISO week-numbering year (also called ISO year informally) has 52 or 53 full weeks. That is 364 or 371 days instead of the usual 365 or 366 days. […] The extra week is sometimes referred to as a leap week, although ISO 8601 does not use this term.

And mostly more trivia below:

std::chrono::treat_as_floating_point (since C++11)

Data Type: time_t
time_t is the simplest data type used to represent simple calendar time.

In ISO C, time_t can be either an integer or a floating-point type, and the meaning of time_t values is not specified. The only things a strictly conforming program can do with time_t values are: pass them to difftime to get the elapsed time between two simple calendar times (see Calculating Elapsed Time), and pass them to the functions that convert them to broken-down time (see Broken-down Time).

On POSIX-conformant systems, time_t is an integer type and its values represent the number of seconds elapsed since the epoch, which is 00:00:00 on January 1, 1970, Coordinated Universal Time.

The GNU C Library additionally guarantees that time_t is a signed type, and that all of its functions operate correctly on negative time_t values, which are interpreted as times before the epoch.

Even though time_t is usually not a float, there is:

double difftime( std::time_t time_end, std::time_t time_beg );
Computes difference between two calendar times as std::time_t objects (time_end - time_beg) in seconds. If time_end refers to time point before time_beg then the result is negative.

std::clock time may advance faster or slower than the wall clock, depending on the execution resources given to the program by the operating system. For example, if the CPU is shared by other processes, std::clock time may advance slower than wall clock. On the other hand, if the current process is multithreaded and more than one execution core is available, std::clock time may advance faster than wall clock. […]

On POSIX-compatible systems, clock_gettime with clock id CLOCK_PROCESS_CPUTIME_ID offers better resolution.

The value returned by clock() may wrap around on some non-conforming implementations.

clock_t is used to measure processor and CPU time. It may be an integer or a floating-point type. Its values are counts of clock ticks since some arbitrary event in the past. The number of clock ticks per second is system-specific. See Processor And CPU Time, for further detail.

https://en.cppreference.com/w/cpp/numeric/ratio/ratio

quecto (C++26) std::ratio<1, 1000000000000000000000000000000> (10^-30), if std::intmax_t can represent the denominator
ronto (C++26) std::ratio<1, 1000000000000000000000000000> (10^-27), if std::intmax_t can represent the denominator
yocto std::ratio<1, 1000000000000000000000000> (10^-24), if std::intmax_t can represent the denominator
ronna (C++26) std::ratio<1000000000000000000000000000, 1> (10^27), if std::intmax_t can represent the numerator
quetta (C++26) std::ratio<1000000000000000000000000000000, 1> (10^30), if std::intmax_t can represent the numerator

Right - both of those can accurately represent nanosecond scales. I should have written that I haven’t found a library with coarser than nanosecond precision in its DateTime equivalent that also allows adding Nanoseconds.

1 Like

Yes, definitely, as long as we then don’t add a Zeptosecond type to Dates, spurring the whole discussion again :joy: That should also be enough support for high-precision clocks in an OS for the foreseeable future, so I’m on board with that :+1: Encoding-wise, it should be possible to simply extend the existing encoding scheme of the constructor:

The only worry I have is that the numbers involved here will likely be quite large, since years are encoded in the most significant bits. As the vast majority of applications really don’t require that much precision at all, this is going to be quite wasteful and force Int128 calculations everywhere… Performance-minded folk (though I’m not sure how well they’re served today) are not going to like that increase.

I really think a 128 bit int for DateTime makes sense and I like Tim’s suggestion

I’d argue that people doing differential equations etc should be working in dimensionless form with Float64, and then if trying to communicate a particular time, converting their output float to a DateTime.

Performance is not the issue when using DateTime, consistency and accuracy etc are. A typical use would be for things like time-stamping events or calculating interest payments.

The universe is only thought to be on the order of 14 billion years old, and an attosecond is 0.001 times the cycle duration of a light wave at 300 nm.

The people who need stuff outside that dynamic range should be doing something custom.

2 Likes

Just for fun, some random notes on time accuracy in some applications:

Seismic exploration: GPS time accuracy in current acquisition systems is in the range 1-40 microseconds, while the GPS satellite timing signals themselves are typically accurate to ~10 nanoseconds.

Finance: Nasdaq hopes to synchronize a vast network of computers with nanosecond precision, to accurately order the millions of stock trades that are placed on their computer systems every second.

Quantum mechanics: NIST’s strontium atomic clock keeps time to < 67 picosecond per year, offering a route to reveal how relativity and gravity interact with quantum mechanics. A new quantum mechanical time-measuring device is capable of measuring durations as short as 81 picoseconds with an error margin < 8 femtoseconds.

1 Like

Some day when I am king I hope I can impose a call market with a 10 second tick thereby eliminating all high frequency trading

2 Likes

I’m disappointed that the PR Dates: Documentation, tests, and changed arithmetics for `DateTime` by barucden · Pull Request #50816 · JuliaLang/julia · GitHub has now just been merged, after we’ve seemingly reached a consensus here that increasing precision is better than rounding. What good was this discussion then?

That was just a small bugfix switching from truncation to rounding. Nothing about that blocks a more ambitious solution to switch to 128 bits. (I’ll note that the suggestion to switch to Int128 was made on GitHub long before this discussion got started.)

1 Like

And in merging it, you’ve introduced a new bug, because now you can get a DateTime object LARGER than what you requested! That is not at all okay for timestamp arithmetic, that’s worse than truncation!

Just imagine doing x[5.7] and getting an out of bounds error because x only has 5 elements, not 6. That’s obviously absurd, right?

All in all, I’m just extremely disappointed in how this was handled. The PR got marked for triage discussion 3 hours before triage happened that day. That’s not at all enough time to see if people can make the time for triage.

If you’ve never done this before, here’s a useful exercise:

n = 10^6
op = trunc             # compare `op = trunc` vs `op = round`
x, y = 0, 0.0
for i = 1:n
    r = rand()
    x += op(Int, r)
    y += r
end

Now plot abs(x - y) vs n, and compare trunc vs round (answer: the error with trunc scales as O(n), whereas the error with round scales as O(sqrt(n)) without substantially changing the numeric prefactor, an unequivocal win). Switching to rounding should be completely non-controversial; this discussion can stand as a reference for the bigger design issues.
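The exercise above is easy to reproduce in any language; here is a Python version of the same experiment, seeded for reproducibility (the helper name `accumulate_error` is mine):

```python
import math
import random

def accumulate_error(op, n, seed=0):
    """Accumulate n uniform randoms as integers via `op`; return |int_sum - true_sum|."""
    rng = random.Random(seed)
    x, y = 0, 0.0
    for _ in range(n):
        r = rng.random()
        x += op(r)
        y += r
    return abs(x - y)

n = 100_000
err_trunc = accumulate_error(math.trunc, n)  # every r in [0, 1) truncates to 0: pure bias
err_round = accumulate_error(round, n)       # errors cancel like a random walk

print(err_trunc)   # ~n/2, grows linearly in n
print(err_round)   # ~sqrt(n/12), grows as a square root
```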

Part of my work is in this space. I have struggled with the lack of nanosecond precision tooling built-in to modern languages (not just Julia). I would welcome the expansion to support it.

As for truncation versus rounding, it is somewhat important to people in this space. However, this debate only concerns what happens when the timestamp runs out of precision. If we extend the timestamp to something even deeper than nanosecond precision, then it becomes a moot point.

3 Likes

As an expert in that domain, what do you feel should be done until/if that additional precision is available?

Time-stamped data can be very large, and the time stamps are often used as the index. Being able to load the index into memory usually has a large performance advantage. Example:

I have data from an instrument on a satellite which has been nearly 10 years in orbit. Measurements have been taken twice per second, nearly 100 % of the 10 years. The time stamps are 16+32 bit integers: date and milliseconds of day. Loading the index of the whole data set (i.e. the timestamps) requires about 4 GB of memory. My laptop can handle this, probably for a few more years. With 128-bit time stamps it might crash.
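Rough arithmetic behind those numbers (everything below is derived from the figures given above, nothing else is assumed):

```python
samples = round(10 * 365.25 * 86_400 * 2)  # ~10 years at two measurements per second

bytes_48  = samples * 6    # current 16-bit date + 32-bit milliseconds-of-day stamps
bytes_128 = samples * 16   # hypothetical Int128 timestamps

print(f"{samples:,} samples")
print(f"{bytes_48 / 1e9:.1f} GB now vs {bytes_128 / 1e9:.1f} GB with 128-bit stamps")
```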

The great majority of applications are fine with millisecond accuracy. Timestamps are typically generated automatically by computer clocks, in Julia with the now() function. Computer (incl. mobile phone, etc.) clocks are synchronized with NTP; the relative accuracy is typically a few milliseconds. So timestamps created on different computers are not reliably comparable to each other at millisecond accuracy or better.

Of course there are applications where more accurate timing is available and needed by special hardware (GPS modules etc.). A computing system like Julia could offer here software support via high precision time stamps. But I would suggest that this is not the default, rather an opt-in, as the price for it in terms of efficiency is significant.

5 Likes

To be honest, I’m not sure it matters. DateTime is currently pretty much unusable in areas where nanosecond precision is required. Although I have not tried the newest NanoDates.jl, to be fair. What I did was split into Date and Time fields in my data.

2 Likes

As an expert in this domain, do you feel the default DateTime type (which would presumably retain Millisecond accuracy) should allow arithmetic with higher precision types, such as Microsecond or Nanosecond? Should it guide users to a type supporting that properly instead, or do something entirely different?

Would you prefer it to alert a user to the issue, in case they do try it without knowing that it won’t be accurate?

This is a very valid point.

Is there anything that prevents us from introducing a PrecisionDateTime in the Dates library, using the 128-bit attosecond suggestion for that, and changing DateTime to something like 64 bits of milliseconds since an epoch?

Also, if you add nanoseconds to a DateTime, could you get a PrecisionDateTime? Or no?

6 Likes