HFT apps are certainly not using `DateTime`; I’m not sure it’s an appropriate use case to even pretend to support with the current design.
I don’t know if I missed it, but has anyone suggested `DateTime{T}`?
Yes, that has been suggested (multiple times), and no, that is not an option; it makes existing code like `Vector{DateTime}` or

```julia
struct Foo
    dt::DateTime
end
```

type unstable. Not to mention that it’d be just as much work as adding a `HighPrecisionDateTime` without those problems.
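To make the instability concrete, here is a minimal sketch (using a hypothetical `ParametricDateTime`, not an existing type): a parametric type used without its parameter is abstract, so containers and fields typed that way can no longer be inferred concretely.

```julia
# Hypothetical sketch, not Dates code: a parametric stand-in for DateTime.
struct ParametricDateTime{T<:Integer}
    instant::T
end

struct Foo2
    dt::ParametricDateTime   # no parameter given, so the field is abstractly typed
end

isconcretetype(ParametricDateTime)    # false
isconcretetype(fieldtype(Foo2, :dt))  # false: access to Foo2.dt is type-unstable
```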
I suggested that on GitHub, where @bvdmitri noted the same issue as @CameronBieganek. A solution would be to define

```julia
const DateTime = ParametricDateTime{Int64}
```

Not only could one use this for the precision issue, but one could also use `ParametricDateTime{SafeInt64}` to perform checked arithmetic, if one were worried about overflow.
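For readers unfamiliar with the distinction, here is what checked arithmetic buys you, shown with `Base.Checked` directly (a `SafeInt64`, as in SaferIntegers.jl, wraps the same mechanism in a type):

```julia
using Base.Checked: checked_add

typemax(Int64) + 1              # silently wraps around to typemin(Int64)
checked_add(typemax(Int64), 1)  # throws OverflowError instead of wrapping
```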
You’d have to pull that change through all of the existing code, modify `UTInstant` as well, and add a `CheckedInt` to Base. That seems like much more work than just adding a distinct new type.
Yes, if all we’re going to do is add `PrecisionDateTime`, then I agree we probably don’t need to make it parametric; that will, however, force a singular choice one way or another about checked arithmetic.
On that note, is there anyone here with the expertise to weigh in on whether the performance of `DateTime` arithmetic matters? This uncertainty is holding up a decision about whether we should enforce checked arithmetic when adding or subtracting periods from a `DateTime`. Specifically, in my tests switching to checked arithmetic is about 2x slower for “normal” code, and 5.5x (AVX2) or 11x (AVX512) slower if you’re using SIMD vectorization. (Interestingly, plain `Int128` is similar to checked `Int64`, and checked `Int128` is another 2x on top of all this.) But this matters only if `DateTime` arithmetic is the performance bottleneck. I have no idea whether that ever happens in practice.
Could there be an arithmetic equivalent of `@inbounds` that users can apply to a SafeInteger add?
It’d be interesting to see how you tested this. I’d imagine adding the same offset to a vector of `DateTime`s is less common, and I’m unsure how common adding distinct offsets to the same `DateTime` is.
Just a simple loop-based manual implementation of `sum` for lists. Not intended to mimic a specific `DateTime` workload.
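Roughly like this, for the record (a sketch of that kind of test, not the exact code; the function names are illustrative):

```julia
using BenchmarkTools
using Base.Checked: checked_add

function sum_wrapping(xs::Vector{Int64})
    s = zero(Int64)
    @inbounds @simd for x in xs   # wrapping `+` allows SIMD vectorization
        s += x
    end
    return s
end

function sum_checked(xs::Vector{Int64})
    s = zero(Int64)
    @inbounds for x in xs         # checked_add may throw, which blocks SIMD
        s = checked_add(s, x)
    end
    return s
end

xs = rand(Int64(0):Int64(10^9), 10^6)
@btime sum_wrapping($xs)
@btime sum_checked($xs)
```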
Again, segmented time encoding (some additional reading) would address some of the issues mentioned in the previous posts. Such a segmented time code might look as follows:

```julia
struct DsTimeCode           # day-segmented time code, medium resolution
                            # (can represent leap-second times)
    day2000::Int32          # day2000 = 0 is 2000-01-01
    hμsec::UInt32           # hectomicro (10^-4) seconds of day
end

struct HrTimeCode           # high-resolution time code
    dstimecode::DsTimeCode  # day-segmented time code, medium resolution
    hfsec::UInt64           # dekayokto (10^-23) seconds of dstimecode
end
```
A simpler version, wasting a few bits, is:

```julia
struct DsTimeCode           # day-segmented time code, medium resolution
                            # (can represent leap-second times)
    day2000::Int32          # day2000 = 0 is 2000-01-01
    msec::Int32             # milliseconds of day
end

struct HrTimeCode           # high-resolution time code
    dstimecode::DsTimeCode  # day-segmented time code, medium resolution
    zsec::Int64             # zepto (10^-21) seconds of dstimecode
end
```
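Arithmetic on a segmented code has to recombine the segments; for example, a millisecond difference could look like this (a sketch assuming the simpler layout above and ignoring leap seconds; `msec_diff` is an illustrative name):

```julia
# Difference of two day-segmented time codes, in milliseconds.
# Widen to Int64 first so distant dates cannot overflow Int32.
function msec_diff(a::DsTimeCode, b::DsTimeCode)
    day_ms = (Int64(a.day2000) - Int64(b.day2000)) * 86_400_000
    return day_ms + (Int64(a.msec) - Int64(b.msec))
end
```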
I have been using something like this for some time, but did not publish it as a package and have not timed it extensively, sorry.
I was part of this discussion on Slack. I’m an astronomer and worked in the Time Services Department, now the Precise Time Department, at the US Naval Observatory. They maintain the Master Clock (*).

The astronomical time standard is Julian days, which is usually split between integer days and floating-point fractional days. Seconds are also used, likewise split between integer seconds and floating-point fractional seconds. Nanosecond time resolution is common in astronomy, mainly because of pulsar timing research. Datasets exist with better than nanosecond resolution over half a century; that is a relative precision better than 10^-18. Atomic clocks are regularly measured to picosecond precision, and the international time standard is currently measured to 10^-14 seconds. In the near future (~10 years), optical clocks will have a precision of ~10^-21 seconds. For a few thousand dollars, you can buy a card that measures time to <1 picosecond (see the GuideTech website). My point is that nanosecond and better precision is here and will only get more precise in the near future.

After reading through this discussion, you have convinced me never to use the DateTime module, because it does not appear to adhere to the internationally approved time standards. That’s unfortunate, because a standard time library that adheres to approved standards would be beneficial to Julia. Sorry for being a Debby Downer.
(*) The Master Clock is actually an ensemble of over 200 atomic clocks separated into 5 Master Clocks. Each Master Clock is composed of short-timescale hydrogen masers and medium-timescale cesium clocks. They are all synchronized by 6 long-timescale rubidium fountain clocks that are accurate to <1 second over the lifetime of the Universe, i.e., ~14 billion years.
“Expertise” may be a strong word, but I work in a domain where timestamp arithmetic performance does matter (HFT). And as I mentioned before, I feel quite sure that there must be no (serious) `DateTime` users in this domain due to other aspects of its design, not its performance. So from that perspective, I wholeheartedly support checked arithmetic.
While pondering this issue a bit more, the problem that I see is that DateTime is mixing functionality with input/output. I have always found this to eventually lead to problems. How the values are stored internally should not be dictated by how they are read in or written out; IO is the responsibility of the user. If it takes 128 bits to implement the functionality accurately and precisely, then so be it.

Most users are working with time series of a few thousand values, and probably not more than a few million at most; a few billion values would be very rare. Hence the amount of memory needed is not huge. In other words, precision is more likely to be a problem than compactness of the data. If compactness is an issue, then the times can be stored as offsets to a reference time or in a similar data structure. For astronomical software, I will focus on ensuring that precision is paramount.
I see `DateTime` as a kind of integer, so I want `DateTime` to behave like `Int`, and I also want a “floating-point” companion to `DateTime`, just as `Int` has `Float64` as a companion.
Reading a `DateTime` from a String is problematic:

```julia
d = parse(DateTime, "2023-07-01T12:34:56.123456")  # -> error
```
For `Int`, you solve this problem by

```julia
f = parse(Float64, "1234.5678")
i = round(Int, f)
```
So, it would be nice if we had a floating-point-like companion:

```julia
fd = parse(FloatingDateTime, "2023-07-01T12:34:56.123456")
d = round(DateTime, fd)
```
The “microsecond” problem is the same. `Microsecond` is to `DateTime` what `Float64` is to `Int`:

```julia
f = 3 + 0.6        # -> Float64
i = Int(f)         # error
i = Int(round(f))  # fine
```
Then, I would like

```julia
fd = DateTime(...) + Nanosecond(...)  # -> floating-point DateTime
d = DateTime(fd)         # error
d = DateTime(round(fd))  # fine
```
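For concreteness, a minimal sketch of what such a companion could look like (the name `FloatingDateTime`, the epoch, and the `Float64`-seconds storage are all illustrative assumptions, not an existing or officially proposed design):

```julia
using Dates

# Hypothetical floating-point datetime: Float64 seconds since 2000-01-01.
# Like any float, its resolution degrades as you move away from the epoch.
struct FloatingDateTime
    sec2000::Float64
end

const FDT_EPOCH = DateTime(2000, 1, 1)

# Round to the nearest representable (millisecond-precision) DateTime.
Base.round(::Type{DateTime}, fd::FloatingDateTime) =
    FDT_EPOCH + Millisecond(round(Int64, fd.sec2000 * 1000))

fd = FloatingDateTime(86_400.1234567)  # one day plus ~123.4567 ms past the epoch
round(DateTime, fd)                    # -> 2000-01-02T00:00:00.123
```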
I don’t think the issue is a lack of adherence to international standards. The Dates module does aim to adhere to an international standard, however, it’s one for representing, communicating, and doing simple calendar arithmetic on Gregorian calendar dates and 24-hour time of day (ISO 8601), not one for precision timekeeping (these are really quite unrelated concerns). It can be useful for things like timestamps in database records, but probably not for astronomy and GNSS, and this is arguably the correct tradeoff for a standard library.
Specifically, the documentation states that Dates assumes Universal Time (UT), which is basically solar time, meaning that the lengths of days and seconds are nonuniform due to variations in the earth’s rotation. This is convenient because it means each day has exactly 86400 seconds and you don’t have to fuss with leap seconds in DateTime arithmetic, but it also means that even millisecond precision is a bit nonsensical, and locating events with nanosecond precision on a timeline that’s uniform across centuries is way beyond the scope of the module.
The mystery here is really why the module supports sub-millisecond intervals at all, given the other design decisions. Perhaps the solution is to deprecate `Microsecond` and `Nanosecond` and point users to appropriate packages?
That is not the only reason for wanting increased precision though. Some might want that, but many others (including myself) would just want to support the timescales that are in use today. Many things operate at a sub-millisecond frequency today. I personally do not care if I can measure something down to the nanosecond on Sep 1 1502. But right now (I mean literally right as I type), many datasets in various fields of work are being populated with data timestamped to the nanosecond.
This is really the big point in my opinion. If I have a CSV file from someone with ISO timestamps to the nanosecond, I want to be able to read them in, and then discover the time difference between the first one and the tenth one (or whatever).
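As a stopgap today, one can split off the sub-second digits and track them separately; a rough sketch (the helper `parse_ns` is illustrative, not an existing API, and assumes a well-formed ISO timestamp with a fractional part):

```julia
using Dates

# Split an ISO timestamp into a whole-second DateTime plus a Nanosecond remainder.
function parse_ns(s::AbstractString)
    base, frac = split(s, '.')
    dt = DateTime(base)                    # whole-second part
    ns = parse(Int64, rpad(frac, 9, '0'))  # pad to 9 digits -> nanoseconds
    return dt, Nanosecond(ns)
end

a = parse_ns("2023-07-01T12:34:56.123456789")
b = parse_ns("2023-07-01T12:34:58.000000001")
# nanosecond difference between the two timestamps:
Δ = Nanosecond(Dates.value(b[1] - a[1]) * 1_000_000) + (b[2] - a[2])
```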
That’s fair, just note that it would add some implementation complexity and maintenance burden in that your module now needs to know about leap seconds (assuming UTC timestamps).
Let’s continue the discussion on UT vs UTC over in Universal Time vs UTC Time in Dates and try to keep this topic on the higher precision arithmetic. My split wasn’t perfect here, but I tried to preserve as much as I could. In doing so, I unfortunately ran into some Discourse bugs, which placed some posts completely out of order. Please ping me if you see anything that’s still out-of-place or nonsensical in its (possibly new) context, but I think I got it now.
> the problem that I see is that DateTime is mixing functionality with input/output. I have always found this to eventually lead to problems. How the values are stored internally should not be dictated by how they are read in or written out
I may not be understanding your concern, but that doesn’t appear to be a problem for the Dates stdlib. `DateTime` is represented in milliseconds, and all arithmetic operations use this representation directly. Parsing strings into `DateTime`s and printing `DateTime`s are completely separate concerns. Can you be a bit more concrete about your concern?
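To illustrate the separation with standard Dates calls:

```julia
using Dates

dt = DateTime(2023, 7, 1, 12, 34, 56)
Dates.value(dt)                       # the internal millisecond count
dt + Millisecond(1)                   # arithmetic acts on that count directly
Dates.format(dt, "yyyy/mm/dd HH:MM")  # formatting/printing is a separate layer
```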