Obviously, Dates is a mess. Whatever the bikeshedding is (and it is bikeshedding), can we put someone on it? Time representation seems crucial for v1.0, does it not?
It will probably be done by someone who needs this behavior. Since you raised the issue, you could try submitting a PR.
What makes you think that using Float64 would be less accurate? Floating-point addition, subtraction, and multiplication of integers are exact as long as the operands and the result stay below 2^53 in magnitude.
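A quick way to check the 2^53 claim is the sketch below. It uses Python only because Python's float is the same IEEE-754 binary64 format as Julia's Float64. `exact_in_double` is a hypothetical helper written for this illustration.

```python
# Python's float is an IEEE-754 binary64 value, the same format as Julia's Float64.
LIMIT = 2**53

def exact_in_double(n: int) -> bool:
    """Return True if the integer n survives a round trip through a 64-bit float."""
    return int(float(n)) == n

# Integers up to 2^53 in magnitude are representable exactly...
print(exact_in_double(LIMIT - 1), exact_in_double(LIMIT))   # True True
# ...but 2^53 + 1 is not: it rounds to 2^53.
print(exact_in_double(LIMIT + 1))                            # False

# Integer-valued sums stay exact as long as the result stays below the limit.
a, b = 123_456_789_012, 987_654_321_098
print(float(a) + float(b) == a + b)                          # True
```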
This blog post by Bruce Dawson (actually mentioned in the Julia manual!) is an interesting in-depth analysis of the related issues. TL;DR:
Elapsed game time should never be stored in a float. Use a double instead.
so a Float64 (what C calls a double) should be OK.
0.3 years in nanoseconds is already more than 2^53 nanoseconds.
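The arithmetic behind both limits in this exchange can be sketched as follows, assuming a Julian year of 365.25 days (an assumption for illustration; calendar years vary by a day). `ns_to_years` is a hypothetical helper.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 31_557_600   # Julian year of 365.25 days (assumption for illustration)

def ns_to_years(ns: int) -> float:
    """Convert a nanosecond count to (Julian) years."""
    return ns / NS_PER_SECOND / SECONDS_PER_YEAR

# Span over which a Float64 nanosecond count is exact at 1 ns resolution.
float64_exact_years = ns_to_years(2**53)
# Span before an Int64 nanosecond count overflows.
int64_max_years = ns_to_years(2**63 - 1)

print(f"{float64_exact_years:.3f}")   # 0.285
print(f"{int64_max_years:.1f}")       # 292.3
```

So a single Float64 stays exact at nanosecond resolution for only about 104 days, while a single Int64 runs out after roughly 292 years.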
Two points:
- 300 years in nanoseconds > 2^63 nanoseconds. If you are measuring years in nanoseconds, pretty soon you are going to overflow Int64 too.
- If you are measuring years in nanoseconds, I’m willing to bet that you don’t actually have nanosecond accuracy. Float64 will still keep about 15 significant digits for numbers > 2^53, but Int64 will give complete nonsense (it silently wraps around) if you exceed 2^63.
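The contrast in the second point (graceful rounding vs. silent wrap-around) can be sketched like this. Python ints never overflow, so `wrap_int64` is a hypothetical helper that emulates two's-complement Int64 arithmetic, which is how Julia's native Int64 behaves on overflow. The Julian-year constant is again an assumption.

```python
NS_PER_YEAR = 31_557_600 * 10**9   # Julian year in nanoseconds (assumption for illustration)

def wrap_int64(n: int) -> int:
    """Emulate two's-complement Int64 arithmetic, which wraps silently on overflow
    (Julia's native Int64 behaves this way; Python ints never overflow)."""
    return (n + 2**63) % 2**64 - 2**63

true_ns = 300 * NS_PER_YEAR          # exact value, about 9.47e18
as_int64 = wrap_int64(true_ns)       # wraps past 2^63 - 1 into negative territory
as_float64 = float(true_ns)          # rounds, but keeps 15+ significant digits

# int(as_float64) is the exact integer value of the float, so this error is exact.
rel_error = abs(int(as_float64) - true_ns) / true_ns
print(as_int64 < 0)                  # True: a "negative" 300-year span
print(rel_error < 1e-15)             # True
```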
> If you are measuring years in nanoseconds
That’s not my point, although the nanosecond is usually the smallest time unit in such designs. In my previous post, I said that designs usually use multiple integers (Int64), each at a different time-unit scale.
If a DateTime API is designed around Float64 numbers, a single time unit is likely to be used. (Of course, we could still use one Float64 for days and another Float64 for seconds if we chose to, but I am not considering that option.)
So this is a comparison of ONE Float64 vs. multiple Int64s, and of a single-time-unit design vs. a multiple-time-unit design.
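The difference between the two designs can be sketched with a hypothetical `SplitTime` type (not part of Dates or any package) that stores whole days plus nanoseconds into the day, next to a single float counting nanoseconds. In a real design each field would be an Int64; Python ints are unbounded, so this only illustrates the layout.

```python
from dataclasses import dataclass

NS_PER_DAY = 86_400 * 10**9

@dataclass(frozen=True)
class SplitTime:
    """Hypothetical multi-unit representation: whole days plus nanoseconds into the day.
    Each field would be an Int64 in a real design; Python ints are unbounded."""
    days: int
    ns_of_day: int

    @classmethod
    def from_ns(cls, total_ns: int) -> "SplitTime":
        days, ns = divmod(total_ns, NS_PER_DAY)
        return cls(days, ns)

    def to_ns(self) -> int:
        return self.days * NS_PER_DAY + self.ns_of_day

# 36,525 days (about 100 years) after some epoch, plus 1 ns.
t = 36_525 * NS_PER_DAY + 1

split = SplitTime.from_ns(t)   # multi-unit, integer fields
single = float(t)              # single-unit design: one float counting nanoseconds

print(split)                   # SplitTime(days=36525, ns_of_day=1)
print(split.to_ns() == t)      # True: nothing lost
print(int(single) - t)         # -1: the trailing nanosecond was rounded away
```

At this magnitude adjacent Float64 values are 512 ns apart, which is why the single-float design loses the last nanosecond while the split integer design does not.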
No one is advocating a single Float64 time unit. The question is, why can’t the current design (multiple time units) wrap Float64 values instead of Int64? Float64 seems a lot more flexible, with no apparent loss in accuracy.
The Dates module in Base was designed specifically with a few goals in mind:
- Simplicity: straightforward code with intuitive APIs
- Efficiency: both in memory footprint & various calculations (compared to any other language's basic datetime library)
- Analytics: in the “data analyst” world, timezones, leap seconds, and sub-nanosecond time precisions can often be ignored in favor of the two points already listed
This leads to the things that were specifically “punted” on (i.e. not designed for):
- Sub-nanosecond precision: requires larger memory footprint and native sub-nanosecond OS APIs
- Timezones: This avoids having to deal w/ various political shifts around the world, having to update the timezone database regularly, and having all the “timezone data compilation” live in Base
- Non-gregorian calendars
Now, just because these things weren’t implemented in Base doesn’t mean they weren’t duly considered in the design and implementation: indeed, at one point, the Dates module did include timezone support. The solution was to include design decisions that would allow packages to easily extend and implement these “date extensions” without needing to clutter Base w/ less-used features.
This has been very successfully shown to work in the TimeZones.jl package and the ZonedDateTime type.
On the subject of using Float64 as the internal storage, I’ve said it and I’ll say it again: anyone is free to re-implement Dates using floats; I just think it will be harder to achieve the same level of simplicity and test coverage that we currently have. I’m definitely open to it, but there was a reason I stuck with Int64 when implementing it: it made my life a lot easier. Should we support things like Dates.Second(1.5) => Dates.Millisecond(1500)? Sure, that seems cute.
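One way a Second(1.5) => Millisecond(1500) style conversion could stay on integer storage is sketched below. `seconds_to_milliseconds` is a hypothetical function, not the Dates API. It converts the float exactly (via `fractions.Fraction`) and refuses values that are not a whole number of milliseconds instead of rounding silently. The 0.1 case shows the kind of corner that makes float-based time handling harder to keep simple.

```python
from fractions import Fraction

def seconds_to_milliseconds(seconds: float) -> int:
    """Hypothetical conversion: fractional seconds -> whole milliseconds.
    Raises ValueError instead of silently rounding when the float is not
    exactly a whole number of milliseconds."""
    ms = Fraction(seconds) * 1000   # exact rational value of the float, scaled
    if ms.denominator != 1:
        raise ValueError(f"{seconds} s is not a whole number of milliseconds")
    return int(ms)

print(seconds_to_milliseconds(1.5))    # 1500
print(seconds_to_milliseconds(0.25))   # 250
try:
    seconds_to_milliseconds(0.1)       # 0.1 has no exact binary representation
except ValueError as err:
    print(err)                         # 0.1 s is not a whole number of milliseconds
```

A more permissive design would round to the nearest millisecond instead; which behavior is preferable is exactly the kind of question the Int64 design sidesteps.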
At the end of the day, I think the entire community would be better served by having a separate package (ScientificTimes.jl or something) with a NanoTimestamp type (or whatever you want to call it) that could truly support the extended precision and play nice with floats.
For the data analyst world, I still think that Dates is and will continue to be one of the best all-around implementations for simplicity and efficiency out there.
Please forgive my oh-so-tainted presence, but… I just thought of something that could be great or moot; I’d appreciate your opinions:
Images.jl presents all pixel intensities as numbers between 0 and 1, even though the value stored behind the scenes is one of the UInt types, as is the case in most image formats. It uses FixedPointNumbers.jl to accomplish this. Could we use the same trick here? I.e., allow for float-like behavior and appearance but work with actual Ints for all the above-listed and indisputable benefits?
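The fixed-point idea can be sketched as a duration that is constructed from and converted to float seconds but stores an integer nanosecond count, loosely analogous to how FixedPointNumbers.jl's N0f8 stores a UInt8 behind a value in [0, 1]. `FixedSeconds` is hypothetical, and rounding to the nearest nanosecond on construction is an assumed policy.

```python
from dataclasses import dataclass

NS_PER_SECOND = 10**9

@dataclass(frozen=True)
class FixedSeconds:
    """Hypothetical fixed-point duration: stores an integer count of nanoseconds
    but is built from and converted to float seconds."""
    ns: int

    @classmethod
    def from_float(cls, seconds: float) -> "FixedSeconds":
        # Assumed policy: round to the nearest nanosecond on construction.
        return cls(round(seconds * NS_PER_SECOND))

    def __float__(self) -> float:
        return self.ns / NS_PER_SECOND

    def __add__(self, other: "FixedSeconds") -> "FixedSeconds":
        return FixedSeconds(self.ns + other.ns)   # exact integer addition

a = FixedSeconds.from_float(0.1)
b = FixedSeconds.from_float(0.2)
print((a + b).ns)              # 300000000
print(float(a + b) == 0.3)     # True: arithmetic happened on integers
print(0.1 + 0.2 == 0.3)        # False: plain float arithmetic
```

The float-like surface only affects construction and display; comparisons and arithmetic stay on integers, which is what keeps the Int64 benefits listed above.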
Just a thought.
Related: I wrote a little package to represent dates using subtypes of Integer other than Int64, counting from a given epoch. This saved tens of Gb for me on a large dataset, and the abstraction is near-zero cost when using the same epoch.
https://github.com/tpapp/FlexDates.jl
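The epoch-offset idea can be sketched in a few lines. This is not FlexDates.jl's API: `EPOCH`, `encode`, and `decode` are hypothetical names, and the array typecodes stand in for Int16 vs. Int64. A signed 16-bit offset covers about ±89 years around the chosen epoch at one quarter of the memory of a 64-bit integer.

```python
from array import array
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)   # hypothetical epoch chosen for illustration

def encode(d: date) -> int:
    """Days since EPOCH."""
    return (d - EPOCH).days

def decode(offset: int) -> date:
    return EPOCH + timedelta(days=offset)

dates = [date(2017, 1, 1) + timedelta(days=i) for i in range(1000)]
compact = array("h", (encode(d) for d in dates))  # 16-bit signed: about ±89 years of days
wide = array("q", (encode(d) for d in dates))     # 64-bit, analogous to Int64

print(len(compact) * compact.itemsize, len(wide) * wide.itemsize)   # 2000 8000
print(decode(compact[0]))                                           # 2017-01-01
```

With a shared epoch, comparing or subtracting two encoded dates is plain integer arithmetic, which is where the near-zero cost comes from; mixing epochs would require a conversion first.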
Nice! But post this in its own ANN thread; otherwise it might get missed.
It is a WIP; I will announce it after some practical testing.
Could it be extended to use a type such as Dec128 (from DecFP.jl)?
In theory, yes; in practice, I am not sure dates and floating point mix well. If you want time units with more precision than a day, a Date (or an analogue, which is what this package provides) is not ideal.