Why do time quantities have to be integers?

:open_mouth:

It’s perfectly valid to think of a date as a position or angle, as it’s ultimately intended to be a measure of the Earth’s motion about the sun. The time of day is a measure of the rotation of the Earth. One can think of both as having the same units as angles (in fact, that’s literally where the units of minute, second, etc. came from). Clearly, it would be silly if we couldn’t convert a floating point angle to a measurement of angle that included integer representations of degrees, minutes, arcseconds, milliarcseconds, etc. To say that time should only be represented as a set of integers is to say that angles should be represented this way too.

Viewing time as a geophysical quantity, it’s clear that a floating point representation is natural, and a conversion to people-oriented things like months, days, hours, etc. is a nice thing to have.

Consider that many folks will use Julia to integrate something over time numerically, where the integration begins at some start date. Clearly, we’d like a convenient way to turn the (floating point) integration step times into dates without a lot of fuss. That is, I expect that lots of applications will have and will practically need floating point times.
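
A minimal sketch of that conversion, assuming the integrator reports elapsed time in seconds as Float64. The start date, the step values, and the helper name `to_datetime` are made up for illustration. Since `DateTime` has millisecond resolution, each step is rounded to whole milliseconds:

```julia
using Dates

# Hypothetical start date for the integration (not from this thread).
t0 = DateTime(2017, 9, 22, 15, 4, 23)

# Elapsed times in seconds, as a numerical integrator might produce them.
elapsed = [0.0, 0.5, 1.25, 3600.0]

# Hypothetical helper: round to whole milliseconds, then shift the start date.
to_datetime(t0::DateTime, seconds::Real) = t0 + Millisecond(round(Int64, 1000 * seconds))

stamps = [to_datetime(t0, s) for s in elapsed]
foreach(println, stamps)
```

The rounding step is where sub-millisecond information is dropped, so whether this is "without a lot of fuss" depends on the resolution the application needs.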

It sounds to me like @yakir12 is doing the right thing to add a floating point conversion.

This really points to the difference between elapsed time (how many times my cesium source “ticked”) vs dates (how many times the earth went around the sun) and times of day (how many times the earth rotated). It sounds like maybe the integer-focused folks are more interested in the latter (dates and times of day), while the float-focused folks are talking more about the former (elapsed time).

Speaking of accuracy, it’s perhaps important to note that elapsed time cannot be turned into future UTC dates with a resolution of 1 s more than about 6 months ahead, because we don’t know well enough how much the Earth will rotate. This is why we need leap seconds. That is to say, we do not know how much time will elapse between 15:04:23 of September 22nd in 2017 (UTC) and what will eventually be called 15:04:23 of September 22nd in 2018 (UTC). Also, the Julia astro package includes utilities for these types of calculations (the UT1-UTC offset calculated via published IERS data). Super cool!

While on the subject, happy equinox everyone. :slight_smile:

Oh no. That’s such an unfortunate choice. If they’d used the UNIX epoch it would have exactly the precision/range tradeoff you actually want.

Obviously Dates is a mess. Whatever the bikeshedding is (and it is bikeshedding), can we put someone on it? Time representation seems crucial for a v1.0, does it not?

It will probably be done by someone who needs this behavior. Since you raised the issue, you could try submitting a PR.

What makes you think that using Float64 would be less accurate? Floating-point computations with integers < 2^53 are exact.
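
A quick way to check this claim in the REPL (a sketch; the variable names are only for illustration):

```julia
# Every integer with magnitude up to 2^53 has an exact Float64 representation.
limit = maxintfloat(Float64)          # 2^53 as a Float64

a = Float64(2^53 - 1)                 # still exactly representable
exact_below = (a == 2^53 - 1) && (a - 1.0 == 2^53 - 2)

# One past the limit, neighbouring integers can no longer be told apart.
collides_above = (limit + 1.0 == limit)

println((limit == 2.0^53, exact_below, collides_above))
```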

This blog post by Bruce Dawson (actually mentioned in the Julia manual!) is an interesting in-depth analysis of the related issues. TL;DR:

Elapsed game time should never be stored in a float. Use a double instead.

so a Float64 (what C calls a double) should be OK.

0.3 years in nanoseconds > 2^53 nanoseconds

Two points:

  • 300 years in nanoseconds > 2^63 nanoseconds. If you are measuring years in nanoseconds, pretty soon you are going to overflow Int64 too.

  • If you are measuring years in nanoseconds, I’m willing to bet that you don’t actually have nanosecond accuracy. Float64 will still keep 15 significant digits for numbers > 2^53, but Int64 will give complete nonsense if you exceed 2^63.
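
The numbers in both points can be checked with a few lines. This is only a sketch, and it assumes a year of 365.25 days, since the thread does not say which year length was meant:

```julia
# Assumption: one year = 365.25 days = 31_557_600 seconds.
const NS_PER_YEAR = 31_557_600 * 10^9      # nanoseconds per year, fits in Int64

three_tenths = 3 * NS_PER_YEAR ÷ 10        # 0.3 years in nanoseconds
past_float_limit = three_tenths > 2^53     # beyond exact Float64 integers

wide = widemul(300, NS_PER_YEAR)           # Int128 product, so no overflow here
past_int_limit = wide > typemax(Int64)     # beyond what Int64 can hold

wrapped = 300 * NS_PER_YEAR                # Int64 arithmetic wraps silently
approx = 300.0 * NS_PER_YEAR               # Float64 keeps the right magnitude
spacing = eps(approx)                      # gap between neighbouring Float64 values, in ns

println((past_float_limit, past_int_limit, wrapped, approx, spacing))
```

Here `wrapped` comes out negative, while `approx` is off by at most a couple of microseconds over 300 years.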

> If you are measuring years in nanoseconds

That’s not my point, although the nanosecond is usually the minimal time unit in designs. In my previous post, I said that usually multiple integers (Int64) have to be used, with different scales of time units, in the design.

If DateTime API is designed with Float64 format numbers, a single time unit is likely to be used. (Of course we may still use one Float64 for day and another Float64 for second if we choose to. This option is out of my consideration)

So this is a ONE Float64 vs. multiple Int64 comparison, and ONE time unit system vs. multiple time units design comparison.

No one is advocating a single Float64 time unit. The question is, why can’t the current design (multiple time units) wrap Float64 values instead of Int64? Float64 seems a lot more flexible, with no apparent loss in accuracy.

3 posts were split to a new topic: Was: Why do time quantities have to be integers?

The Dates module in Base was designed specifically with a few goals in mind:

  • Simplicity: straightforward code with intuitive APIs
  • Efficiency: both in memory footprint & various calculations (compared to any other language basic datetime library)
  • Analytics: in the “data analyst” world, timezones, leap seconds, and sub-nanosecond time precisions can often be ignored in favor of the two points already listed

This leads to the things that were specifically “punted” on (i.e. not designed for):

  • Sub-nanosecond precision: requires larger memory footprint and native sub-nanosecond OS APIs
  • Timezones: This avoids having to deal w/ various political shifts around the world, having to update the timezone database regularly, and having all the “timezone data compilation” have to live in Base
  • Non-gregorian calendars

Now, just because these things weren’t implemented in Base, doesn’t mean they weren’t duly considered in the design and implementation: indeed, at one point, the Dates module did include timezone support. The solution was to include design decisions that would allow packages to easily extend and implement these “date extensions” without needing to clutter Base w/ less-used features.

This has been very successfully shown to work in the TimeZones.jl package and the ZonedDateTime type.

On the subject of using Float64 as the internal storage, I’ve said it and I’ll say it again: anyone is free to re-implement Dates using floats, I just think it’s going to be harder to achieve the same level of simplicity and test coverage that we currently have. I’m definitely open to it, but there was a reason I stuck with Int64 when implementing, as it made my life a lot easier. Should we support things like Dates.Second(1.5) => Dates.Millisecond(1500)? Sure, that seems cute.
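
For illustration, that kind of conversion could be sketched outside of Dates like this. The helper `seconds_to_ms` is hypothetical, not an existing Dates function, and rounding to the nearest millisecond is an assumption about the desired behavior:

```julia
using Dates

# Hypothetical helper, not part of the Dates module: convert fractional
# seconds to the nearest whole millisecond.
seconds_to_ms(x::Real) = Millisecond(round(Int64, 1000 * x))

println(seconds_to_ms(1.5))
println(DateTime(2017, 9, 22) + seconds_to_ms(1.5))
```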

At the end of the day, I think the entire community would be better served by having a separate package (ScientificTimes.jl or something) with a NanoTimestamp type (or whatever you want to call it) that could truly support the extended precision and play nice with floats.

For the data analyst world, I still think that Dates is and will continue to be one of the best-all-around implementations for simplicity and efficiency out there.

Please forgive my oh-so-tainted presence, but… I just thought of something that could be great or moot, I’d appreciate your opinions:

Images.jl encodes all pixel intensities as a float between 0 and 1, even though the real value behind the scenes is any of the UInts, as is the case in all image formats. They use FixedPointNumbers.jl to accomplish this. Could we use the same trick here? I.e., allow for float-like behavior and appearance, but work with actual Ints for all the above-listed and indisputable benefits?
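
To make the idea concrete, here is a rough self-contained sketch. It does not use FixedPointNumbers.jl itself; the `FixedSeconds` type and the choice of nanoseconds as the underlying unit are assumptions made purely for illustration:

```julia
# Hypothetical type: stores whole nanoseconds, but is built from and shown as seconds.
struct FixedSeconds
    ns::Int64
end

# Construct from a floating-point number of seconds, rounding to whole nanoseconds.
FixedSeconds(s::AbstractFloat) = FixedSeconds(round(Int64, s * 1e9))

# Arithmetic happens on the integers, so it is exact.
Base.:+(a::FixedSeconds, b::FixedSeconds) = FixedSeconds(a.ns + b.ns)

# Float-like appearance.
Base.float(x::FixedSeconds) = x.ns / 1e9
Base.show(io::IO, x::FixedSeconds) = print(io, float(x), " s")

total = FixedSeconds(0.1) + FixedSeconds(0.2)
println(total)
println(total == FixedSeconds(0.3))   # integer comparison underneath
println(0.1 + 0.2 == 0.3)             # plain Float64 comparison, for contrast
```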

Just a thought.

Related: I wrote a little package to represent dates using subtypes of Integer other than Int64, counting from a given epoch. This saved tens of GB for me on a large dataset, and the abstraction is near-zero cost when using the same epoch.
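
The general idea can be sketched as follows. This is not the package's actual API; the epoch and the names `EPOCH`, `to_offset`, and `from_offset` are hypothetical:

```julia
using Dates

# Hypothetical epoch; dates are stored as day offsets from it.
const EPOCH = Date(2000, 1, 1)

# Store a Date as a small integer; T(...) throws if the offset does not fit in T.
to_offset(::Type{T}, d::Date) where {T<:Integer} = T(Dates.value(d - EPOCH))

# Recover the Date from the stored offset.
from_offset(n::Integer) = EPOCH + Day(n)

d = Date(2017, 9, 22)
n = to_offset(Int16, d)

println((n, from_offset(n), sizeof(n), sizeof(d)))
```

With Int16 each date takes 2 bytes instead of the 8 bytes of a `Date`, at the cost of a range of roughly 89 years on either side of the epoch.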

Nice! But post this on its own ANN thread. Otherwise it might get missed?

It is WIP, I will announce it after some practical testing.

Could it be extended to use a type such as Dec128 (from DecFP.jl)?

In theory yes, in practice I am not sure dates and floating point mix well. If you want time units with more precision than a day, a Date (or analogue, which is what this package provides) is not ideal.