TimesDates.jl dev for v2

(simplifying somewhat) Internally the fractional parts of a second are stored as an integer number of nanoseconds. There are a number of practical reasons for this that I won’t get into.

So one would have to convert the float value 0.123456789 into the integer value 123,456,789, the number of nanoseconds past the previous whole second. This conversion would be somewhat complicated.
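A minimal sketch of that conversion, assuming the fractional part arrives as a `Float64` in [0, 1). `fraction_to_nanoseconds` is a hypothetical helper, not part of NanoDates. It has to round rather than truncate, because a value such as 0.123456789 has no exact binary representation.

```julia
using Dates

# Hypothetical helper (not a NanoDates API): convert a fractional second held
# as a Float64 into a whole number of nanoseconds.
function fraction_to_nanoseconds(frac::Float64)
    0.0 <= frac < 1.0 || throw(ArgumentError("expected a fraction in [0, 1), got $frac"))
    # round, do not truncate: the Float64 nearest 0.123456789 is not exact,
    # so frac * 1e9 lands very close to, but not necessarily on, 123456789
    ns = round(Int64, frac * 1_000_000_000)
    # a fraction within half a nanosecond of 1.0 rounds up to a full second;
    # carrying that into the seconds field is left out of this sketch
    return Nanosecond(ns)
end

fraction_to_nanoseconds(0.123456789)   # Nanosecond(123456789)
```

Beyond about 15 to 17 significant digits a `Float64` cannot hold the value at all, which is one reason an integer nanosecond count is the safer input.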

Additionally, most use cases are more likely to be ingesting data from a file or generated by the system for manipulation. Creating a NanoDate directly as shown is probably less frequent.

Finally, if a “shorter” syntax is desired, NanoDate(2022,5,3,12,15,30,Nanosecond(123456789)) would do. (I may have that syntax slightly wrong. It may be NanoDate(2022,5,3,12,15,30,123,Nanosecond(456789)) or similar.)

I am happy to add that, although probably as an integer rather than a float for accuracy. The way you illustrated mirrors one form of the constructors for Time, Date and DateTime.

Currently available:

NanoDate(now(), Nanosecond(123456))
2022-05-03T13:50:25.242123456

I had allowed for the integer (without it being wrapped as Nanosecond) in an earlier version; that seemed to be contrary to an ethos in the Dates base code, and could introduce some misunderstanding. So, throughout the code, I moved to using entities that are <: AbstractTime in user-facing functions.

If one allows an unwrapped integer, how does one interpret an integer with fewer than 6 digits?

What is the meaning of NanoDate(now(),9)? Is it “2022-05-03T13:50:25.1239” or “2022-05-03T13:50:25.123000009”?

If it’s the latter, then to get the former I would presumably have to enter NanoDate(now(),900000), which seems prone to error. It requires the user to know that the integer represents a number of nanoseconds (but only up to 999,999), lest the milliseconds of the DateTime be combined with the nanoseconds being added.

If it is the former, it is unclear how to construct the latter. Forcing the user to wrap the integer makes them check their units. Just something to think about.
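With the Dates stdlib alone one can see what wrapping buys: the unit travels with the number, so comparisons and conversions are exact and nobody has to guess what a bare 9 or 900000 means. This uses only `Dates`; it does not call NanoDates.

```julia
using Dates

# The same 0.0009 s written with two different units
a = Microsecond(900)
b = Nanosecond(900_000)

a == b                    # true: fixed periods compare across units
convert(Nanosecond, a)    # equals Nanosecond(900_000), converted exactly
Nanosecond(9) == a        # false: a bare 9 read as nanoseconds is a different amount
Dates.value(b)            # 900000, the unadorned Int a caller would otherwise type
```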

It would mean that one has elided the Nanosecond(*quantity*) and simply placed the unadorned quantity in the constructor, so the interpretation is NanoDate(now(), Nanosecond(9)). As DateTime values, e.g. now(), cannot carry microseconds or nanoseconds, it is helpful to have NanoDate constructors that accept:

NanoDate(::DateTime, ::Microsecond)
NanoDate(::DateTime, ::Nanosecond)
NanoDate(::DateTime, ::Microsecond, ::Nanosecond)

All are available, and in each case they expect their TimePeriods to be given explicitly rather than as an Int.

It seems to me that NanoDate(DateTime(2013,7,1,12,30,59,1),1_000_000) resulting in “2013-07-01T12:30:59.002” (note the final 2) is a bit of a sharp edge, assuming the DateTime came from some other part of the code and wasn’t a literal value. I think the internal structure (a DateTime plus an integer less than 1e6) should be hidden from the user.

If DateTime (hypothetically) had only 1-second resolution, then a constructor taking a DateTime (with no subsecond component) and an integer number of nanoseconds would be reasonable. But DateTime does have subsecond precision, so some, but not all, of the digits after the decimal point are likely already in place, which seems ripe for confusion.
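To make the constructor discussion concrete, here is a toy type built on the Dates stdlib. `ToyNanoDate` is hypothetical and is not NanoDates' actual layout; it assumes a `DateTime` plus a count of nanoseconds below its millisecond, kept in 0:999_999. Defining only Period-typed constructors means a bare `Int` fails with a `MethodError`, while a wrapped `Nanosecond(1_000_000)` still carries into the millisecond field, which is the “final 2” edge described above.

```julia
using Dates

const NS_PER_MS = 1_000_000

# Hypothetical toy type (not NanoDates' real representation).
struct ToyNanoDate
    datetime::DateTime   # millisecond-resolution part
    subms::Int64         # nanoseconds past that millisecond, in 0:999_999
    # The only inner constructor takes a Nanosecond, so Julia generates no
    # default (DateTime, Int) constructor and ToyNanoDate(dt, 9) is a MethodError.
    function ToyNanoDate(dt::DateTime, ns::Nanosecond)
        n = Dates.value(ns)
        # whole milliseconds are carried into the DateTime:
        # 1 ms + Nanosecond(1_000_000) becomes 2 ms
        return new(dt + Millisecond(fld(n, NS_PER_MS)), mod(n, NS_PER_MS))
    end
end

ToyNanoDate(dt::DateTime, us::Microsecond) = ToyNanoDate(dt, convert(Nanosecond, us))
ToyNanoDate(dt::DateTime, us::Microsecond, ns::Nanosecond) =
    ToyNanoDate(dt, convert(Nanosecond, us) + ns)

t = ToyNanoDate(DateTime(2013, 7, 1, 12, 30, 59, 1), Nanosecond(1_000_000))
t.datetime   # DateTime(2013, 7, 1, 12, 30, 59, 2)
t.subms      # 0
```

Hiding the two fields behind accessors would be the next step if the internal split is meant to stay invisible.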

Are you concurring that best practice is to use Periods rather than Ints?

Often a NanoDate will be constructed from a variable that is either a DateTime or a Date, possibly with additional fine resolution periods. Another likely scenario is forming a NanoDate from an extant NanoDate with some modification.
While this can be done by adding/subtracting periods from the NanoDate, or with trunc(::NanoDate, ::Type{Period}), within a function the ability to modify a NanoDate through the constructor does simplify the solution.
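For comparison, the add/subtract and `trunc` route looks like this with a plain `DateTime` from the Dates stdlib. The post says NanoDate supports `trunc(::NanoDate, ::Type{Period})` as well, but only `DateTime` is used here.

```julia
using Dates

dt = DateTime(2022, 5, 3, 12, 15, 30, 999)

# keep everything down to the second, then set the subsecond part
on_the_second = trunc(dt, Second)                 # DateTime(2022, 5, 3, 12, 15, 30)
with_fraction = on_the_second + Millisecond(250)  # DateTime(2022, 5, 3, 12, 15, 30, 250)

# the same pattern at a coarser granularity
next_minute = trunc(dt, Minute) + Minute(1)       # DateTime(2022, 5, 3, 12, 16)
```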

Yes, I would say it is very much best practice to use Periods rather than Ints. I fully recognize that reasonable minds can disagree on that point however.

NanoDate(some_datetime,100000) just seems very prone to error compared to NanoDate(some_datetime, Microsecond(123)) or NanoDate(some_datetime, Nanosecond(123456)).

(Also, all the constructors that don’t involve an existing DateTime (and its millisecond precision) seem totally fine to me. I have no problem with NanoDate(2022,5,13,9,34,59,123,456,789), and of course adding the proposed constructor won’t disrupt the others, so none of this would disrupt how I will use the package. As such, I want to make clear there is no malice over the issue. I was just trying to point out what I saw as a potential rough spot.)

Your perspective is welcome and is useful.
That thought extends to all participants here.
Our give-and-take makes the package better.


Give me a few examples of Date+Timeofday as strings that you would like to parse – I need a few test cases.

# Example-1: string is part of a CSV file
"...,2021-11-30,11:47:02.350190287,..."

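One way to handle Example-1 today with only the Dates stdlib is to parse the two fields separately: `Date` handles the ISO date, and the time is split by hand so that all nine fractional digits survive (`Dates.Time` stores nanoseconds, but this sketch does not rely on a DateFormat to read them). The field values come from the example; reading the CSV itself is omitted.

```julia
using Dates

datestr, timestr = "2021-11-30", "11:47:02.350190287"

d = Date(datestr)                           # ISO yyyy-mm-dd

hms, frac = split(timestr, '.')             # "11:47:02", "350190287"
h, mi, s = parse.(Int, split(hms, ':'))
ns = parse(Int, rpad(frac, 9, '0'))         # right-pad short fractions, e.g. "35" -> 350000000
t = Time(h, mi, s) + Nanosecond(ns)         # Time(11, 47, 2, 350, 190, 287)
```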
Here is an interesting one. It could be parsed separately (Time and Date) and then combined, but it would be nice to handle it in a single pass. This is the first line of a metadata header file that I have to deal with.

093522.1275 	 03/05/2020 

a path less traveled
           the long way around

using Dates, NanoDates

isstring(x::AbstractString) = true
isstring(x) = false

metadata = "093522.1275  03/05/2020"

# named groups capture the fields; the unnamed groups swallow the separators
r_hms = r"(?<hour>\d{2})(?<minute>\d{2})(?<second>\d{2})(?:(.))"
r_subseconds = r"(?<millisecond>\d{3})(?<microsecond100>\d{1})(?:(\s+))"
r_dmy = r"(?<day>\d{2})(?:(.))(?<month>\d{2})(?:(.))(?<year>\d{4})"

const MetadataRegex = r_hms  * r_subseconds  * r_dmy

m = match(MetadataRegex, metadata)

# keys(::RegexMatch) (Julia 1.7+) gives String keys for named groups and
# Int keys for unnamed ones, so keep only the named fields
fieldkeys = filter(isstring, keys(m))
fieldvalues = [parse(Int, m[key]) for key in fieldkeys]

t = timeparts = Dict(fieldkeys .=> fieldvalues)

# the 4th fractional digit counts hundreds of microseconds
metadata_time = 
  NanoDate(t["year"], t["month"], t["day"],
           t["hour"], t["minute"], t["second"],
           t["millisecond"], t["microsecond100"]*100)

string(metadata_time)
"2020-05-03T09:35:22.127500"

That’s a much cleaner implementation of what I would write too. I do really like how you laid that out.

That said, I greatly look forward to the day I can do this:

NanoDate("093522.1275  03/05/2020", nanodateformat"HHMMSS.ssss  mm/dd/yyyy")

The existing DateFormat structure is perfectly happy to represent additional subsecond characters beyond the millisecond.

julia> d=dateformat"yyyy-mm-ddTHH:MM:SS.sssssssss"
dateformat"yyyy-mm-ddTHH:MM:SS.sssssssss"

julia> d.tokens
(DatePart(yyyy), Delim(-), DatePart(mm), Delim(-), DatePart(dd), Delim(T), DatePart(HH), Delim(:), DatePart(MM), Delim(:), DatePart(SS), Delim(.), DatePart(sssssssss))

And it even still works (with a DateTime-compatible string):

julia> DateTime("2020-05-03T12:59:58.123",dateformat"yyyy-mm-ddTHH:MM:SS.sssssssss")
2020-05-03T12:59:58.123

An error isn’t thrown until you actually use this format on a string with more than 3 digits after the decimal point, as here:

julia> DateTime("2020-05-03T12:59:58.1234",dateformat"yyyy-mm-ddTHH:MM:SS.sssssssss")
ERROR: InexactError: convert(Dates.Decimal3, 1234)
Stacktrace:
 [1] tryparsenext
   @ C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\io.jl:153 [inlined]
 [2] tryparsenext
   @ C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\io.jl:41 [inlined]
 [3] macro expansion
   @ C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\parse.jl:64 [inlined]
 [4] tryparsenext_core(str::String, pos::Int64, len::Int64, df::DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssss"), Tuple{Dates.DatePart{'y'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'H'}, Dates.Delim{Char, 1}, Dates.DatePart{'M'}, Dates.Delim{Char, 1}, Dates.DatePart{'S'}, Dates.Delim{Char, 1}, Dates.DatePart{'s'}}}, raise::Bool)
   @ Dates C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\parse.jl:38
 [5] macro expansion
   @ C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\parse.jl:150 [inlined]
 [6] tryparsenext_internal
   @ C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\parse.jl:125 [inlined]
 [7] parse(::Type{DateTime}, str::String, df::DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssss"), Tuple{Dates.DatePart{'y'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'H'}, Dates.Delim{Char, 1}, Dates.DatePart{'M'}, Dates.Delim{Char, 1}, Dates.DatePart{'S'}, Dates.Delim{Char, 1}, Dates.DatePart{'s'}}})
   @ Dates C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\parse.jl:282
 [8] DateTime(dt::String, df::DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssss"), Tuple{Dates.DatePart{'y'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'H'}, Dates.Delim{Char, 1}, Dates.DatePart{'M'}, Dates.Delim{Char, 1}, Dates.DatePart{'S'}, Dates.Delim{Char, 1}, Dates.DatePart{'s'}}})
   @ Dates C:\Users\jcluts\scoop\apps\julia\current\share\julia\stdlib\v1.7\Dates\src\io.jl:576
 [9] top-level scope
   @ REPL[23]:1

It may be possible to avoid a new “nanodateformat” structure, modifying only some of the parsing functions and reusing much of the existing Dates machinery.
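As a rough illustration of steering around the limit without a new format type, the string can be split before Dates sees it: at most three fractional digits go to `DateTime`, and the remainder becomes a `Nanosecond`. `split_subms` is a hypothetical helper; it assumes an ISO `yyyy-mm-ddTHH:MM:SS[.fffffffff]` string with at most nine fractional digits.

```julia
using Dates

# Hypothetical helper: returns (DateTime, Nanosecond below the millisecond)
function split_subms(s::AbstractString)
    occursin('.', s) || return (DateTime(String(s)), Nanosecond(0))
    head, frac = split(s, '.'; limit = 2)
    (1 <= length(frac) <= 9 && all(isdigit, frac)) ||
        throw(ArgumentError("expected 1 to 9 fractional digits in \"$s\""))
    digits9 = rpad(frac, 9, '0')          # "1234" -> "123400000"
    ms = parse(Int, digits9[1:3])         # the part DateTime can hold
    subms = parse(Int, digits9[4:9])      # the part it cannot
    return (DateTime(String(head)) + Millisecond(ms), Nanosecond(subms))
end

split_subms("2020-05-03T12:59:58.1234")
# (DateTime(2020, 5, 3, 12, 59, 58, 123), Nanosecond(400000))
```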


I spent a few days with Dates’ parsing and DateFormat traversal. There did not seem to be a nonsurgical way to co-opt the structure and machinery if we want it to work with NanoDates just as it does with DateTimes. Currently, I expect there are a few edge-bridging diversions we may enact, so that situations which are not copacetic for Dates’ internals are not sent into Dates’ internals without some transformation or redirection, as appropriate.
With your notes, that aspect of the implementation may well go without consternation (here’s hoping).

If you’re looking for Dates problems to solve, there are plenty:

https://github.com/JuliaLang/julia/issues?q=is%3Aopen+is%3Aissue+label%3Adates

Please try NanoDates.jl (GitHub: JuliaTime/NanoDates.jl, “Dates with nanosecond resolved days”) once again.
There are more docs, although NanoDates.format still needs docs.


julia> NanoDates.format(nd, dateformat"yyyy-U-dd HH:MM:SS")
"2022-May-30 13:33:45"

julia> NanoDates.format(nd, dateformat"yyyy-U-dd HH:MM:SS.s")
"2022-May-30 13:33:45.123"

julia> NanoDates.format(nd, dateformat"yyyy-U-dd HH:MM:SS.ss")
"2022-May-30 13:33:45.123456"

julia> NanoDates.format(nd, dateformat"yyyy-U-dd HH:MM:SS.sss")
"2022-May-30 13:33:45.123456789"

julia> NanoDates.format(nd, dateformat"yyyy-U-dd HH:MM:SS.sss"; sep='_')
"2022-May-30 13:33:45.123_456_789"

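The digit-group behaviour shown above (3, 6 or 9 digits, with an optional separator) can be mimicked in a few lines. `subsecond_digits` is a hypothetical helper, not how `NanoDates.format` is implemented; it assumes the subsecond value is given as nanoseconds within the second and, like the output above, truncates rather than rounds.

```julia
# Hypothetical helper: show the first 1, 2 or 3 groups of three digits of a
# nanosecond-within-the-second value, optionally separated by `sep`.
function subsecond_digits(ns::Integer, groups::Integer; sep::Union{Char,Nothing} = nothing)
    0 <= ns < 1_000_000_000 || throw(ArgumentError("ns must lie in 0:999_999_999"))
    1 <= groups <= 3 || throw(ArgumentError("groups must be 1, 2 or 3"))
    full = lpad(string(ns), 9, '0')              # 123456789 -> "123456789"
    chunks = [full[3g-2:3g] for g in 1:groups]   # leading groups; later digits are dropped
    return sep === nothing ? join(chunks) : join(chunks, sep)
end

subsecond_digits(123_456_789, 2)              # "123456"
subsecond_digits(123_456_789, 3; sep = '_')   # "123_456_789"
```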
Remind me of what is sought that remains as yet undone.
Let me know of any rough spots.
This is a penultimate round of information gathering.
I want to clean up what needs to be cleaned up and put it out soonish.


There are two experiments in src/; one has been revisited more than the other.
To what extent is each a functional representation of its intent? Well,
they are of the nature of coded notes.
nanoday.jl is one of several attempts to work through the dual purpose of Day (and Week, if you insist). Dates.Date(Year(yr), Month(mn), Day(dy)) utilizes Day (as day-of-the-month) calendrically.
Dates.Time(Hour(20), Minute(45)) + Hour(14) rolls over to
10:45:00, and the additional Day is lost entirely.

As I see it, were Time to be “time-of-date” rather than “time-on-day”,
Time(Day(5), Hour(20), Minute(45)) + Hour(14) would become 6d 10:45:00, which would be specified as Time(Day(6), 10, 45, 00) or, with a postfix (or prefix), 6dy == Day(6).
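The rollover, and the proposed day-keeping alternative, can be sketched with the Dates stdlib. `add_carrying_days` is hypothetical and only illustrates the “time-of-date” reading; it returns the days separately because `Dates.Time` itself cannot hold them.

```julia
using Dates

const NS_PER_DAY = 86_400_000_000_000

# Hypothetical "time-of-date" addition: unlike Time + Hour, the whole days
# that overflow midnight are added to `d` instead of being dropped.
function add_carrying_days(d::Day, t::Time, p::Dates.FixedPeriod)
    total = Dates.value(t) + Dates.value(convert(Nanosecond, p))  # ns since midnight
    return d + Day(fld(total, NS_PER_DAY)), Time(Nanosecond(mod(total, NS_PER_DAY)))
end

Time(20, 45) + Hour(14)                            # 10:45:00; the extra day is gone
add_carrying_days(Day(5), Time(20, 45), Hour(14))  # (Day(6), Time(10, 45))
```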

nanoday.jl puts the period Day in DatePeriods and in TimePeriods, which is confusing. The DatePeriod Day should be <your thought?>. I have years of practice using Dayte for this role and letting Day be a TimePeriod (24 hours in a Day, 31 Daytes in January).

Doing this, or some other useful reorganization, simplifies Date+Time arithmetic and makes calendrical statements more logically consistent. It is provided for your perusal and is unlikely to be in the next update.

WrappingPaper.jl is a way to hop-skip-jump into a metrizable space of interval-valued CompoundPeriods, as it were. The longer (equal to or longer than) duration and the shorter (equal to or shorter than) duration are defined so that each is realizable given the constraints / period values, and any interval-valued duration collapses to a specific duration when it has a known initial clock or calendar time position (the earlier endpoint) [or conceivably a known final time position … however, there are compelling theoretical reasons to prefer CloseOpen duration specifications to OpenClose or CloseClose ones] [the specifics of which are somewhere findable].
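A small illustration of the interval-valued idea, using only the Dates stdlib and not WrappingPaper.jl: measured in days, `Month(1)` (and any CompoundPeriod containing it) spans a range of lengths, and it collapses to a single length once the start date is known. `elapsed` is a hypothetical helper.

```julia
using Dates

# Hypothetical helper: the concrete length of period `p` from a known start
elapsed(start::Date, p) = (start + p) - start    # a Day period

elapsed(Date(2022, 1, 1), Month(1))   # Day(31)
elapsed(Date(2022, 2, 1), Month(1))   # Day(28)
elapsed(Date(2024, 2, 1), Month(1))   # Day(29)

# shortest and longest realizations of Month(1) + Day(2),
# for starts on the first of each month in 2022
lengths = [Dates.value(elapsed(Date(2022, m, 1), Month(1) + Day(2))) for m in 1:12]
extrema(lengths)                      # (30, 33)
```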

That could appear in a revision to be considered.

not there yet, much closer – is this close enough?

Note: for reasons internal to Dates, SS.ssss (to indicate 4 digits) is not supported, because NanoDates now supports the following instead:
s accepts 1..3 digits, ss accepts 4..6 digits, sss accepts 7..9 digits,
where s shows 3 digits, ss shows 6 digits, sss shows 9 digits, and subsecond digits are zero padded on the right (0.1234 → 0.123400) as needed.
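Read as a rule: count the fractional digits, use one `s` per started group of three, and pad on the right to fill the group. `subsecond_token` is a hypothetical helper that only restates that rule; it is not NanoDates' parser.

```julia
# Hypothetical helper restating the rule: 1-3 digits -> "s",
# 4-6 -> "ss", 7-9 -> "sss", zero padded on the right to fill the group.
function subsecond_token(frac::AbstractString)
    n = length(frac)
    (1 <= n <= 9 && all(isdigit, frac)) ||
        throw(ArgumentError("expected 1 to 9 digits, got \"$frac\""))
    groups = cld(n, 3)                      # started groups of three digits
    return "s"^groups, rpad(frac, 3 * groups, '0')
end

subsecond_token("1275")   # ("ss", "127500"), matching the .127500 result
```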

If you flip around the input string for your example, then it is available:

julia> parse(NanoDate,"03/05/2020 093522.1275", dateformat"mm/dd/yyyy HHMMSS.ss")
2020-03-05T09:35:22.127500

or

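using Dates, NanoDates

const df = dateformat"mm/dd/yyyy HHMMSS.ss"
# swap the two whitespace-separated fields (spaces or a tab): time first -> date first
flipstr(s) = join(reverse(split(s)), " ")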
parsemystr(s) = parse(NanoDate, flipstr(s), df)

instr = "093522.1275  03/05/2020"
result = parsemystr(instr)
result == NanoDate(2020, 3, 5, 9, 35, 22, 127, 500)

The most uncomfortable aspect of the current DateTime is how it exposes its internal implementation:

julia> t1 = DateTime(2017, 1, 1)
julia> t2 = DateTime(2017, 1, 1, 17)
julia> n = (t2 - t1) ÷ Dates.Hour(1)
julia> typeof(n)
Int64

julia> m = (t2 - t1) % Dates.Hour(1)
0 milliseconds

julia> m == 0
false

julia> typeof(m)
Millisecond

So, to test whether the time difference is an integral multiple of a given interval, you are strongly tempted to explicitly refer to “milliseconds”.

This trait prevents the implementation from changing its internal representation of DateTime and its associated types.
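For the specific test in question, the Dates stdlib already offers unit-free spellings: `iszero`, or a comparison against any zero fixed period. This does not change what `%` returns, but it avoids naming milliseconds at the call site.

```julia
using Dates

t1 = DateTime(2017, 1, 1)
t2 = DateTime(2017, 1, 1, 17)
m = (t2 - t1) % Hour(1)     # a Millisecond period

m == 0          # false: a Period never equals a bare number
iszero(m)       # true, without naming the unit
m == Hour(0)    # true: fixed periods compare across units
```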

Is it possible to make this work

julia> m == 0 # -> true

by making the interval a subtype of Integer?

Would it also be possible for the period type to be called something like TimePeriod, to hide its implementation as milliseconds?