Parsing DateTime, what should be the default(s)?

A.
I CAN parse without the T; it’s just not the default:

julia> DateTime("2021-08-28 00:00:59", dateformat"yyyy-mm-dd HH:MM:SS.s")
2021-08-28T00:00:59

while:

julia> DateTime("2021-08-28 00:00:59")
ERROR: ArgumentError: Invalid DateTime string
Stacktrace:
 [1] parse(#unused#::Type{DateTime}, s::String, df::DateFormat{Symbol("yyyy-mm-dd\\THH:MM:SS.s"), Tuple{Dates.DatePart{'y'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'H'}, Dates.Delim{Char, 1}, Dates.DatePart{'M'}, Dates.Delim{Char, 1}, Dates.DatePart{'S'}, Dates.Delim{Char, 1}, Dates.DatePart{'s'}}})
   @ Dates ~/julia-1.9-DEV-65b9be4086/share/julia/stdlib/v1.9/Dates/src/parse.jl:277
 [2] DateTime (repeats 2 times)
   @ ~/julia-1.9-DEV-65b9be4086/share/julia/stdlib/v1.9/Dates/src/io.jl:576 [inlined]
 [3] top-level scope
   @ REPL[41]:1

I believe the T (and only upper case) is there by design in the date format. I do see it omitted in my Pandas work, e.g. “1957-01-31 00:00:00+00:00”. I was thinking: should we allow either way (and maybe lower-case t too), and also the “+00:00” (which isn’t allowed, with or without the T)? If the format is strict by design, maybe we can at least make the error message less cryptic for beginners. I think I can make a PR to add what’s needed; it’s less clear to me whether I can, or should, drop the cryptic stuff.

B.
I was looking into whether we have the problem explained here (is “60” actually legal for a leap second, and should we allow it?):
https://github.com/microsoft/STL/issues/2698

C.
I looked up what other languages do, e.g. Python, and its default is:

>>> date = time.strptime("Tue May 3 00:00:00 2022")
>>> date
time.struct_time(tm_year=2022, tm_mon=5, tm_mday=3, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=123, tm_isdst=-1)

But ironically it also accepts the wrong weekday without complaining. Do we allow such a format (how?), and should we complain, or keep bug-compatibility(?) with Python:

>>> a = time.strptime("Mon May 3 00:00:00 2022")
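For comparison, here is a minimal Julia sketch of the "complain" option. It assumes the input always starts with an English three-letter weekday followed by a space; `parse_checking_weekday` is a hypothetical helper, not something Dates provides, and it sidesteps the question of how Dates itself treats a weekday code by splitting the weekday off before parsing.

```julia
using Dates

# Hypothetical helper (not part of Dates): parse an asctime-like string such as
# "Tue May 3 00:00:00 2022" and reject it when the weekday name does not match
# the date. Assumes English abbreviations and a single space after the weekday.
function parse_checking_weekday(s::AbstractString)
    parts = split(s, ' '; limit=2)
    length(parts) == 2 ||
        throw(ArgumentError("expected a weekday followed by a date: $s"))
    claimed, rest = parts
    # Parse everything after the weekday with an explicit format.
    dt = DateTime(rest, dateformat"u d HH:MM:SS yyyy")
    actual = Dates.dayabbr(dt)
    claimed == actual ||
        throw(ArgumentError("weekday \"$claimed\" does not match $(Date(dt)), which is a $actual"))
    return dt
end

ok = parse_checking_weekday("Tue May 3 00:00:00 2022")
```

With this sketch the "Mon May 3 00:00:00 2022" input raises an ArgumentError instead of being accepted silently.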

The current version of the ISO 8601 standard seems to make the “T” separator between date and time mandatory (in previous versions it could be omitted).

“60” for the leap seconds is definitely legal in ISO 8601. However, a time falling into a leap second could not be represented internally by a Julia date/time structure and would have to be converted to a time within the second before. Formatting it back into a date/time string then does not give the original leap second time back.
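A small sketch of that lossy conversion, assuming we choose to clamp second 60 to second 59 of the same minute. `parse_clamping_leap_second` is a hypothetical helper, not a Dates function, and it only handles the default “T” format.

```julia
using Dates

# Hypothetical helper (not part of Dates): accept an ISO-style string whose
# seconds field is "60" by clamping it to "59". Returns the DateTime and a flag
# saying whether clamping happened, since the DateTime alone cannot record it.
function parse_clamping_leap_second(s::AbstractString)
    m = match(r"^(.*T\d{2}:\d{2}):60(.*)$", s)
    m === nothing && return (DateTime(s), false)
    clamped = string(m.captures[1], ":59", m.captures[2])
    return (DateTime(clamped), true)
end

dt, was_leap = parse_clamping_leap_second("2016-12-31T23:59:60")
roundtrip = string(dt)   # the ":60" is gone after formatting back
```

Here `roundtrip` is "2016-12-31T23:59:59", so the original leap-second string cannot be recovered without carrying the extra flag around.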

A solution could be to use a day-segmented internal representation, with a number for the day and another for the fraction. But to change the internal representation of a date/time would be a major effort.
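To make the idea concrete, here is a rough sketch of such a representation. Everything in it is an assumption for illustration: the type name `DaySegmentedTime`, the choice of a `Date` for the day, and seconds (rather than a unitless fraction) for the within-day part. It is not how Dates works today.

```julia
using Dates

# Hypothetical type: a day plus the seconds elapsed within that day.
# On a day with a leap second, second_of_day may fall in [86400, 86401).
struct DaySegmentedTime
    day::Date
    second_of_day::Float64
end

# Convenience constructor from calendar fields; seconds may be 60.
function DaySegmentedTime(y, mo, d, h, mi, s)
    0 <= s < 61 || throw(ArgumentError("second out of range: $s"))
    return DaySegmentedTime(Date(y, mo, d), 3600 * h + 60 * mi + s)
end

# Format back to an ISO-like string, keeping a leap second as ":60".
function iso_string(t::DaySegmentedTime)
    total = floor(Int, t.second_of_day)
    h = min(total ÷ 3600, 23)
    mi = min((total - 3600 * h) ÷ 60, 59)
    s = total - 3600 * h - 60 * mi
    return string(t.day, "T", lpad(h, 2, '0'), ":", lpad(mi, 2, '0'), ":", lpad(s, 2, '0'))
end

leap = DaySegmentedTime(2016, 12, 31, 23, 59, 60)
```

With this layout `iso_string(leap)` gives back "2016-12-31T23:59:60". The sketch does not check whether the given day actually has a leap second, which a real implementation would need a leap-second table for.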


Is that on output, or input too? It’s not wrong to only parse one input format (maybe even demanded), but it seems you might want to be relaxed regarding input, but only strict on output.

Or, if insisting on the T helps to prevent reading invalid data (or is just for speed, more likely), then I suggest:

ERROR: ArgumentError: Invalid DateTime string, you may want keyword argument
such as dateformat"yyyy-mm-dd HH:MM:SS.s" or dateformat"yyyy-mm-dd"

Something like that, I’m open to suggestions.
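Until something like that lands, a user-side wrapper can give the same hint. This is only a sketch: `datetime_with_hint` is a hypothetical name and the message wording is just my proposal from above.

```julia
using Dates

# Hypothetical wrapper (not part of Dates): try the default ISO format and,
# on failure, raise an error that points the user at explicit formats.
function datetime_with_hint(s::AbstractString)
    dt = tryparse(DateTime, s)   # returns nothing instead of throwing
    dt === nothing || return dt
    throw(ArgumentError(string(
        "Invalid DateTime string \"", s, "\", you may want to pass a format ",
        "such as dateformat\"yyyy-mm-dd HH:MM:SS.s\" or dateformat\"yyyy-mm-dd\"")))
end

iso = datetime_with_hint("2021-08-28T00:00:59")
```

Calling `datetime_with_hint("2021-08-28 00:00:59")` then fails with the longer message rather than the bare "Invalid DateTime string".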

I do see there are some alternative T (and W) date formats in the ISO standard that are not yet supported (and maybe shouldn’t be?), like this:

DateTime("20220503T144809Z")
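That compact form can, as far as I can tell, already be read with an explicit fixed-width format. The sketch below assumes plain Dates (no TimeZones.jl loaded) and treats the trailing “Z” as a literal character only, so it does no time-zone handling; `BASIC_UTC` is just a name I made up.

```julia
using Dates

# Fixed-width fields with escaped literal 'T' and 'Z'.
# The 'Z' is matched as plain text; no UTC conversion takes place.
const BASIC_UTC = dateformat"yyyymmdd\THHMMSS\Z"

basic = DateTime("20220503T144809Z", BASIC_UTC)
```

So the question is less whether it can be parsed and more whether the default should try it.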

Parsing could certainly be more liberal than demanded by the standard. If there is a “-” separator in the date and a “:” in the time, then having a space instead of the “T” between date and time is unambiguous, clear for the human reader and quite common.
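A liberal parser along those lines can be sketched by trying a short list of formats in order. `parse_datetime_lenient` and `CANDIDATE_FORMATS` are hypothetical names, and the list holds only the two variants discussed here.

```julia
using Dates

# Formats to try, strictest first: the ISO default with "T", then a space.
const CANDIDATE_FORMATS = (
    Dates.ISODateTimeFormat,
    dateformat"yyyy-mm-dd HH:MM:SS.s",
)

# Hypothetical helper (not part of Dates): return the first successful parse.
function parse_datetime_lenient(s::AbstractString)
    for df in CANDIDATE_FORMATS
        dt = tryparse(DateTime, s, df)
        dt === nothing || return dt
    end
    throw(ArgumentError("Invalid DateTime string: $s"))
end

with_t = parse_datetime_lenient("2021-08-28T00:00:59")
with_space = parse_datetime_lenient("2021-08-28 00:00:59")
```

Both calls yield the same DateTime. The cost is one extra parse attempt for inputs that do not use the “T”, which may matter if the strict default exists for speed.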


Just guessing: the reason for “insisting” on having the ‘T’ between the date part and the time part (and thereby giving up the more readable, and more easily verifiable, use of ‘ ’ as the separator) was to keep software and people from mistakenly treating the “string” as two different items rather than one more elaborate item.

imo ‘^’ would have been a more helpful choice while meeting the desire for inked continuity. ‘T’ looks somewhat like ‘1’, ‘7’; that makes it more difficult to visually scan dates-with-times.