DateTime: how to handle "yymmdd" format?

I have to read text data that uses the “yymmdd” format for dates. For example, the date 2023-05-19 is written as “230519” in the file. How does one convert such a string to a DateTime object?

In Julia’s Dates package, the format “yymmdd” gives the year 23, not 2023:

```julia
julia> using Dates
julia> t = DateTime("23051906", DateFormat("yymmddHH"))
0023-05-19T06:00:00
```

Of course I can manipulate the input string before passing it to the DateTime() function or modify the output DateTime object after the function, but I’d like to know if there is already a ready-made solution available.

Given that there is no way of inferring from the data themselves whether “23” means “2023” or, say, “1923”, I don’t see any way around adding 2000. But you don’t need to manipulate the string. Instead, add the offset to the parsed result:

```julia
t = DateTime("23051906", DateFormat("yymmddHH")) + Dates.Year(2000)
```
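For reference, the shift approach can be wrapped in a small helper. This is a minimal sketch, and the name `parse_yymmddHH_20xx` is hypothetical. Note that it sends every two-digit year into the 2000s, so “95” becomes 2095, which is the limitation discussed in the rest of this thread.

```julia
using Dates

# Hypothetical helper: parse "yymmddHH" (the year comes out as 0-99),
# then shift the result forward by 2000 years.
# Every two-digit year lands in 2000-2099, so "95" becomes 2095, not 1995.
parse_yymmddHH_20xx(s::AbstractString) =
    DateTime(s, dateformat"yymmddHH") + Year(2000)

println(parse_yymmddHH_20xx("23051906"))  # 2023-05-19T06:00:00
println(parse_yymmddHH_20xx("95051906"))  # 2095-05-19T06:00:00
```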

> Given that there is no way of inferring from the data themselves whether “23” means “2023” or, say, “1923”,

Of course! I’m asking this because it’s quite a common practice. Extremely common. I’m sure you know that. On a car license plate, you have a sticker saying “22” to indicate the date of the last inspection. Nobody thinks it means 1922.

I’m dealing with such a dataset.

I can imagine this kind of user interface:

```julia
DateFormat("yymmdd", yearbias=2000)
```

Because the year is biased toward 2000, “49” would mean “2049”, but “51” would mean “1951”.
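As far as I can tell, `DateFormat` has no `yearbias` keyword, so the interface above is hypothetical. The rule it describes, “pick the year ending in these two digits that is closest to the bias year”, can be sketched as a plain function. The name `nearest_year` is made up for illustration, and ties (a distance of exactly 50) are assumed to go to the earlier year, so “50” becomes 1950 with a bias of 2000.

```julia
# Hypothetical helper: the year ending in `yy` that is closest to `yearbias`.
# Ties (distance exactly 50) go to the earlier year.
function nearest_year(yy::Integer, yearbias::Integer=2000)
    0 <= yy <= 99 || throw(ArgumentError("expected a two-digit year, got $yy"))
    y = fld(yearbias, 100) * 100 + yy   # candidate in the same century as yearbias
    if y - yearbias >= 50
        y -= 100                        # too far ahead: use the previous century
    elseif y - yearbias < -50
        y += 100                        # too far behind: use the next century
    end
    return y
end

println(nearest_year(49))        # 2049
println(nearest_year(51))        # 1951
println(nearest_year(95, 2010))  # 1995
```

Because the candidate is always moved to within 50 years of `yearbias`, this also works when the bias is not a whole century (e.g. 2010).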

> adding 2000 . . .

That’s the tedious part, because “95” means “1995”, not “2095”. (I’m not saying it’s hard. Not at all. I’m just saying that I don’t want to write the code if there already is a solution.)

But the 20 from 23 to 2023 is implicit.
That only makes sense as a common practice if it is commonly known, and even then, you run into things like the Y2K bug.

Besides that, I cannot find anything in the docs like the user interface you wish for. But if you are veryvery sure, you could add a “20” in front of every string that starts with “23” (or with anything other than “20”, if you are sure you do not handle data from 2020).
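The string-prefix idea can be sketched as follows, assuming every year in the data is 2000 or later. The function name is hypothetical; it turns the two-digit year into a four-digit one and parses with a `yyyy` field.

```julia
using Dates

# Hypothetical helper: assumes every two-digit year in the data means 20yy.
function parse_with_20_prefix(s::AbstractString)
    # "23051906" -> "2023051906", then parse with a four-digit year field.
    return DateTime("20" * s, dateformat"yyyymmddHH")
end

println(parse_with_20_prefix("23051906"))  # 2023-05-19T06:00:00
```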

> But the 20 from 23 to 2023 is implicit.
> That only makes sense as a common practice if it is commonly known . . .

You probably responded to my initial post. Please look at my second post.

There, I showed an example interface where the user has to provide the information to disambiguate. For example, you specify 2000 as the center and then “23” will become unambiguously 2023. I didn’t ask for an interface that tries to guess what the central year is.

> and even then, you run into things like the Y2K bug.

Sorry, I fail to see what problem you have in mind for our case (converting a two-digit year to a full year). Perhaps you are alluding to the year-0 problem? (What is the year before 1 AD: year 0 or 1 BC?) You would have to change the algorithm if you use a calendar where 1 BC directly precedes 1 AD.
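For what it’s worth, Julia’s Dates follows ISO 8601 and the proleptic Gregorian calendar, in which year 0 exists and plays the role of 1 BC. A quick check:

```julia
using Dates

# In ISO 8601 (as used by Dates), 0000-12-31 is the day before 0001-01-01,
# so year 0 plays the role of 1 BC.
println(Date(0, 12, 31) + Day(1))       # 0001-01-01
println(year(Date(1, 1, 1) - Year(1)))  # 0
```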

> But if you are veryvery sure, you could add a “20” in front of every string that starts with “23”

To use a dataset like mine, you need to know where the central year is in any case: you need to know whether “49” means 1949 or 2049. Once you know the central year, you can write a function like this:

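```julia
function convert_2digit_year(year2d, center)
  @assert(0 <= year2d <= 99)
  # For example, center = 2000 (assumed to be a whole century):
  # 00-49 map to center + 0..49, 50-99 map to (center - 100) + 50..99.
  year_offset = (year2d < 50) ? center : (center - 100)
  return year2d + year_offset
end
```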

If the central year is set to 2000, “23” will unambiguously be 2023, “49” will unambiguously be 2049, and “51” unambiguously 1951 (because 1951 is closer to 2000 than 2051 is). In this example, I have decided that “50” means “1950”. If you know that “50” means “2050”, change `year2d < 50` to `year2d <= 50`.
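The `< 50` versus `<= 50` choice can also be made a parameter instead of an edit to the code. A minimal sketch, assuming (as in the function above) that `center` is a whole century such as 2000; the keyword name `pivot` is hypothetical. Two-digit years below `pivot` map into `center`’s century and the rest into the previous one.

```julia
# Variant of convert_2digit_year with the cut-off as a keyword argument.
# Assumes `center` is a whole century (e.g. 2000).
function convert_2digit_year(year2d::Integer, center::Integer; pivot::Integer=50)
    0 <= year2d <= 99 || throw(ArgumentError("year2d must be in 0:99, got $year2d"))
    return year2d < pivot ? center + year2d : center - 100 + year2d
end

println(convert_2digit_year(50, 2000))            # 1950 (same as year2d < 50)
println(convert_2digit_year(50, 2000; pivot=51))  # 2050 (same as year2d <= 50)
```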

This problem is so common when handling real-world, date-related data that I thought this little trick might already be implemented somewhere in the Dates package.

Since there seems to be no such thing, I’ll put my little utility function in my “kitchen sink” module.
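If the converter ends up in a utility module, it can be wired into parsing roughly like this. A minimal sketch: the wrapper name `parse_2digit_year` is hypothetical, `convert_2digit_year` is repeated so the block runs on its own, and `center` is assumed to be a whole century such as 2000.

```julia
using Dates

function convert_2digit_year(year2d, center)
    @assert(0 <= year2d <= 99)
    year_offset = (year2d < 50) ? center : (center - 100)
    return year2d + year_offset
end

# Hypothetical wrapper: parse with a two-digit "yy" field, then replace the year.
function parse_2digit_year(s::AbstractString, fmt::DateFormat; center=2000)
    dt = DateTime(s, fmt)
    y = convert_2digit_year(year(dt), center)
    # Rebuilding from components (instead of adding a Year period) makes an
    # impossible Feb 29 raise an error rather than being moved to Feb 28.
    return DateTime(y, month(dt), day(dt), hour(dt), minute(dt), second(dt), millisecond(dt))
end

println(parse_2digit_year("23051906", dateformat"yymmddHH"))  # 2023-05-19T06:00:00
println(parse_2digit_year("95051906", dateformat"yymmddHH"))  # 1995-05-19T06:00:00
```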

OK, that is what I meant by “veryvery” sure :slight_smile: If your dataset comes with documented knowledge of what the two digits mean and how to convert them, then your converter sounds good.

I, for example, would have assumed that every two-digit year is simply prefixed with 20, so we would always add 2000. But you know more about your data, where it seems that 50 to 99 refer to 19xx?

But yes, I could not find a function in Dates that could help you here, so keeping it in your utilities is a good idea.
You could also consider starting a discussion or opening an issue about adding this to Dates? Or maybe even submit a PR?