Reading Date.jl

Hello,

I am trying to read a file that is in the following format

    Id           |            Date            |empId               |    Sale        |    Number
        7	     |'Feb  4 1999 1:00:00:000PM' |  07	               |      1400	    |    1245

My date Time is in quotes - how do I import CSV in Julia to make it readable. I have tried using For loop with Datetime function but that’s deprecated too.

So I tried changing the type when reading CSV, which did not work either

    CSV.read(joinpath("mytable");types=[Int64,String,Int64,Int64,String])

Although it worked perfectly when dates were without quotes and in yyyy-mm-dd format.

Can someone help, what is the right way of doing it??

So here’s a few tips/tricks to get what you pasted above to parse correctly, first my code:

using CSV, Tables, DataFrames, Dates
# the stdlib Dates module doesn't support AM/PM
# so we make our own function to parse
function todate(str)
    if endswith(str, "PM")
        return DateTime(str, dateformat"uuu  dd yyyy HH:MM:SS:sss\P\M") + Dates.Hour(12)
    else
        return DateTime(str, dateformat"uuu  dd yyyy HH:MM:SS:sss\A\M")
    end
end

julia> CSV.File(file; delim="|", quotechar='\'') |> Tables.transform(Date=todate) |> DataFrame
1Γ—5 DataFrame
β”‚ Row β”‚  Id   β”‚ Date                β”‚ empId β”‚ Sale  β”‚ Number β”‚
β”‚     β”‚ Int64 β”‚ DateTime            β”‚ Int64 β”‚ Int64 β”‚ Int64  β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 7     β”‚ 1999-02-04T13:00:00 β”‚ 7     β”‚ 1400  β”‚ 542378 β”‚

Of note:

  • we set delim="|" because our file has the | delimiter
  • we set quotechar='\'' because some fields are surrounded by single quotes, but we want to ignore them
  • we use Tables.transform(Date=todate) which says, β€œfor the Date column in my input CSV.File, apply the todate function to each element”
  • Finally, we materialize the csv file + transform in a DataFrame by doing |> DataFrame

Hopefully that helps!

5 Likes

Caution! There’s a gotcha with 12 AM / 12 PM which easily causes bugs. Using the above todate implementation:

julia> todate("Feb  4 1999 12:15:00:000AM")
1999-02-04T12:15:00

julia> todate("Feb  4 1999 1:15:00:000AM")
1999-02-04T01:15:00

julia> todate("Feb  4 1999 12:15:00:000PM")
1999-02-05T00:15:00

julia> todate("Feb  4 1999 1:15:00:000PM")
1999-02-04T13:15:00

See how 12 AM and 12 PM are parsed incorrectly. A fixed version:

function todate(str)
    d = DateTime(str[1:end-2], dateformat"uuu  dd yyyy HH:MM:SS:sss")
    d + Hour(12 * +(endswith(str, "PM"), hour(d) != 12, -1))
end

Testing it:

julia> todate("Feb  4 1999 12:15:00:000AM")
1999-02-04T00:15:00

julia> todate("Feb  4 1999 1:15:00:000AM")
1999-02-04T01:15:00

julia> todate("Feb  4 1999 12:15:00:000PM")
1999-02-04T12:15:00

julia> todate("Feb  4 1999 1:15:00:000PM")
1999-02-04T13:15:00
1 Like