Reading Date.jl

#1

Hello,

I am trying to read a file that is in the following format

    Id           |            Date            |empId               |    Sale        |    Number
        7	     |'Feb  4 1999 1:00:00:000PM' |  07	               |      1400	    |    1245

My date Time is in quotes - how do I import CSV in Julia to make it readable. I have tried using For loop with Datetime function but that’s deprecated too.

So I tried changing the type when reading CSV, which did not work either

    CSV.read(joinpath("mytable");types=[Int64,String,Int64,Int64,String])

Although it worked perfectly when dates were without quotes and in yyyy-mm-dd format.

Can someone help, what is the right way of doing it??

0 Likes

#2

So here’s a few tips/tricks to get what you pasted above to parse correctly, first my code:

using CSV, Tables, DataFrames, Dates
# the stdlib Dates module doesn't support AM/PM
# so we make our own function to parse
function todate(str)
    if endswith(str, "PM")
        return DateTime(str, dateformat"uuu  dd yyyy HH:MM:SS:sss\P\M") + Dates.Hour(12)
    else
        return DateTime(str, dateformat"uuu  dd yyyy HH:MM:SS:sss\A\M")
    end
end

julia> CSV.File(file; delim="|", quotechar='\'') |> Tables.transform(Date=todate) |> DataFrame
1×5 DataFrame
│ Row │  Id   │ Date                │ empId │ Sale  │ Number │
│     │ Int64 │ DateTime            │ Int64 │ Int64 │ Int64  │
├─────┼───────┼─────────────────────┼───────┼───────┼────────┤
│ 1   │ 7     │ 1999-02-04T13:00:00 │ 7     │ 1400  │ 542378 │

Of note:

  • we set delim="|" because our file has the | delimiter
  • we set quotechar='\'' because some fields are surrounded by single quotes, but we want to ignore them
  • we use Tables.transform(Date=todate) which says, “for the Date column in my input CSV.File, apply the todate function to each element”
  • Finally, we materialize the csv file + transform in a DataFrame by doing |> DataFrame

Hopefully that helps!

3 Likes

#3

Caution! There’s a gotcha with 12 AM / 12 PM which easily causes bugs. Using the above todate implementation:

julia> todate("Feb  4 1999 12:15:00:000AM")
1999-02-04T12:15:00

julia> todate("Feb  4 1999 1:15:00:000AM")
1999-02-04T01:15:00

julia> todate("Feb  4 1999 12:15:00:000PM")
1999-02-05T00:15:00

julia> todate("Feb  4 1999 1:15:00:000PM")
1999-02-04T13:15:00

See how 12 AM and 12 PM are parsed incorrectly. A fixed version:

function todate(str)
    d = DateTime(str[1:end-2], dateformat"uuu  dd yyyy HH:MM:SS:sss")
    d + Hour(12 * +(endswith(str, "PM"), hour(d) != 12, -1))
end

Testing it:

julia> todate("Feb  4 1999 12:15:00:000AM")
1999-02-04T00:15:00

julia> todate("Feb  4 1999 1:15:00:000AM")
1999-02-04T01:15:00

julia> todate("Feb  4 1999 12:15:00:000PM")
1999-02-04T12:15:00

julia> todate("Feb  4 1999 1:15:00:000PM")
1999-02-04T13:15:00
1 Like