spun7
February 18, 2019, 1:55am
1
Hello,
I am trying to read a file that is in the following format
Id | Date |empId | Sale | Number
7 |'Feb 4 1999 1:00:00:000PM' | 07 | 1400 | 1245
My date Time is in quotes - how do I import CSV in Julia to make it readable. I have tried using For loop with Datetime function but thatβs deprecated too.
So I tried changing the type when reading CSV, which did not work either
CSV.read(joinpath("mytable");types=[Int64,String,Int64,Int64,String])
Although it worked perfectly when dates were without quotes and in yyyy-mm-dd format.
Can someone help, what is the right way of doing it??
quinnj
February 18, 2019, 7:00am
2
So hereβs a few tips/tricks to get what you pasted above to parse correctly, first my code:
using CSV, Tables, DataFrames, Dates
# the stdlib Dates module doesn't support AM/PM
# so we make our own function to parse
function todate(str)
if endswith(str, "PM")
return DateTime(str, dateformat"uuu dd yyyy HH:MM:SS:sss\P\M") + Dates.Hour(12)
else
return DateTime(str, dateformat"uuu dd yyyy HH:MM:SS:sss\A\M")
end
end
julia> CSV.File(file; delim="|", quotechar='\'') |> Tables.transform(Date=todate) |> DataFrame
1Γ5 DataFrame
β Row β Id β Date β empId β Sale β Number β
β β Int64 β DateTime β Int64 β Int64 β Int64 β
βββββββΌββββββββΌββββββββββββββββββββββΌββββββββΌββββββββΌβββββββββ€
β 1 β 7 β 1999-02-04T13:00:00 β 7 β 1400 β 542378 β
Of note:
we set delim="|"
because our file has the |
delimiter
we set quotechar='\''
because some fields are surrounded by single quotes, but we want to ignore them
we use Tables.transform(Date=todate)
which says, βfor the Date
column in my input CSV.File
, apply the todate
function to each elementβ
Finally, we materialize the csv file + transform in a DataFrame by doing |> DataFrame
Hopefully that helps!
5 Likes
Caution! Thereβs a gotcha with 12 AM / 12 PM which easily causes bugs. Using the above todate
implementation:
julia> todate("Feb 4 1999 12:15:00:000AM")
1999-02-04T12:15:00
julia> todate("Feb 4 1999 1:15:00:000AM")
1999-02-04T01:15:00
julia> todate("Feb 4 1999 12:15:00:000PM")
1999-02-05T00:15:00
julia> todate("Feb 4 1999 1:15:00:000PM")
1999-02-04T13:15:00
See how 12 AM and 12 PM are parsed incorrectly. A fixed version:
function todate(str)
d = DateTime(str[1:end-2], dateformat"uuu dd yyyy HH:MM:SS:sss")
d + Hour(12 * +(endswith(str, "PM"), hour(d) != 12, -1))
end
Testing it:
julia> todate("Feb 4 1999 12:15:00:000AM")
1999-02-04T00:15:00
julia> todate("Feb 4 1999 1:15:00:000AM")
1999-02-04T01:15:00
julia> todate("Feb 4 1999 12:15:00:000PM")
1999-02-04T12:15:00
julia> todate("Feb 4 1999 1:15:00:000PM")
1999-02-04T13:15:00
1 Like