mbah
March 6, 2023, 11:34pm
1
Hello all,
I have a CSV file with a date column that I am trying to parse when reading it in. I have tried several options but they don’t work.
df = DataFrame(CSV.File(joinpath(data_path, "PJM_2018.csv")))
8751×8 DataFrame
Row │ time_stamp Coal Nuclear Gas Oil Wind Hydro Solar
│ String31 Float64 Float64 Float64 Float64 Float64 Float64 Float64
──────┼─────────────────────────────────────────────────────────────────────────────────
1 │ 01.01.2018 00:00 48.55 35.53 23.38 1.54 3.91 0.7 0.0
2 │ 01.01.2018 01:00 48.06 35.55 22.77 1.56 3.41 1.04 0.0
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
8751 │ 12/31/2018 14:00 24.49 34.76 29.05 0.22 3.84 2.14 0.0456
The data the following format
"12/31/2018 14:00"
Any suggestions on how to parse the time_stamp coloumn?
Thanks
nilshg
March 7, 2023, 6:17am
2
Have you tried to use the dateformat
kwarg as specified in the docs?
https://csv.juliadata.org/stable/reading.html#dateformat
Based on the sample data you posted above though this will fail because your time_stamp
column has multiple different date formats. You’ll have to manually parse this after reading in the CSV.
1 Like
mbah
March 7, 2023, 9:18am
3
Thanks for pointing out the different date formats. I formatted them to a consistent format (mm/dd/yyyy) and parsed them using the following code:
df.time_stamp = DateTime.(df.time_stamp, "mm/dd/yyyy HH:MM")
Thanks
1 Like
I don’t know if the following is useful to the problem nor, even less, if it is efficient, but it was just to do an exercise on what is illustrated here
data = """
col1;col2;col3;col4;col5
"05.02.2023";1000,01;2000,02;3000,03;12:00:00
"06.02.2023";4000,04;5000,05;6000,06;12:00:00
"06.02.2023";4000,04;5000,05;6000,06;12:00:00
"02/06/2023";4000,05;5000,05;6000,06;12:00:00
"02/06/2023";4000,06;5000,05;6000,06;12:00:00
"02/06/2023";4000,07;5000,05;6000,06;12:00:00
"""
tbl = CSV.File(IOBuffer(data); delim=';') |> columntable
tbl.col1
replace.(tbl.col1, r"(\d\d/)(?<ng>\d\d/)" => s"\g<ng>\1","."=>"/")
I don’t have much experience with regular expressions, but this case seems like it could be solved maybe more clearly
replace.(tbl.col1, r"(\d\d/)(\d\d/)" => s"\2\1", "."=>"/")
1 Like
using CSV, DataFrames, Dates
data = """
time_stamp,Coal,Nuclear,Gas,Oil,Wind,Hydro,Solar
01.01.2018 00:00,48.55,35.53,23.38,1.54,3.91,0.7,0.0
01.01.2018 01:00,48.06,35.55,22.77,1.56,3.41,1.04,0.0
12/31/2018 14:00,24.49,34.76,29.05,0.22,3.84,2.14,0.045
"""
df = CSV.read(IOBuffer(data), DataFrame)
df.time_stamp .= replace.(df.time_stamp, "." => "/")
df.time_stamp .= DateTime.(df.time_stamp, "mm/dd/yyyy HH:MM")
df
1 Like
mbah
March 7, 2023, 9:50pm
6
@rafael.guerra thanks for the nifty code. Really helpful!
mbah
March 7, 2023, 9:52pm
7
@rocco_sprmnt21 thanks for this