Converting string to DateTime in DataFrame

I’m reading in a json file, putting it into DataFrames to clean it up, before dumping into MongoDB (via mongoc). It is almost there, my only problem is that one of the fields is a timestamp, which is coming in as a string. The strings are of the form: “2019-11-18T13:09:31Z”. The problem is that DateTime doesn’t like that Zulu timezone, and Timezones.jl doesn’t help. So I am now trying to strip off the last character (since in practice they are all recorded in UTC). the files actually contain JSON in just the last line, I strip out the HTTP headers in the first few lines.

I am using DrWatson, JSONTables, DataFrames, Dates. I have a function

function read_jsondump(filename)
cap_file = readlines(datadir(“import”, filename))
df = DataFrame(jsontable(cap_file[end]))
transform(df, :time => chop.(:time))
df[!, :time] = convert.(DateTime, df[:, :time])
return df
end

I’m still getting confused in the use of broadcasting is places like this. the transform line doesn’t work, doesn’t like chop. For reference, in pandas what I do here is
df[‘timestamp’] = pd.to_datetime(df[‘time’])
which actually creates a new field and I then delete the original, but that doesn’t matter. The point is that pandas to_datetime can handle the ‘Z’, but Julia DateTime can’t, so I have to work around it.

I am working my way through the Introduction to Dataframes tutorial, and have also just bought Tom Kwong’s book, so hopefully my confusion about handling things like this will disappear soon.

Read the docs for transform more closely. transform works like

transform(df, source => fun => dest)

where fun is a function. So you you want

transform(df, :time => (t -> chop.(t)) => :time)

OR, you can use ByRow

transform(df, :time => ByRow(chop) => :time)

OR you can use DataFramesMeta to get the syntax I think you are expecting

@transform(df, time = chop.(:time))

if you’re sure data will always have this format, you can specify the dateformat:

julia> using Dates

julia> df = dateformat"y-m-dTH:M:SZ"
dateformat"y-m-dTH:M:SZ"

julia> DateTime("2019-11-18T13:09:31Z", df)
2019-11-18T13:09:31

or better, to accept timezones,

julia> using TimeZones

julia> ZonedDateTime("2019-11-18T13:09:31Z", dateformat"yyyy-mm-dd\THH:MM:SSz")
2019-11-18T13:09:31+00:00

Although I think this dateformat should be recognized by default since it’s so common

I see, thanks. Yes, I tried each of those, and after discovering that it doesn’t do it in place, it does chop the string, but it returns a SubString rather than a String, and I find that you can’t convert SubStrings to DateTimes either, so this approach has got me no nearer to converting those strings to DateTimes. I’ll dig some more. Thanks.

You can always call string on the result.

@transform(df, time = string.(chomp.(:time)))

But it seems to work for me on 1.5.0

julia> DateTime("1996-01-01")
1996-01-01T00:00:00

julia> DateTime(chomp("1996-01-01"))
1996-01-01T00:00:00

got it at last!

function read_jsondump(filename)
    cap_file  = readlines(datadir("import", filename))
    df = DataFrame(jsontable(cap_file[end]))
    df = @transform(df, time = string.(chop.(:time)))
    @transform(df, time = DateTime.(:time))
end

Phew! Thanks for all your help!