I’m reading in a json file, putting it into DataFrames to clean it up, before dumping into MongoDB (via mongoc). It is almost there, my only problem is that one of the fields is a timestamp, which is coming in as a string. The strings are of the form: “2019-11-18T13:09:31Z”. The problem is that DateTime doesn’t like that Zulu timezone, and Timezones.jl doesn’t help. So I am now trying to strip off the last character (since in practice they are all recorded in UTC). the files actually contain JSON in just the last line, I strip out the HTTP headers in the first few lines.
I am using DrWatson, JSONTables, DataFrames, Dates. I have a function
I’m still getting confused in the use of broadcasting is places like this. the transform line doesn’t work, doesn’t like chop. For reference, in pandas what I do here is
df[‘timestamp’] = pd.to_datetime(df[‘time’])
which actually creates a new field and I then delete the original, but that doesn’t matter. The point is that pandas to_datetime can handle the ‘Z’, but Julia DateTime can’t, so I have to work around it.
I am working my way through the Introduction to Dataframes tutorial, and have also just bought Tom Kwong’s book, so hopefully my confusion about handling things like this will disappear soon.
I see, thanks. Yes, I tried each of those, and after discovering that it doesn’t do it in place, it does chop the string, but it returns a SubString rather than a String, and I find that you can’t convert SubStrings to DateTimes either, so this approach has got me no nearer to converting those strings to DateTimes. I’ll dig some more. Thanks.
function read_jsondump(filename)
cap_file = readlines(datadir("import", filename))
df = DataFrame(jsontable(cap_file[end]))
df = @transform(df, time = string.(chop.(:time)))
@transform(df, time = DateTime.(:time))
end