Converting columns in a DataFrame from String to Date (with a specific format)

askvorts · February 18, 2022, 1:36pm

I am trying to convert a DataFrame column populated with dates whose type is String and format DD/MM/YYYY to dates with Date type and format YYYYMMDD.

Tried A[!,:DATE] = parse.(Date, A[!,:DATE])

But it errors, with: ArgumentError: Unable to parse date time. Expected directive Delim(-)

Which would be the correct way to achieve this conversion? Thank you!

oheil · February 18, 2022, 1:44pm

Parsing into a Date is done by:

julia> d1="02/02/2022"
"02/02/2022"

julia> date=Date(d1,"dd/mm/yyyy")
2022-02-02

julia> typeof(date)
Date

Output (as String) is done by:

julia> s=Dates.format(date,"Ymmdd")
"20220202"

So:

A[!,:DATE]= Dates.format.(Date.( A[!,:DATE],"dd/mm/yyyy"),"Ymmdd")

Typically, with DataFrames questions, much more elegant versions pop up after a while, so just wait for it…
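For anyone wondering why the original attempt failed: without a format, parse(Date, s) expects ISO-style yyyy-mm-dd input, which is where the "Expected directive Delim(-)" message comes from. A minimal sketch on a plain vector (the sample values are made up):

```julia
using Dates

raw = ["18/02/2022", "02/02/2022"]   # made-up sample values in dd/mm/yyyy

# Without a format, parse expects yyyy-mm-dd, so the "/" separators fail.
failed = try
    parse.(Date, raw)
    false
catch err
    err isa ArgumentError
end

# Passing the format explicitly works; the DateFormat is built once and reused.
fmt = dateformat"dd/mm/yyyy"
dates = Date.(raw, fmt)
```

Building the format with the dateformat"..." macro avoids re-parsing the format string for every element, which can matter on long columns.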

askvorts · February 18, 2022, 2:59pm

Thank you very much!

tk3369 · February 20, 2022, 4:59pm

How did you get date strings into a data frame in the first place?

In case you use CSV.jl, you can specify a date format when reading the file; see the “Reading” page of the CSV.jl documentation.
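A rough sketch of what that could look like, assuming the data arrives as CSV. The file contents and column names here are hypothetical, and an IOBuffer stands in for a file path:

```julia
using CSV, DataFrames, Dates

# Hypothetical file contents; in practice this would be a path such as "data.csv".
csv_text = """
DATE,VALUE
18/02/2022,1.5
02/02/2022,2.0
"""

# The dateformat keyword tells CSV.jl how date-like columns are written.
A = CSV.read(IOBuffer(csv_text), DataFrame; dateformat = "dd/mm/yyyy")
@show eltype(A.DATE)
```

If every value in the column matches the format, the column should come back with Date elements and no post-processing is needed.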

DataFrames · February 21, 2022, 12:58am

Don’t call Dates.format on your column: it converts the Dates back to Strings, which is generally bad practice. Leave the column as Date unless you are formatting it for presentation.

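using Dates, DataFrames
# d/m/y matches the DD/MM/YYYY strings from the question; adjust it to your data
f(x; fmt = dateformat"d/m/y") = Date(x, fmt)
# Overwrite :date with the parsed values (without `=> :date` the result goes to :date_f)
transform(df, :date => ByRow(f) => :date)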
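To illustrate why keeping the Date type is useful, here is a small sketch on a plain vector with made-up values. Sorting and arithmetic work directly on Date values, and formatting happens only when a string is needed for output:

```julia
using Dates

dates = [Date(2022, 2, 18), Date(2022, 2, 2)]   # made-up values

# Date values sort chronologically and support arithmetic.
sorted = sort(dates)
span = sorted[end] - sorted[1]      # a Day period

# Format only at the point of output.
labels = Dates.format.(sorted, "yyyymmdd")
```

Once the column is a String such as "20220218", the subtraction above is no longer available without parsing it again.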

StuartRL · October 10, 2022, 12:38pm

…apologies, but how does this work for a column of times, i.e. converting a DataFrame column of strings like “01:10:46” to Time values like 01:10:46? I blow up with either ‘no method matching’ or ‘no method matching Int64(::Vector{Any})’ etc. The DataFrame column is loaded with 700 rows of Any.

nilshg · October 10, 2022, 12:49pm

The first thing to realise is that converting DataFrame columns is not different from converting regular Arrays of any type to any other type, as DataFrame columns are just vectors. So you really just need to know how to convert a String to a Time object, irrespective of where this String is stored.

The second thing to note is that getting a numerical value out of a string is generally referred to as “parsing”, rather than “conversion”. With this, you have:

julia> using Dates

julia> parse(Time, "01:10:46")
01:10:46

julia> typeof(ans)
Time

and then broadcast that over your data, i.e. parse.(Time, df.timecol) (although I can’t guarantee that all your strings have the right format to be parsed, so you might have to fiddle with it a bit!)
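A minimal sketch of that broadcast on a plain vector, which stands in for df.timecol (the values are made up). The second call spells out the format explicitly, which may help if the strings deviate from the default:

```julia
using Dates

timecol = ["01:10:46", "00:58:03"]   # stand-in for df.timecol

# Element-wise parse using the default time format.
times = parse.(Time, timecol)

# Same result with the format written out explicitly.
times_explicit = Time.(timecol, dateformat"HH:MM:SS")
```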

PS how did you end up with a column of 700 rows of type Any? If you are using XLSX.jl to read an Excel file, consider the infer_eltypes = true kwarg.

StuartRL · October 10, 2022, 1:15pm

Thank you very much for the reply. I have a df.Time column with 700 rows of time strings, which came from an xlsx file, but I had missed the infer_eltypes option.

parse(Time, df."Time") gives a MethodError

I’m trying to re-write the entire df.Time column. In the xlsx file the times are in time format and add up.

I’m new to Julia and coming from a Python background. Again, thanks for the comment, I’ll keep at it.

nilshg · October 10, 2022, 1:27pm

My suggestion was parse.(Time, df."Time") - note the dot after parse to broadcast (apply element-wise) the function.

(also note the quotes aren’t necessary if you have a column name which is a single word without special characters)
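The difference the dot makes can be seen on a plain vector (made-up values). This is only a sketch, but the same applies to a DataFrame column:

```julia
using Dates

col = ["01:10:46", "00:58:03"]

# Without the dot, parse receives the whole vector and no method matches.
threw = try
    parse(Time, col)
    false
catch err
    err isa MethodError
end

# With the dot, parse is applied to each element.
times = parse.(Time, col)
```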

StuartRL · October 10, 2022, 1:39pm

Tried the broadcast (sorry, my mistyping).

julia> parse.(Time, df.Time)
ERROR: MethodError: no method matching parse(::Type{Time}, ::Time)

pdeffebach · October 10, 2022, 1:42pm

You have already converted the df.Time column to a Time type. It is no longer a string, so parse doesn’t work. Try again on your “raw” data frame with string types.

StuartRL · October 10, 2022, 1:49pm

Good spot, the df.Time column got corrected by the great infer_eltypes suggestion. Other similar columns in the df remained Any, and a parse.(Time, df."Avg Pace") gives…

pdeffebach · October 10, 2022, 1:51pm

The problem is the same as before. Your column isn’t all Strings, and parse only works on strings.

How are you importing your data? It’s not a great sign to have these Any columns.
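One way to see what is actually inside an Any column is to list the element types. A small sketch on a made-up mixed vector, standing in for a column such as df."Avg Pace":

```julia
using Dates

col = Any["01:10:46", Time(0, 58, 3), 5.2]   # made-up mixed column

# The container type only says Any; the elements tell the real story.
kinds = unique(typeof.(col))

# Positions of the entries that parse would reject.
non_strings = findall(x -> !(x isa AbstractString), col)
```

Here kinds lists String, Time and Float64, and non_strings points at the second and third entries.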

StuartRL · October 10, 2022, 2:11pm

Interesting, perhaps I’ll go back to a raw CSV.File input, which I moved away from to

DataFrame(XLSX.readtable("file.xlsx", "Activities", infer_eltypes = true))

with the useful suggestion of infer_eltypes that worked on the first Time column leaving the remaining others.

Think I’ll take a few steps backwards. Didn’t want to take up too much of people’s time etc. From the comments I think it’s the raw data more than my logic, which is a step. Thanks.

pdeffebach · October 10, 2022, 2:15pm

Don’t worry! These are annoying problems to run into.

One solution would be a helper function

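using Dates
get_to_time(x::AbstractString) = parse(Time, x)  # strings get parsed
get_to_time(x::Time) = x                         # already a Time: keep as is
get_to_time(x) = missing                         # anything else becomes missing
get_to_time.(df.Time)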
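To show what the helper does on mixed data, here is a self-contained sketch that restates it and applies it to a made-up vector. AbstractString is used instead of String as an assumption, so that other string types are also covered:

```julia
using Dates

get_to_time(x::AbstractString) = parse(Time, x)  # strings get parsed
get_to_time(x::Time) = x                         # already a Time: keep as is
get_to_time(x) = missing                         # anything else becomes missing

col = Any["01:10:46", Time(0, 58, 3), 5.2]       # made-up mixed column
cleaned = get_to_time.(col)
```

The result holds Time values plus a missing for the stray number. A string that is not a valid time would still throw from parse, so this only covers the mixed-type case.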
