DataFrame conversion of column names to type

I use DataFrames to read a CSV file. I need to determine the number of columns with the header dates (e.g. 01/10/2020) and extract those column names and convert their type to Date.

In my MWE, I have shown two methods that I can use to select the relevant columns, i) Not(Between) or ii) based on columns containing the ‘/’ character. The problem is that dates in the CSV file may be in different formats, such as 01.10.2020 or 01-10-2020, etc. This can cause a problem when converting the column names to Date type.
1. Is there a more elegant way of determining the number of column names that contains dates than the method used in the MWE?
2. To ensure that converting into Date type cover all different formats of dates in the input data, is there a method for this?
3. Can column names be read as Date type directly when reading a CSV file?

df = DataFrame(Symbol("ID")=>1:2,Symbol("Name")=>["Abc","Cdc"],Symbol("Value")=>[100.1,200.2],
Symbol("01/10/2020")=>[23.4,29.9],Symbol("01/11/2020")=>[58.6,90.0])

#Determine the number of columns with dates 
# Method 1 use Not(Between) method
df_dates=select(df, Not(Between(:ID, :Value)))
# Method 2 select columns containing '/' character
df_dates=select(df, r"/")
#Create a vector of strings for dates
Dates_String=names(df_dates)
#Determine number of coluns in Dates_String
nColn_Dates=length(Dates_String)
#Convert into type Dates for later calculations
Dates_Type=Date.(Dates_String, Dates.DateFormat("dd/mm/yyyy"))

I am a bit confused here, even if you succeed at figuring out which column names are dates, you can’t make the name of a column a date. Only strings and symbols are allowed as column names.

1 Like

I agree. This is what I expected answer to my question 3.

The first two questions are bothering me more and it will be interesting to get views from others. That is

  • how would they go about determining number of columns that contains Dates as headers (though the type will be string),

  • selecting these columns names and

  • converting them into Date type (as they are needed for calculations later in the code)

I am not sure if I am doing it in an efficient way in the MWE as I may be making some unnecessary allocations.

My impression is that this is not easy. I don’t know of a way to automatically detect that a string is a date when you don’t provide the format. i.e. a function that works like to_date("2014-03-04") and to_date("2015/06/78")

1 Like

I agree - I think you won’t be able to do it unless you can reliably narrow down the set of possible formats to something manageable and then check all formats for each column header. Something like:

julia> using Dates

julia> function trydate(x, f) 
           try 
               Date(x, f) 
           catch e
               return nothing
           end
       end
trydate (generic function with 1 method)

julia> formats = [dateformat"d/m/y", dateformat"d-m-y"]
2-element Vector{DateFormat{S, Tuple{Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'y'}}} where S}:
 dateformat"d/m/y"
 dateformat"d-m-y"

julia> trydate.("20/10/2020", formats)
2-element Vector{Union{Nothing, Date}}:
 2020-10-20
 nothing

And then apply trydate across names(df) and check whether the result includes a Date type or not.

1 Like

Thank you. Your suggestion looks really good!