I use DataFrames to read a CSV file. I need to determine the number of columns with the header dates (e.g. 01/10/2020) and extract those column names and convert their type to Date.
In my MWE, I have shown two methods that I can use to select the relevant columns, i) Not(Between) or ii) based on columns containing the ‘/’ character. The problem is that dates in the CSV file may be in different formats, such as 01.10.2020 or 01-10-2020, etc. This can cause a problem when converting the column names to Date type. 1. Is there a more elegant way of determining the number of column names that contains dates than the method used in the MWE? 2. To ensure that converting into Date type cover all different formats of dates in the input data, is there a method for this? 3. Can column names be read as Date type directly when reading a CSV file?
df = DataFrame(Symbol("ID")=>1:2,Symbol("Name")=>["Abc","Cdc"],Symbol("Value")=>[100.1,200.2],
Symbol("01/10/2020")=>[23.4,29.9],Symbol("01/11/2020")=>[58.6,90.0])
#Determine the number of columns with dates
# Method 1 use Not(Between) method
df_dates=select(df, Not(Between(:ID, :Value)))
# Method 2 select columns containing '/' character
df_dates=select(df, r"/")
#Create a vector of strings for dates
Dates_String=names(df_dates)
#Determine number of coluns in Dates_String
nColn_Dates=length(Dates_String)
#Convert into type Dates for later calculations
Dates_Type=Date.(Dates_String, Dates.DateFormat("dd/mm/yyyy"))
I am a bit confused here, even if you succeed at figuring out which column names are dates, you can’t make the name of a column a date. Only strings and symbols are allowed as column names.
My impression is that this is not easy. I don’t know of a way to automatically detect that a string is a date when you don’t provide the format. i.e. a function that works like to_date("2014-03-04") and to_date("2015/06/78")
I agree - I think you won’t be able to do it unless you can reliably narrow down the set of possible formats to something manageable and then check all formats for each column header. Something like:
julia> using Dates
julia> function trydate(x, f)
try
Date(x, f)
catch e
return nothing
end
end
trydate (generic function with 1 method)
julia> formats = [dateformat"d/m/y", dateformat"d-m-y"]
2-element Vector{DateFormat{S, Tuple{Dates.DatePart{'d'}, Dates.Delim{Char, 1}, Dates.DatePart{'m'}, Dates.Delim{Char, 1}, Dates.DatePart{'y'}}} where S}:
dateformat"d/m/y"
dateformat"d-m-y"
julia> trydate.("20/10/2020", formats)
2-element Vector{Union{Nothing, Date}}:
2020-10-20
nothing
And then apply trydate across names(df) and check whether the result includes a Date type or not.