Loading a csvfile using CSVFiles

Peter_Staab · November 8, 2018, 2:34pm

I have a file with a date in the form mm/dd/yy and my guess is that it is interpreting it in dd/mm/yy format. I’ve been using CSVFiles to do this with the load command and have tried the following:

load("test.csv",colparsers=[5=>dateformat"mm/dd/yy"])

since it is the 5th column. I’ve also tried the column name as well am getting the following error:

MethodError: no method matching tofield(::Pair{Symbol,DateFormat{Symbol("mm/dd/yy"),
Tuple{Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},
Dates.Delim{Char,1},Dates.DatePart{'y'}}}}, ::TextParse.LocalOpts)

so obviously I’m not calling this correctly, but I can’t find documentation on this, nor am able to figure it out from the source code.

Here’s the first lines of that file:

Athlete,Age,Country,Year,Closing Ceremony Date,Sport,Gold Medals,Silver Medals,Bronze Medals,Total Medals
Michael Phelps,23,United States,2008,8/24/08,Swimming,8,0,0,8
Michael Phelps,19,United States,2004,8/29/04,Swimming,6,0,2,8
Michael Phelps,27,United States,2012,8/12/12,Swimming,4,2,0,6
Natalie Coughlin,25,United States,2008,8/24/08,Swimming,1,2,3,6

quinnj · November 8, 2018, 4:22pm

I know CSVFiles/TextParse.jl have been doing some refactoring lately, so probably just a missing method somewhere. In the mean time, you could use CSV.jl like:

julia> df = CSV.read("/Users/jacobquinn/Downloads/discourse.csv", dateformat="mm/dd/yy", transforms=Dict("Closing Ceremony Date"=>x->x + Dates.Year(2000)))
4×10 DataFrames.DataFrame. Omitted printing of 1 columns
│ Row │ Athlete          │ Age   │ Country       │ Year  │ Closing Ceremony Date │ Sport    │ Gold Medals │ Silver Medals │ Bronze Medals │
│     │ String           │ Int64 │ String        │ Int64 │ Date                  │ String   │ Int64       │ Int64         │ Int64         │
├─────┼──────────────────┼───────┼───────────────┼───────┼───────────────────────┼──────────┼─────────────┼───────────────┼───────────────┤
│ 1   │ Michael Phelps   │ 23    │ United States │ 2008  │ 2008-08-24            │ Swimming │ 8           │ 0             │ 0             │
│ 2   │ Michael Phelps   │ 19    │ United States │ 2004  │ 2004-08-29            │ Swimming │ 6           │ 0             │ 2             │
│ 3   │ Michael Phelps   │ 27    │ United States │ 2012  │ 2012-08-12            │ Swimming │ 4           │ 2             │ 0             │
│ 4   │ Natalie Coughlin │ 25    │ United States │ 2008  │ 2008-08-24            │ Swimming │ 1           │ 2             │ 3             │

davidanthoff · November 12, 2018, 12:38am

You need to pass a Dict to colparsers, not a vector, in this case:

load("test.csv",colparsers=Dict(5=>dateformat"mm/dd/yy"))

You would use a vector when you want to specify just all column parsers, not as Pairs, but just as a list of parsers.

davidanthoff · November 12, 2018, 1:32am

I just realized that this will still not give you what you want: it will parse the years as the year 0008 etc… That seems a weird default (to me) in the julia base date parsing story… The python/posix approach to parsing two digit year numbers seems much more useful to me. It also seems to me that it would be super helpful if one could specify in the dateformat string how two digit years should be handled (given that I would guess that the current behavior is almost never very useful).

Topic		Replies	Views
Csv dateformat New to Julia csv	2	1477	September 30, 2020
Parsing date column when reading in CSV General Usage dates , dataframes , csv	6	1074	March 7, 2023
Read CSV, converting column containing String "Date" to a specific proper Date format New to Julia	1	337	March 22, 2021
Parsing a CSV file with 2-digit year New to Julia	3	1706	September 13, 2018
Reading Date.jl New to Julia	2	1529	February 18, 2019

Loading a csvfile using CSVFiles

Related topics