Reading a CSV file with an unconventional string for missing values using TextParse


#1

I am trying to read a CSV file with two columns (Date, Float64) using TextParse. Unfortunately, missing values in that file are coded as “-”. When TextParse encounters one of these “-” entries, it fails with the following error message:

MethodError: Cannot `convert` an object of type Float64 to an object of type TextParse.StrRange
This may have arisen from a call to the constructor TextParse.StrRange(...),
since type constructors fall back to convert methods.
ERROR: LoadError: CSV parsing error at line 1539 char 12:
2012-05-01,-
___________^
column 2 is expected to be: TextParse.Field{Float64,TextParse.Numeric{Float64}}(<Float64>, true, true, true)

I have not found a way to add “-” to the set of strings that TextParse treats as NA. What would be the best approach to this problem?

Thank you for your help!


#2

I am not an expert here, but I would first read the file and then substitute the missing values afterwards. The reason is that, while parsing, it is difficult to determine where your unusual character sits and whether it is inside a quoted field or not. CSV is a nasty format!
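A minimal sketch of that read-then-substitute approach in plain Julia, assuming Julia 1.x (where `missing` exists), a file with no header row, exactly two comma-separated fields per line, and no quoted fields. The function name `read_date_float` and its `nastrings` keyword are hypothetical helpers for illustration, not part of TextParse or any other package.

```julia
using Dates  # standard library, provides Date parsing

# Hypothetical helper: parse "Date,Float64" lines, turning any value
# listed in `nastrings` into `missing` instead of calling parse on it.
# Assumes no header row and no quoted fields (see the caveat above).
function read_date_float(io::IO; nastrings = ["-"])
    dates = Date[]
    vals = Union{Float64, Missing}[]
    for line in eachline(io)
        isempty(strip(line)) && continue               # skip blank lines
        datestr, valstr = strip.(split(line, ','; limit = 2))
        push!(dates, Date(datestr))                    # ISO yyyy-mm-dd
        push!(vals, valstr in nastrings ? missing : parse(Float64, valstr))
    end
    return dates, vals
end

# Usage with the sample rows from this thread, read from an in-memory buffer
data = """
2018-01-04,1.4389
2018-01-01,-
2017-12-31,-1.4406
"""
dates, vals = read_date_float(IOBuffer(data))
```

For a file on disk, `open(read_date_float, "myfile.csv")` passes the opened stream to the same function. Checking the NA strings before calling `parse` avoids the conversion error entirely, at the cost of handling quoting yourself.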

/iaw


#3

Looks like the same bug as https://github.com/JuliaData/CSV.jl/issues/198.


#4

JuliaDB.loadtable("myfile.csv", colparsers = [Date], nastrings=["-"]) works

#=
MWE
Input: myfile.csv
2018-01-04,1.4389
2018-01-01,-
2017-12-31,-1.4406

Output:
Table with 3 rows, 2 columns:
1           2
───────────────────
2018-01-04  1.4389
2018-01-01  #NA
2017-12-31  -1.4406
=#

but I couldn’t find an equivalent of the nastrings argument for TextParse (I also get this CSV file from a remote URL).


#5

Got it for TextParse:

TextParse.csvread("myfile.csv", colparsers = [Date, TextParse.NAToken(TextParse.Numeric(Float64), nastrings = ["-"])])