Newbie using CSV with categorical=true


#1

How do I use the categorical=true feature of CSV.validate(…)?
I’ve tried a bunch of variations, and keep getting the error “ERROR: TypeError: in setfield!, expected Union{Missing, CategoricalString{UInt32}}, got String”.

I have two test files called “data/tmp1.csv” and “data/tmp2.csv”. The contents of tmp1 are the two lines:
0001000
0001000

The contents of tmp2 are the two lines:
"0001000"
"0001000"
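
In case it helps anyone reproduce this, here is a minimal sketch that recreates the two files. It writes them to a temporary directory instead of data/ (an assumption made here so nothing existing gets overwritten), and it assumes the quotes in tmp2 are plain ASCII double quotes.

```julia
# Recreate the two test files in a temporary directory (an assumption for
# illustration; the original files live under data/).
dir = mktempdir()
tmp1 = joinpath(dir, "tmp1.csv")
tmp2 = joinpath(dir, "tmp2.csv")

# tmp1: two unquoted rows, no header line
write(tmp1, "0001000\n0001000\n")
# tmp2: the same two rows, each wrapped in double quotes
write(tmp2, "\"0001000\"\n\"0001000\"\n")

# Read the files back to confirm what was written
println(readlines(tmp1))
println(readlines(tmp2))
```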

I’ve defined the following variables for use in the CSV.validate(…) call.
colNames=[:senID]
colTypes=Dict(:senID=>String)
colTypes2=Dict(:senID=>CategoricalString{UInt32})

I’ve tried all the following variations, plus others, and still get the error above. Can someone tell me how to do this correctly?

CSV.validate("data/tmp1.csv", header=colNames, types=colTypes, categorical=true, allowmissing=:none, strict=true)
CSV.validate("data/tmp2.csv", header=colNames, types=colTypes, categorical=true, allowmissing=:none, strict=true)
CSV.validate("data/tmp1.csv", header=colNames, types=colTypes2, categorical=true, allowmissing=:none, strict=true)
CSV.validate("data/tmp2.csv", header=colNames, types=colTypes2, categorical=true, allowmissing=:none, strict=true)
CSV.validate("data/tmp1.csv", header=colNames, types=colTypes, categorical=true, allowmissing=:none, strict=true)
CSV.validate("data/tmp1.csv", header=colNames, types=colTypes, typemap=tmpmap, categorical=true, allowmissing=:none, strict=true)

Thanks for your attention and help!


#2

Please see the bottom half of this thread:

Ask for more help if that does not get your stuff working.


#3

I tried “add CSV#master” from pkg mode as suggested near the end of that thread, and that step succeeded, but I still get the same error.


#4

What version of Julia are you using, and is it on Linux, macOS, or Windows?

versioninfo()

gives that and more – would you post what it reports?


#5

I’m on a MacBook Pro, macOS 10.13.6 (17G65).

Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel® Core™ i5-5257U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
  JULIA_EDITOR = atom -a
  JULIA_NUM_THREADS = 2


#6

It might need something like colTypes=Dict(:senID=>Union{String, Missing}), but you could also try giving the file a header line and using fewer parameters, i.e. CSV.validate("data/tmp2.csv", categorical=true)

If all else fails, you might be able to read the column with readdlm and convert the resulting strings to categorical values yourself:

using DelimitedFiles
testfile = IOBuffer("0001000\n0001000");
readdlm(testfile, ',', String, header=false)
> "0001000"
  "0001000"
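
To spell out the conversion step, here is a rough sketch. It assumes the CategoricalArrays package is installed and uses its `categorical` function; the variable names `data` and `senID` are just for illustration. I have not checked how this compares with what CSV produces internally.

```julia
using DelimitedFiles
using CategoricalArrays  # external package, assumed to be installed

testfile = IOBuffer("0001000\n0001000")
# Reading as String keeps the leading zeros; the result is a 2×1 matrix
data = readdlm(testfile, ',', String, header=false)

# Flatten the single column and convert it to a categorical vector
senID = categorical(vec(data))

# Both rows share one level, so levels(senID) has a single entry
println(levels(senID))
println(length(senID))
```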

#7

That’s just a bug, I’ve filed an issue.

As a workaround, note that categorical=true shouldn’t make any difference for validation, so you can just drop that argument. It’s mostly useful for CSV.read/CSV.File, and there it works AFAICT.


#8

Thanks!