CSV.jl error - cannot convert an object of type WeakRefString


#1

Any idea how I can work around this error? This is Julia 0.6.

julia> df = CSV.read(file);

MethodError: Cannot `convert` an object of type WeakRefString{UInt8} to an object of type Missings.Missing
This may have arisen from a call to the constructor Missings.Missing(...),
since type constructors fall back to convert methods.

Stacktrace:
 [1] setindex!(::Array{Missings.Missing,1}, ::WeakRefString{UInt8}, ::Int64) at ./array.jl:583
 [2] streamto!(::DataFrames.DataFrameStream{Tuple{CategoricalArrays.CategoricalArray{Union{Missings.Missing, String},1,UInt32,String,CategoricalArrays.CategoricalString{UInt32},Missings.Missing},Array{Union{Int64, Missings.Mi

#2

You could try CSVFiles.jl and see whether that works better.


#3

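Try df = CSV.read(file,rows_for_type_detect=100000). CSV tries to detect the column types from the first rows of the file, but falls over if it finds a later value that cannot be parsed into the detected type. Your stack trace shows a string being written into an Array{Missings.Missing,1}, which suggests the column held only missing values in the rows that were sampled.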
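
To illustrate the failure mode, here is a minimal sketch in plain Julia (1.x syntax, no packages). It is not how CSV.jl is implemented; detect_type and fill_column are hypothetical helpers made up for this example, and the four-element column is invented data. It only shows why sampling too few rows can pick a type that a later value does not fit.

```julia
# Hypothetical helper (not a CSV.jl function): infer a column type
# by looking only at the first `nsample` values.
function detect_type(values, nsample)
    T = Union{}
    for v in Iterators.take(values, nsample)
        T = Union{T, typeof(v)}
    end
    return T
end

# Hypothetical helper: allocate a column of type T and copy the values in.
function fill_column(T, values)
    out = Vector{T}(undef, length(values))
    for (i, v) in enumerate(values)
        out[i] = v   # errors if v cannot be converted to T
    end
    return out
end

# Invented data: the first three rows are missing, the fourth is a string.
column = [missing, missing, missing, "abc"]

T_short = detect_type(column, 3)   # only missing values seen
T_long  = detect_type(column, 4)   # the string is seen as well

# With too few sampled rows the column is typed as Missing, and storing
# the string fails (the exact exception type varies between Julia versions).
failed = try
    fill_column(T_short, column)
    false
catch
    true
end

# Sampling enough rows gives a type that can hold every value.
ok = fill_column(T_long, column)

println("short sample: ", T_short, ", fill failed: ", failed)
println("long sample:  ", T_long)
```

Raising rows_for_type_detect plays the role of the larger nsample here: more rows are inspected before the column type is fixed.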


#4

This is a bit unexpected…

julia> @time df = DataFrame(CSVFiles.load(file))

MethodError: no method matching load(::String)
Closest candidates are:
  load(::FileIO.File{FileIO.DataFormat{:TSV}}) at /opt/julia/share/julia/site/v0.6/CSVFiles/src/CSVFiles.jl:26
  load(::FileIO.File{FileIO.DataFormat{:CSV}}) at /opt/julia/share/julia/site/v0.6/CSVFiles/src/CSVFiles.jl:22
  load(::FileIO.File{FileIO.DataFormat{:CSV}}, ::Any; args...) at /opt/julia/share/julia/site/v0.6/CSVFiles/src/CSVFiles.jl:22
  ...

#5

load is not a CSVFiles function; it is just re-exported from FileIO :) So you want to either call FileIO.load, or do a using CSVFiles and then simply DataFrame(load(file)). At least I hope that is the bug. If not, could you send over the versions of the packages you are using?


#6

That works! Thanks


#7

@davidanthoff - you’re right. However, for some reason, I am still unable to load the file (which is not too big, around 700 MiB). It was pegged at 100% CPU for 30 minutes and I gave up. The same thing happens with TextParse.jl if I use that directly.

CSV.jl does work and it took about 2.5 minutes to load the file.


#8

What are CSVFiles’ advantages over CSV? Is CSV being migrated to CSVFiles?


#9

I found an interesting difference between Julia 0.6.4 and Julia 1.0.

On 0.6.4:

using DataFramesMeta, CSVFiles, BenchmarkTools, DataFrames

@btime df = DataFrame(load("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))
 18.232 ms (215666 allocations: 16.92 MiB)
13222×7 DataFrames.DataFrame. Omitted printing of 6 columns

julia> using CSV

julia> @btime df = DataFrame(CSV.read("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))

 43.821 ms (861124 allocations: 22.19 MiB)
13222×7 DataFrame. Omitted printing of 6 columns

On 1.0

using DataFramesMeta, CSVFiles, BenchmarkTools, DataFrames

 @btime df = DataFrame(load("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))
  1.144 s (9190647 allocations: 208.81 MiB)
13222×7 DataFrame. Omitted printing of 6 columns

julia> using CSV

julia> @btime df = DataFrame(CSV.read("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))

17.901 ms (354395 allocations: 9.85 MiB)
13222×7 DataFrame. Omitted printing of 6 columns

  

With CSVFiles, the 1.0 version is about 63 times slower and makes nearly 43 times more allocations. With CSV, it is about 2.4 times faster.
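
For anyone who wants to check the ratios, here is a small sketch that recomputes them from the @btime figures quoted above. The variable names are made up for this example, and it needs Julia 0.7 or later because it uses named tuples.

```julia
# Figures copied from the @btime output above (times converted to ms).
csvfiles_06 = (time_ms = 18.232, allocs = 215_666,   mem_mib = 16.92)
csvfiles_10 = (time_ms = 1144.0, allocs = 9_190_647, mem_mib = 208.81)
csv_06      = (time_ms = 43.821, allocs = 861_124,   mem_mib = 22.19)
csv_10      = (time_ms = 17.901, allocs = 354_395,   mem_mib = 9.85)

# CSVFiles: how much slower and how many more allocations on 1.0 than on 0.6.4.
slowdown    = csvfiles_10.time_ms / csvfiles_06.time_ms   # ≈ 62.7
alloc_ratio = csvfiles_10.allocs / csvfiles_06.allocs     # ≈ 42.6

# CSV: how much faster on 1.0 than on 0.6.4.
csv_speedup = csv_06.time_ms / csv_10.time_ms             # ≈ 2.4

println("CSVFiles slowdown:    ", round(slowdown, digits = 1), "x")
println("CSVFiles allocations: ", round(alloc_ratio, digits = 1), "x")
println("CSV speedup:          ", round(csv_speedup, digits = 1), "x")
```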


#10

That seems to match up with what I just experienced. I switched over to 0.7 when I ran DataFrame(load(file)) earlier and it never came back.


#11

CSVFiles.jl is not yet ready on julia 1.0 (I only recommended it because the original question mentioned julia 0.6). It runs, but really slowly (as you discovered). There is a nasty performance problem that crept into TextParse.jl when it was ported to julia 1.0, see here. I have no doubt we will be able to fix that, but for now the problem remains.

@tk3369: did things also not work on julia 0.6, or did you only try on julia 0.7?


#12

It does not work in 0.6 either. It’s been half an hour and it still hasn’t finished.
Also, I have to run using FileIO, or else the load function isn’t defined.

My config:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> Pkg.status("CSVFiles")
 - CSVFiles                      0.5.1

julia> Pkg.status("TextParse")
 - TextParse                     0.4.1


#13

Hm, that is a really old version of CSVFiles.jl that you are getting there… The exported load and save functions were added in v0.6.0…


#14

No worries… I’ve already moved on to 1.0 and have been playing with the master of that ecosystem :)