CSV.jl error - cannot convert an object of type WeakRefString

Any idea how I can work around this error? This is Julia 0.6.

julia> df = CSV.read(file);

MethodError: Cannot `convert` an object of type WeakRefString{UInt8} to an object of type Missings.Missing
This may have arisen from a call to the constructor Missings.Missing(...),
since type constructors fall back to convert methods.

Stacktrace:
 [1] setindex!(::Array{Missings.Missing,1}, ::WeakRefString{UInt8}, ::Int64) at ./array.jl:583
 [2] streamto!(::DataFrames.DataFrameStream{Tuple{CategoricalArrays.CategoricalArray{Union{Missings.Missing, String},1,UInt32,String,CategoricalArrays.CategoricalString{UInt32},Missings.Missing},Array{Union{Int64, Missings.Mi

You could try CSVFiles.jl and see whether that works better.

Try df = CSV.read(file, rows_for_type_detect=100000). CSV tries to detect the column types from a sample of rows, but falls over if it later finds a value that cannot be parsed into the detected type.
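To make the failure mode concrete, here is a minimal sketch in plain Julia 1.x of what sample-based type detection does. It is not CSV.jl's actual code: detect_type, convert_value and parse_column are hypothetical names made up for the illustration. A column whose sampled rows are all empty is guessed to be Missing, and a later string value then cannot be stored in it, which is the same kind of error as in the stack trace above.

```julia
# Hypothetical sketch of sample-based type detection; NOT CSV.jl's real code.

# Guess a column type from the first `nrows` raw string values only.
function detect_type(values::Vector{String}, nrows::Int)
    sample = values[1:min(nrows, length(values))]
    all(isempty, sample) && return Missing
    all(v -> isempty(v) || tryparse(Int, v) !== nothing, sample) && return Union{Int, Missing}
    return Union{String, Missing}
end

# Convert one raw value according to the detected column type.
function convert_value(::Type{T}, v::String) where {T}
    isempty(v) && return missing
    Int <: T && return parse(Int, v)
    return v
end

# Build the typed column; storing a value that does not fit the type throws.
function parse_column(values::Vector{String}, nrows::Int)
    T = detect_type(values, nrows)
    out = Vector{T}(undef, length(values))
    for (i, v) in enumerate(values)
        out[i] = convert_value(T, v)
    end
    return out
end

column = ["", "", "", "abc"]

# Sampling only the first 3 rows guesses Missing, so storing "abc" in row 4 fails.
short_sample_failed = try
    parse_column(column, 3)
    false
catch
    true
end

# Sampling (up to) 100000 rows sees "abc" and picks Union{String, Missing} instead.
full_sample = parse_column(column, 100_000)
```

Raising the number of sampled rows trades detection time for a better chance of seeing the awkward values before the column type is fixed.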

This is a bit unexpected…

julia> @time df = DataFrame(CSVFiles.load(file))

MethodError: no method matching load(::String)
Closest candidates are:
  load(::FileIO.File{FileIO.DataFormat{:TSV}}) at /opt/julia/share/julia/site/v0.6/CSVFiles/src/CSVFiles.jl:26
  load(::FileIO.File{FileIO.DataFormat{:CSV}}) at /opt/julia/share/julia/site/v0.6/CSVFiles/src/CSVFiles.jl:22
  load(::FileIO.File{FileIO.DataFormat{:CSV}}, ::Any; args...) at /opt/julia/share/julia/site/v0.6/CSVFiles/src/CSVFiles.jl:22
  ...

load is not a CSVFiles function, it is just reexported from FileIO :slight_smile: So you want to either call FileIO.load or just do a using CSVFiles and then simply DataFrame(load(file)). At least I hope that is the bug, if not, could you send over the versions of the packages you are using?
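A minimal sketch of that call pattern, assuming FileIO, CSVFiles and DataFrames are all installed. The temporary file and its contents are made up for the example. Loading FileIO explicitly should also cover CSVFiles versions that do not reexport load.

```julia
using FileIO     # provides the generic `load`
using CSVFiles   # registers the CSV reader that FileIO dispatches to
using DataFrames

# Hypothetical two-row file written to a temporary path, for illustration only.
path = joinpath(mktempdir(), "example.csv")
write(path, "id,name\n1,alpha\n2,beta\n")

# `load` picks the CSV reader based on the ".csv" extension.
df = DataFrame(load(path))
```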

That works! Thanks

@davidanthoff - you’re right. However, for some reason, I am still unable to load the file (which is not too big, around 700 MiB). It was pegged at 100% CPU for 30 minutes and I gave up. Same thing with TextParse.jl if I use that directly.

CSV.jl does work and it took about 2.5 minutes to load the file.

What are CSVFiles’ advantages over CSV? Is CSV being migrated to CSVFiles?

I found an interesting difference between 0.6.4 and Julia 1.0

On 0.6.4:

using DataFramesMeta, CSVFiles, BenchmarkTools, DataFrames

@btime df = DataFrame(load("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))
 18.232 ms (215666 allocations: 16.92 MiB)
13222×7 DataFrames.DataFrame. Omitted printing of 6 columns

julia> using CSV

julia> @btime df = DataFrame(CSV.read("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))

 43.821 ms (861124 allocations: 22.19 MiB)
13222×7 DataFrame. Omitted printing of 6 columns

On 1.0

using DataFramesMeta, CSVFiles, BenchmarkTools, DataFrames

 @btime df = DataFrame(load("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))
  1.144 s (9190647 allocations: 208.81 MiB)
13222×7 DataFrame. Omitted printing of 6 columns

julia> using CSV

julia> @btime df = DataFrame(CSV.read("/home/js/db_docs/sql/projects/dhet-journals/aantal_publikasies_in_joernaal.csv"))

17.901 ms (354395 allocations: 9.85 MiB)
13222×7 DataFrame. Omitted printing of 6 columns

With CSVFiles, the 1.0 version is about 63 times slower and makes nearly 43 times more allocations. With CSV, 1.0 is about 2.4 times faster.
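For anyone who wants to check the ratios, here is a quick calculation from the @btime numbers posted above (Julia 1.x syntax; the variable names are just labels for this sketch, and 1.144 s is written as 1144 ms).

```julia
# Timings (ms) and allocation counts copied from the @btime output above.
csvfiles_v06 = (ms = 18.232, allocs = 215_666)
csvfiles_v10 = (ms = 1144.0, allocs = 9_190_647)
csv_v06 = (ms = 43.821, allocs = 861_124)
csv_v10 = (ms = 17.901, allocs = 354_395)

# How much slower CSVFiles is on 1.0, and how many more allocations it makes.
csvfiles_slowdown = csvfiles_v10.ms / csvfiles_v06.ms
csvfiles_alloc_ratio = csvfiles_v10.allocs / csvfiles_v06.allocs

# How much faster CSV is on 1.0.
csv_speedup = csv_v06.ms / csv_v10.ms

println(round(csvfiles_slowdown, digits = 1))     # 62.7
println(round(csvfiles_alloc_ratio, digits = 1))  # 42.6
println(round(csv_speedup, digits = 1))           # 2.4
```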

That seems to match up with what I just experienced. I switched over to 0.7 when I ran DataFrame(load(file)) earlier and it never came back.

CSVFiles.jl is not yet ready on julia 1.0 (I only recommended it because the original question mentioned julia 0.6). It runs, but really slowly (as you discovered). There is a nasty performance problem that crept into TextParse.jl when it was ported to julia 1.0, see here. I have no doubt we will be able to fix that, but for now the problem is still there.

@tk3369: did things also not work on julia 0.6, or did you only try on julia 0.7?

It does not work in 0.6 either. It’s been half an hour and it still hasn’t finished.
Also, I have to run using FileIO first, or else the load function isn’t defined.

My config:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> Pkg.status("CSVFiles")
 - CSVFiles                      0.5.1

julia> Pkg.status("TextParse")
 - TextParse                     0.4.1

Hm, that is a really old version of CSVFiles.jl that you are getting there… The exported load and save functions were added in v0.6.0…

No worries… I’ve already moved on to 1.0 and have been playing with the master of that ecosystem :slight_smile:
