CSV, DataFrames problems with threads

Today I encountered an odd problem:

using CSV, DataFrames
dt = DataFrame(a = rand(10000), b = rand(10000), c = rand(10000))
CSV.write("test.csv", dt)

df = CSV.File("test.csv") |> DataFrame!
df[(ismissing.(df.a)) .| (ismissing.(df.b)) .| (ismissing.(df.c)), :]

The code above works with 1 thread, but with 4 threads (on Julia 1.3) it fails with:

MethodError: no method matching LazyArrays.LazyArrayStyle()
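Since the failure depends on the thread count, it helps to confirm how many threads the session actually has. A minimal sketch, assuming the count was set through the `JULIA_NUM_THREADS` environment variable before Julia was started (on Julia 1.3 it cannot be changed from inside a running session):

```julia
# The thread count is fixed at startup, e.g. on Windows cmd:
#   set JULIA_NUM_THREADS=4
# followed by launching Julia from that same shell.
nthreads = Threads.nthreads()
println("Running with ", nthreads, " thread(s)")
```

If this prints 1, the error above should not appear, which matches the single-threaded behaviour described.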

I thought I would share my finding here, as I don’t know whether this is a problem with CSV, DataFrames, or threading.
The versions used are:

  • OS: Win 10
  • Julia 1.3-rc{4,5}
  • CSV v0.5.18
  • DataFrames v0.19.4

xref: https://github.com/JuliaData/CSV.jl/issues/536

The offending line in CSV is https://github.com/JuliaData/CSV.jl/blob/b85a3ce8fcb1d0c9611b07d591f58985d6e9ea42/src/CSV.jl#L470 (it uses LazyArrays.ApplyArray when parsing is threaded), and I guess it’s an issue with LazyArrays (which bit me yesterday, but in a different context). So maybe it’s worth opening an issue with LazyArrays?
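Until this is fixed upstream, one possible workaround is to turn off threaded parsing, so that the lazy column wrapper is presumably never created. This is a sketch under assumptions, not a confirmed fix: it assumes that `CSV.File` accepts a `threaded` keyword in the CSV.jl version in use, and it has not been verified against the exact versions listed above.

```julia
using CSV, DataFrames

# Recreate the file from the original post.
dt = DataFrame(a = rand(10000), b = rand(10000), c = rand(10000))
CSV.write("test.csv", dt)

# Assumption: `threaded` is a keyword of CSV.File in the installed CSV.jl version.
# With threaded = false, the file is parsed on a single thread.
df = DataFrame(CSV.File("test.csv"; threaded = false))

# Same row filter as in the original post. The data come from rand,
# so no row contains a missing value and the result has zero rows.
rows = ismissing.(df.a) .| ismissing.(df.b) .| ismissing.(df.c)
result = df[rows, :]
```

If the filter runs without the `MethodError` under 4 threads, that would support the guess that the threaded code path is what triggers the problem.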