Error when combining a single-row CSV file with a multiple-row CSV file into a DataFrame

phantom · March 15, 2024, 1:37pm

Hi! I am combining several .csv files into DataFrames. Each file has the same number of columns and the same column types. Everything works, but when I try to combine a .csv file that contains a single row with one that contains multiple rows, I get the following error:

CSV.read(["singlerowfilepath","multiplerowfilepath"], DataFrame)

returns

ERROR: UndefVarError: `A` not defined
Stacktrace:
 [1] (::CSV.var"#3#4")(x::PooledArrays.PooledVector{String31, UInt32, Vector{UInt32}})
   @ CSV ./none:0
 [2] iterate
   @ ./generator.jl:47 [inlined]
 [3] collect(itr::Base.Generator{Vector{PooledArrays.PooledVector{String31, UInt32, Vector{UInt32}}}, CSV.var"#3#4"})
   @ Base ./array.jl:834
 [4] chaincolumns!(a::Any, b::Any)
   @ CSV ~/.julia/packages/CSV/tmZyn/src/utils.jl:240
 [5] CSV.File(sources::Vector{String}; source::Nothing, kw::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:930
 [6] File
   @ ~/.julia/packages/CSV/tmZyn/src/file.jl:901 [inlined]
 [7] read(source::Vector{String}, sink::Type; copycols::Bool, kwargs::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/tmZyn/src/CSV.jl:117
 [8] read(source::Vector{String}, sink::Type)
   @ CSV ~/.julia/packages/CSV/tmZyn/src/CSV.jl:113
 [9] top-level scope
   @ REPL[134]:1

and

DataFrame!(CSV.File(["singlerowfilepath","multiplerowfilepath"]))

or 

DataFrame(CSV.File(["singlerowfilepath","multiplerowfilepath"]))

each return

ERROR: UndefVarError: `A` not defined
Stacktrace:
 [1] (::CSV.var"#3#4")(x::PooledArrays.PooledVector{String31, UInt32, Vector{UInt32}})
   @ CSV ./none:0
 [2] iterate
   @ ./generator.jl:47 [inlined]
 [3] collect(itr::Base.Generator{Vector{PooledArrays.PooledVector{String31, UInt32, Vector{UInt32}}}, CSV.var"#3#4"})
   @ Base ./array.jl:834
 [4] chaincolumns!(a::Any, b::Any)
   @ CSV ~/.julia/packages/CSV/tmZyn/src/utils.jl:240
 [5] CSV.File(sources::Vector{String}; source::Nothing, kw::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:930
 [6] CSV.File(sources::Vector{String})
   @ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:901
 [7] top-level scope
   @ REPL[138]:1

Whereas reversing the order of the files, i.e.

CSV.read(["multiplerowfilepath","singlerowfilepath"], DataFrame)

returns a DataFrame with a uniform number of columns and consistent data types.

Just wondering if anyone had any insight on why this is happening and what I might do to fix it without having to worry about the order of the files. Thanks!

nilshg · March 15, 2024, 2:02pm

This is a pretty annoying error and I think the error message should be improved (or maybe the behaviour).

You can roughly guess from here:

[1] (::CSV.var"#3#4")(x::PooledArrays.PooledVector{String31, UInt32, Vector{UInt32}})
   @ CSV ./none:0

that it comes from PooledArrays. What's happening here is that CSV is trying to stitch together one table from the two CSVs, but if it decides to pool the values in the first one, this fails when there are additional values in the second table. In your case, where you have only one row, there's only one value in it and the pool can't capture the other values, while if you go the other way around, the pool will have seen the whole range of values and can accommodate the additional value.
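To make the stitching step concrete, here is a toy model of a pooled (dictionary-encoded) column in plain Julia, with no packages: a vector of unique values plus integer codes pointing into it. The names ToyPooled, decode and chain are hypothetical, and this is only a sketch of the general idea, not how CSV.jl or PooledArrays.jl actually implement it. The point is that appending a second chunk requires merging its values into the first chunk's pool and remapping its codes.

```julia
# Toy dictionary-encoded ("pooled") column. Hypothetical illustration only,
# not CSV.jl's or PooledArrays.jl's actual implementation.
struct ToyPooled
    pool::Vector{String}   # unique values, in order of first appearance
    refs::Vector{Int}      # refs[i] is an index into pool
end

# Build a pooled column from plain values.
function ToyPooled(values::Vector{String})
    pool = String[]
    index = Dict{String,Int}()
    refs = Int[]
    for v in values
        # Reuse the existing code for v, or append v to the pool and use the new code.
        code = get!(index, v) do
            push!(pool, v)
            length(pool)
        end
        push!(refs, code)
    end
    return ToyPooled(pool, refs)
end

# Recover the plain values.
decode(p::ToyPooled) = p.pool[p.refs]

# Append chunk b to chunk a. The first chunk's pool has no codes for values
# that only appear in b, so b's values are merged in and b's codes remapped.
function chain(a::ToyPooled, b::ToyPooled)
    pool = copy(a.pool)
    index = Dict{String,Int}(v => i for (i, v) in enumerate(pool))
    # remap[j] is the merged-pool code for b.pool[j].
    remap = map(b.pool) do v
        get!(index, v) do
            push!(pool, v)
            length(pool)
        end
    end
    return ToyPooled(pool, vcat(a.refs, remap[b.refs]))
end

single = ToyPooled(["a"])                 # one-row chunk: pool has one value
multi = ToyPooled(["b", "c", "b", "a"])   # multi-row chunk
combined = chain(single, multi)
println(combined.pool)
println(decode(combined))
```

With the pools merged this way, either file order decodes back to the original values.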

You can turn pooling off with the pool = false kwarg.
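A minimal sketch of the pool = false workaround, assuming CSV.jl 0.10 or later (which accepts a vector of file paths) and DataFrames.jl. The two files are hypothetical stand-ins written to a temporary directory; substitute your own paths.

```julia
using CSV, DataFrames

# Hypothetical toy files: one with a single data row, one with several rows,
# both with the same columns and types.
dir = mktempdir()
single = joinpath(dir, "single.csv")
multi = joinpath(dir, "multi.csv")
write(single, "id,category\n1,a\n")
write(multi, "id,category\n2,b\n3,c\n4,b\n5,d\n6,b\n")

# pool = false disables pooling for every column, so there are no pooled
# columns to stitch together across the files.
df1 = CSV.read([single, multi], DataFrame; pool = false)
df2 = CSV.read([multi, single], DataFrame; pool = false)

println(df1)
println(df2)
```

Both orders should give the same rows, only in a different order.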

phantom · March 15, 2024, 2:14pm

Got it, thanks! Is there a downside to defaulting to pool = false other than memory usage?

pdeffebach · March 15, 2024, 2:19pm

Hopefully not! PooledVector{String} should behave exactly like Vector{String} for all intents and purposes.
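A quick way to check this yourself, assuming PooledArrays.jl is installed: a PooledVector compares equal to the plain vector it was built from and indexes the same way; the difference is only in storage. The pool and refs fields used below are internal, fine for inspection but not something to rely on as a public API.

```julia
using PooledArrays

# A small column with repeated values, the typical case where pooling helps.
plain = ["red", "green", "red", "red", "green"]
pooled = PooledArray(plain)

# Element-wise the two behave the same: equality, indexing and element type.
@assert pooled == plain
@assert pooled[3] == "red"
@assert eltype(pooled) == String

# Internally: one copy of each unique value plus integer codes into that pool.
println(pooled.pool)
println(pooled.refs)

# Converting back to a plain Vector{String} is a one-liner.
back = Vector{String}(pooled)
@assert back == plain
```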

phantom · March 15, 2024, 2:22pm

Got it, thanks!

nalimilan · March 15, 2024, 2:36pm

Could you file a bug against CSV.jl on GitHub? If you can provide toy files to reproduce the problem that would be even better.
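For reference, a toy reproducer along those lines might look like the following. It is hypothetical: the file names and contents are made up, and whether the default-pooling read actually fails depends on the CSV.jl version and its pooling heuristic, so the script reports what happens in each order rather than assuming an error.

```julia
using CSV, DataFrames

# Hypothetical toy data: a one-row file, and a file with many rows but only two
# distinct strings, so the larger file's `category` column is likely (not
# guaranteed) to be pooled under CSV.jl's default settings.
dir = mktempdir()
single = joinpath(dir, "single.csv")
multi = joinpath(dir, "multi.csv")
write(single, "id,category\n1,a\n")
rows = ["$i,$(isodd(i) ? "b" : "c")" for i in 2:41]
write(multi, "id,category\n" * join(rows, "\n") * "\n")

# Try both file orders with default settings and print the outcome.
for files in ([single, multi], [multi, single])
    try
        df = CSV.read(files, DataFrame)
        println(basename.(files), " => ", nrow(df), " rows")
    catch err
        println(basename.(files), " => ", typeof(err))
    end
end
```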

phantom · March 15, 2024, 8:48pm

Sure thing. I submitted an issue with a dummy example here.
