Problem using the CSV package when setting the `limit` option

I get `ArgumentError: Reference array points beyond the end of the pool` when I try to sort a data frame. I now know the problem comes from reading the data with CSV.jl, but ideally this shouldn't happen.

A minimal example that shows the problem:

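```julia
using DataFrames
using CSV

# 2 million rows: a two-level string column and a Float64 column
df = DataFrame(x = repeat(["a", "b"], 10^6), y = rand(2 * 10^6));
f1 = tempname();
CSV.write(f1, df);

# Read back a 10_000-row slice starting at line 1001, without a header
df2 = CSV.read(f1, DataFrame, header = false, skipto = 1001, limit = 10000);
sort(df2, 1)  # sort by the first column
```

```
ERROR: ArgumentError: Reference array points beyond the end of the pool
[...]
```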

I can't replicate the problem; it works on my machine. Can you try the same code on Julia 1.7 with the latest versions of both CSV.jl and DataFrames.jl? Maybe the problem has been fixed in more recent versions.

Reproducing the example may be tricky, since the problem comes from using multiple cores to read the data. You may need to play around with the data and with the `skipto` and `limit` values.
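One way to do that playing around is to sweep a few `skipto`/`limit` combinations and record which ones make `sort` fail. This is only a sketch: it assumes Julia is started with several threads (e.g. `julia -t 4`) so that CSV.jl parses in parallel, and the grid of values is an arbitrary choice, not a set known to trigger the bug.

```julia
using CSV, DataFrames

# Same data as the minimal example: 2 million rows
df = DataFrame(x = repeat(["a", "b"], 10^6), y = rand(2 * 10^6))
f1 = tempname()
CSV.write(f1, df)

# Parallel parsing only happens if more than one thread is available
println("threads available: ", Threads.nthreads())

# Arbitrary grid of (skipto, limit) values to try; adjust freely
failures = Tuple{Int,Int}[]
for skipto in (2, 1001, 500_001), limit in (10_000, 100_000, 1_000_000)
    df2 = CSV.read(f1, DataFrame; header = false, skipto = skipto, limit = limit)
    try
        sort(df2, 1)
    catch err
        # Only record the error from the report; anything else is re-raised
        err isa ArgumentError || rethrow()
        push!(failures, (skipto, limit))
        println("skipto = $skipto, limit = $limit: ", sprint(showerror, err))
    end
end
println("failing combinations: ", failures)
```

An empty `failures` list on a multithreaded session suggests the combination tried does not hit the bug (or that it is fixed in the installed versions).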

My CSV.jl version is v0.9.11, and it cannot be updated beyond that (it conflicts with many other packages); DataFrames.jl is v1.3.2.

Can you try in a fresh environment, and also without multiple cores?

This is only for debugging, to isolate the problem.
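A rough sketch of both checks, assuming Julia 1.5 or later (for `Pkg.activate(; temp = true)`) and CSV.jl 0.9 or later (for the `ntasks` keyword, which controls how many tasks are used for parsing). A temporary environment keeps other installed packages from pinning older versions, and `ntasks = 1` forces single-task parsing.

```julia
using Pkg
# Throwaway environment: other projects' compat bounds cannot hold versions back
Pkg.activate(; temp = true)
Pkg.add(["CSV", "DataFrames"])

using CSV, DataFrames

df = DataFrame(x = repeat(["a", "b"], 10^6), y = rand(2 * 10^6))
f1 = tempname()
CSV.write(f1, df)

# Same read as the minimal example, but parsed on a single task
df_single = CSV.read(f1, DataFrame; header = false, skipto = 1001,
                     limit = 10000, ntasks = 1)
sorted = sort(df_single, 1)

# Inspect the column type CSV.jl produced and check the row count
println(typeof(df_single.Column1))
println(nrow(sorted))
```

If the single-task read sorts fine while the default (multithreaded) read fails, that points at the parallel parsing path rather than at DataFrames.jl.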

This is a bug in CSV.jl's use of `SentinelArrays.ChainedVector`. I have opened https://github.com/JuliaData/CSV.jl/issues/963.
