[SOLVED] Usage of CSV.read recently deprecated?

Until about a week ago, I was using the following lines:

using CSV: read
In_vec = read(in_vec_file, header=false)
In_vec = In_vec[1]  # turn into usable Julia vector

They returned an Array{Float64,1} with as many elements as my file has lines. Now, without changing anything, I get this error: “ERROR: ArgumentError: provide a valid sink argument, like `using DataFrames; CSV.read(source, DataFrame)`”

Did the functionality change recently? What is the easiest way to collect the result of ‘read’ as an Array? I know I can use DataFrames and Tables.matrix to work around this, but I want first to understand the problem and then to solve it without adding more packages.

If it’s just a super clean CSV file with Float64 values, you can use readdlm

And even if the contents are not only numbers, readdlm will still work, except that it will produce an Array{Any,2}.
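
For reference, a minimal sketch of the readdlm route, assuming a file with one Float64 value per line and no header. The temporary files and their contents are made up for illustration; readdlm comes from the DelimitedFiles standard library, so no extra package is needed.

```julia
using DelimitedFiles  # standard library

# Hypothetical sample file: one Float64 value per line, no header.
path = tempname()
write(path, "1.5\n2.5\n3.5\n")

# readdlm returns a Matrix (here 3×1); vec flattens it into a Vector.
in_vec = vec(readdlm(path, Float64))

# Hypothetical mixed file: without a type argument, non-numeric cells
# make the result a matrix with element type Any.
mixed_path = tempname()
write(mixed_path, "a,1\nb,2\n")
mixed = readdlm(mixed_path, ',')
```

Passing the element type (Float64) makes readdlm fail loudly on a non-numeric cell instead of silently falling back to Any.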

Have a look at CSV.File instead. I was wondering about this as well yesterday.
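
A rough sketch of what the CSV.File route could look like for a single-column file without a header. The sample file is hypothetical, and the concrete vector type of a column may differ between CSV.jl versions, so collect is used here to get a plain Vector.

```julia
using CSV

# Hypothetical sample file: one Float64 value per line, no header.
path = tempname()
write(path, "1.5\n2.5\n3.5\n")

# With header=false, CSV.jl names the columns Column1, Column2, ...
f = CSV.File(path; header=false)

# Columns are accessible as properties; collect copies into a plain Vector.
in_vec = collect(f.Column1)
```

This needs only CSV.jl itself, with no DataFrames dependency.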

So odd, but I’m glad I’m not the only one who noticed this issue.

@pdeffebach I’ll take a look at readdlm, but is there a way to keep using the same CSV.read structure with minimal changes?

You should be able to do read(in_vec_file, DataFrame, header=false) to get the previous behavior back. Apparently this was changed to avoid having DataFrames as a dependency in CSV.jl.

Yeah, you can use

CSV.read(file, NamedTuple)

then work with the vector from that named tuple. You don’t even have to have DataFrames as a dependency for this.

Thanks @jonathanBieler and @pdeffebach for the inputs. Using DataFrame (which should have been 1:1 with my initial solution) requires changing In_vec[1] to In_vec[!,1], which is fine, but it also requires “using DataFrames”.

The solution with NamedTuple gives the exact behavior I had before, where I need to do In_vec = In_vec[1] to get a usable vector.

I tried In_vec = read(in_vec_file, collect, header=false), which is a bit frustrating because it’s almost a one-liner. It outputs a 1-element Array{Any,1} similar to the NamedTuple method, requiring the second line to work.

I’m satisfied with this solution because, currently, I only have 1 data point per line. If I had multiple columns to be loaded at once (like an f(x,y) function dump), I think the NamedTuple method would give me a workable variable as well.
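
For the multi-column case, here is a hedged sketch using Tables.columntable, which materializes the file as a NamedTuple of column vectors, and Tables.matrix, which turns it into a Matrix. It assumes Tables.jl has been added to the environment (it is already a dependency of CSV.jl, but still has to be added to load it directly), and the three-column x, y, f(x,y) file is invented for illustration.

```julia
using CSV, Tables

# Hypothetical dump of x, y, f(x,y) triples, no header.
path = tempname()
write(path, "0.0,0.0,1.0\n0.5,1.0,2.0\n1.0,2.0,5.0\n")

# NamedTuple of column vectors, named Column1..Column3 because header=false.
cols = CSV.File(path; header=false) |> Tables.columntable

x  = cols.Column1
y  = cols.Column2
fx = cols.Column3

# The same data as a 3×3 matrix, one row per line of the file.
mat = Tables.matrix(cols)
```

Each column stays a separate vector in the NamedTuple, so x, y and f(x,y) can be used individually without going through a DataFrame.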

Only an ‘off-topic’ question remains: why doesn’t the official documentation have a simple entry on CSV.read? Is it really deprecated and not intended for future use?

I don’t know, I noticed that yesterday. It should be fixed.