CSV.read Error - provide a valid sink argument

I’ve just installed Julia on Win-64 and I’m trying to read a CSV file.

The following code works:

using CSVFiles, FileIO, DataFrames

df = load("C:\\...\\file.csv") |> DataFrame

But I wonder what is missing in this code:

[In]
using CSV
CSV.read("C:\\...\\file.csv")

[Out]
ArgumentError: provide a valid sink argument, like `CSV.read(source, DataFrame)`

The error gives you the answer! Just change the code to

CSV.read("C:\\...\\file.csv", DataFrame)

Oh, cool. You have to add DataFrames as well, because without it I receive another error.

[In]:
using CSV
CSV.read("C:\\...\\file.csv", DataFrame)

[Out]:
UndefVarError: DataFrame not defined

So how do I define a data frame?
I’ve done my research and read:

Nowhere did I find any guidance on how to define a DataFrame, or whether it’s a variable or something else.

But then I’ve tried the following code, which finally works.

using CSV
using DataFrames

CSV.read("C:\\...\\benchmark.csv", DataFrame)

Though I have no idea why. What is the relation between the DataFrames package and DataFrame? Julia seems to have a less steep learning curve than I expected, but that’s good since I plan to write tutorials about it once I understand it.

The package DataFrames.jl defines the DataFrame type.

In your code you can assign an instantiated DataFrame object like

df = CSV.read("C:\\...\\benchmark.csv", DataFrame)

Note that in the above line, the DataFrame is not an instantiated data frame. Rather, it’s the Type DataFrame.

In Julia, types come with constructors, so the code above is essentially shorthand for

df = DataFrame(CSV.File("..."))

where the output of CSV.File is something the DataFrame type has defined a constructor for.
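To see the two spellings side by side, here is a minimal sketch. It assumes CSV.jl and DataFrames.jl are installed; the temporary file and its contents are made up purely for illustration.

```julia
using CSV, DataFrames

# Write a tiny throwaway CSV so the example does not depend on any real file.
# (The file name and contents are hypothetical.)
path = tempname() * ".csv"
write(path, "name,score\nalice,1\nbob,2\n")

# Here DataFrame is the type itself, passed as the sink argument.
df1 = CSV.read(path, DataFrame)

# The longer spelling: parse with CSV.File, then call the DataFrame constructor.
df2 = DataFrame(CSV.File(path))

println(DataFrame isa DataType)  # the sink is a type, not an instance
println(df1 isa DataFrame)       # the result is an instance of that type
println(df1 == df2)              # both spellings yield the same table contents
```

Without `using DataFrames`, the name DataFrame is simply not defined in your session, which is why the earlier attempt failed with UndefVarError.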

Unfortunately, the second link below is an outdated tutorial. This is a bit of a pain, but as the ecosystem stabilizes over time hopefully it will be harder to stumble across outdated tutorials.


Thank you very much @pdeffebach for the detailed explanation. Now it makes sense to me.
I’m aware that Julia is still evolving. All the tutorials I found showed the same approach, which no longer works, so I’ll try to comment on the outdated sources to help others avoid the issues I ran into.

Maybe the error should be a bit more precise:

ArgumentError: provide a valid sink argument, like `using DataFrames; CSV.read(source, DataFrame)`

@quinnj What do you think?


Sure thing; can someone make a PR?

Sure: https://github.com/JuliaData/CSV.jl/pull/775


Hi. Is there any reason why the behavior changed?

df = DataFrame(CSV.read(data_file)) >> ERROR: ArgumentError: provide a valid sink argument, like CSV.read(source, DataFrame)
df = CSV.read(data_file,DataFrame) >> 295×2 DataFrame

The first one should be DataFrame(CSV.File(data_file)), which still works. CSV.read used to return a DataFrame by default, but that was deprecated so that CSV.jl doesn’t have to depend on DataFrames.jl.

It was then brought back as CSV.read(filepath, sink) as users were expecting a CSV.read function, and this structure means DataFrames only has to be loaded by the user if a DataFrame output is actually required.
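The general idea of a sink argument can be sketched with a toy reader that uses only Base Julia. This is not CSV.jl’s implementation; toy_read and ToyTable are hypothetical names invented for this example. The point is only that the reading function never mentions the output type, so it needs no dependency on whatever package defines it.

```julia
# Hypothetical toy reader, NOT CSV.jl's implementation: it splits each line on
# commas and hands the parsed rows to whatever sink the caller supplies.
function toy_read(io::IO, sink)
    rows = [String.(split(line, ',')) for line in eachline(io)]
    return sink(rows)
end

# A hypothetical sink type defined on the "user" side; toy_read knows nothing about it.
struct ToyTable
    header::Vector{String}
    rows::Vector{Vector{String}}
end

# Extra constructor that accepts the raw rows: the first row becomes the header.
ToyTable(rows::Vector{Vector{String}}) = ToyTable(rows[1], rows[2:end])

# The type is passed as the sink, just like DataFrame in CSV.read(source, DataFrame).
table = toy_read(IOBuffer("a,b\n1,2\n3,4\n"), ToyTable)
println(table.header)
println(table.rows)
```

Any callable that accepts the parsed rows would work as a sink here, for example `identity` to get the raw rows back.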


@nilshg Your explanation is much appreciated, thx.


I’ve summarized what I learned here in an article, Read CSV to Data Frame in Julia. It describes three syntactic approaches, discusses encoding, and shows examples of several of the parameters. Thanks, everyone, for the help.
