I’ve just installed Julia on Win-64 and I’m trying to read a CSV file.
The following code works:
using CSVFiles, FileIO, DataFrames
df = load("C:\\...\\file.csv") |> DataFrame
But I wonder what is missing in this code:
ArgumentError: provide a valid sink argument, like `CSV.read(source, DataFrame)`
The error gives you the answer! Just change the code to `CSV.read(source, DataFrame)`.
Oh, cool. You have to add DataFrames as well, because without it I receive another error.
UndefVarError: DataFrame not defined
So how do I define a data frame?
I’ve done my research and read:
Nowhere did I find any guidance on how to define a DataFrame, or whether it’s a variable or something else.
But then I tried the following code, which finally works.
Though I have no idea why.
What is the relation between the DataFrames package and DataFrame? Julia seems to have a less steep learning curve than I expected, but that’s good since I plan to write tutorials about it once I understand it.
DataFrames.jl defines the `DataFrame` type.
In your code you can assign an instantiated `DataFrame` object like
df = CSV.read("C:\\...\\benchmark.csv", DataFrame)
Note that in the above line, `DataFrame` is not an instantiated data frame. Rather, it’s the type itself.
Types come with constructors, so the code above is just shorthand for
df = DataFrame(CSV.File("..."))
where the output of `CSV.File` is something the `DataFrame` type has defined a constructor for.
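To make the type-versus-instance distinction concrete, here is a minimal sketch comparing the two forms. It assumes CSV.jl and DataFrames.jl are installed; the in-memory CSV text and its column names are made up for illustration, and the `IOBuffer` merely stands in for a file path.

```julia
using CSV, DataFrames

# Illustrative in-memory CSV text, standing in for a file on disk.
csv_text = "name,score\nalice,1\nbob,2\n"

# Form 1: pass the DataFrame *type* as the sink argument.
df1 = CSV.read(IOBuffer(csv_text), DataFrame)

# Form 2: build a CSV.File first, then call the DataFrame constructor on it.
df2 = DataFrame(CSV.File(IOBuffer(csv_text)))

# DataFrame is a type; df1 and df2 are instances of that type.
println(DataFrame isa Type)
println(df1 isa DataFrame)
println(df1 == df2)
```

Both forms should give the same table; the first simply lets `CSV.read` call the constructor for you.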
Unfortunately, the second link below is an outdated tutorial. This is a bit of a pain, but as the ecosystem stabilizes over time hopefully it will be harder to stumble across outdated tutorials.
Thank you very much @pdeffebach for the detailed explanation. Now it makes sense to me.
I’m aware that Julia is still evolving, and I’ll try to comment on the outdated sources to help others avoid the issues I had, because all the tutorials showed the same approach, which no longer works.
Maybe the error should be a bit more precise:
ArgumentError: provide a valid sink argument, like `using DataFrames; CSV.read(source, DataFrame)`
@quinnj What do you think?
Sure thing; can someone make a PR?
Hi. Is there any reason why the behavior changed?
df = DataFrame(CSV.read(data_file)) >> ERROR: ArgumentError: provide a valid sink argument, like
df = CSV.read(data_file,DataFrame) >> 295×2 DataFrame
The `DataFrame` one should be `DataFrame(CSV.File(data_file))`, which still works.
`CSV.read` used to return a DataFrame by default, but that was deprecated so that CSV.jl doesn’t have to depend on DataFrames.jl.
It was then brought back as `CSV.read(filepath, sink)`, as users were expecting a `CSV.read` function, and this structure means DataFrames only has to be loaded by the user if a DataFrame output is actually required.
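The sink idea can be sketched without any packages. The code below is not how CSV.jl is implemented; `read_rows` and `MyTable` are hypothetical names invented for this illustration. It only shows why a reader that accepts a sink never needs to depend on the package that defines the output type.

```julia
# Hypothetical reader: parses comma-separated text into named columns,
# then hands them to whatever `sink` the caller passes in.
function read_rows(text::AbstractString, sink)
    lines = split(strip(text), '\n')
    header = Symbol.(split(lines[1], ','))
    rows = [split(line, ',') for line in lines[2:end]]
    # One vector of strings per column, keyed by the header names.
    columns = NamedTuple{Tuple(header)}(
        Tuple([String(r[i]) for r in rows] for i in eachindex(header)))
    return sink(columns)
end

# Hypothetical table type defined by the caller, not by the reader.
struct MyTable
    columns::NamedTuple
end

# The caller chooses the output type by passing the type itself.
t = read_rows("a,b\n1,2\n3,4\n", MyTable)

# Any callable works as a sink; `identity` returns the raw columns.
raw = read_rows("a,b\n1,2\n3,4\n", identity)
```

Here `read_rows` knows nothing about `MyTable`, just as CSV.jl does not need to load DataFrames.jl unless the user asks for a `DataFrame`.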
@nilshg Your explanation is much appreciated, thx.
I’ve summarized what I learned here in an article: Read CSV to Data Frame in Julia. It describes three syntactic ways, talks about encoding, and shows examples of several of the parameters. Thanks, everyone, for the help.