Nowhere can I find any guidance on how to define a DataFrame, or whether it’s a variable or something else.
But then I tried the following code, which finally works.
using CSV
using DataFrames
CSV.read("C:\\...\\benchmark.csv", DataFrame)
Though I have no idea why. What is the relation between the DataFrames package and DataFrame? Julia seems to have a less steep learning curve than I expected, and that’s good since I plan to write tutorials about it once I understand it.
Note that in the above line, DataFrame is not an instantiated data frame. Rather, it’s the type DataFrame, which the DataFrames package exports.
In Julia, types come with constructors, so the code above is just shorthand for
df = DataFrame(CSV.File("..."))
where the output of CSV.File is something the DataFrame type has defined a constructor for.
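To make the equivalence concrete, here is a minimal sketch. The file contents and the temporary path are made up for illustration (they are not from the original question), and it assumes CSV.jl and DataFrames.jl are both installed.

```julia
using CSV
using DataFrames

# Hypothetical sample data written to a temporary file, for illustration only.
path = tempname() * ".csv"
write(path, "name,score\na,1\nb,2\n")

# DataFrame is a type (exported by the DataFrames package), not an instance.
@show DataFrame isa Type

# Passing the type as the second argument...
df1 = CSV.read(path, DataFrame)

# ...gives the same table as calling the constructor on a CSV.File.
df2 = DataFrame(CSV.File(path))

@show isequal(df1, df2)
```

Both calls parse the file with CSV.File and then hand the result to DataFrame, so the two data frames should hold the same columns and values.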
Unfortunately, the second link below is an outdated tutorial. This is a bit of a pain, but as the ecosystem stabilizes over time hopefully it will be harder to stumble across outdated tutorials.
Thank you very much @pdeffebach for the detailed explanation. Now it makes sense to me.
I’m aware that Julia is still evolving, and I’ll try to comment on the outdated sources to help others avoid the issues I had, since all the tutorials showed the same approach, which no longer works.
The DataFrame one should be DataFrame(CSV.File(data_file)), which still works. CSV.read used to return a DataFrame by default, but that was deprecated so that CSV.jl doesn’t have to depend on DataFrames.jl.
It was then brought back as CSV.read(filepath, sink) because users were expecting a CSV.read function. With this signature, DataFrames only has to be loaded by the user if a DataFrame output is actually required.
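A small sketch of what that separation looks like in practice, again using a made-up temporary file. As far as I understand it, CSV.File on its own is enough to iterate rows or pull out a column, and DataFrames is only loaded at the point where a DataFrame is wanted as the sink.

```julia
using CSV

# Hypothetical sample data written to a temporary file, for illustration only.
path = tempname() * ".csv"
write(path, "name,score\na,1\nb,2\n")

# Without DataFrames: CSV.File can be iterated row by row...
file = CSV.File(path)
for row in file
    println(row.name, " => ", row.score)
end

# ...and columns are available as properties.
total = sum(file.score)

# DataFrames is only needed once a DataFrame is the desired output (the sink).
using DataFrames
df = CSV.read(path, DataFrame)
```

Keeping the sink as an explicit argument is what lets CSV.jl avoid a dependency on DataFrames.jl.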
I’ve summarized what I have learned here in an article: Read CSV to Data Frame in Julia. It describes three syntactic ways to read a file, talks about encoding, and shows examples of several of the parameters. Thanks, everyone, for the help.