Update
You should really be using CSV.jl, because it performs quite well. I now use Fread.jl only for converting data between formats (Parquet to Feather, etc.) and not for reading CSVs, as CSV.jl is actually really good.
Original content
Let’s be honest, all we care about is the speed of CSV reading. I think it was one of @jeff.bezanson’s quotes.
Just from a quick browse of the code, it seems like you use R’s fread to read the table and write it as a feather file, and then use Julia’s native Feather package to read it back in.
I'm just wondering whether this round trip is still faster than CSV.jl and, if so, why. What makes the CSV.jl package so much slower in this case?
I think one potential issue here is that Feather.read doesn't actually load the data from disk. It only reads the metadata, and the data gets loaded from disk when you actually access the values. I'm not sure how these benchmarks were run, but they may not have included the time to read the data from the Feather files on disk back into memory.
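The lazy-loading concern can be illustrated outside of Julia. The following is a minimal sketch in Python using only the standard library's `mmap` module. It is not Feather.jl and not the benchmark discussed in this thread, and the function names (`time_map_only`, `time_map_and_load`, `compare`) are made up for illustration. It times creating a memory map separately from copying the mapped bytes into memory, which is the step a fair read benchmark would need to include.

```python
import mmap
import os
import tempfile
import time


def time_map_only(path):
    """Time creating a read-only memory map without touching any data."""
    with open(path, "rb") as f:
        start = time.perf_counter()
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        elapsed = time.perf_counter() - start
        mm.close()
    return elapsed


def time_map_and_load(path):
    """Time creating the map and copying every byte into process memory."""
    with open(path, "rb") as f:
        start = time.perf_counter()
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        data = mm[:]  # slicing copies the mapped bytes into a bytes object
        elapsed = time.perf_counter() - start
        mm.close()
    return elapsed, len(data)


def compare(size_bytes):
    """Write a scratch file of size_bytes (must be > 0) and time both approaches."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
        map_only = time_map_only(path)
        map_and_load, n_loaded = time_map_and_load(path)
    return {
        "map_only_s": map_only,
        "map_and_load_s": map_and_load,
        "bytes_loaded": n_loaded,
    }


if __name__ == "__main__":
    # 64 MiB scratch file; the size is an arbitrary choice for the sketch.
    print(compare(64 * 1024 * 1024))
```

Because the scratch file has just been written, the operating system's page cache is likely warm and the gap between the two timings may be small. The point is only that the first timing never reads the data at all, so a benchmark that stops there is not measuring a read.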
Firstly, reading from Feather doesn't actually read the data. It just memory-maps the file. Secondly, fread is very mature (10 years of development), so it's much faster. If Julia has Arrow support, then the data can be accessed more quickly without going through the Feather step.