Reading Data Is Still Too Slow

I tried to load a feather file now that seems roughly similar to the CRSP file described here: 90 million rows, a mix of Int and Float64 columns. The file is about 8.5 GB on disc.

With the current released version of DataFrames.jl, it takes about 80 seconds to load on my system (I do have a very fast system :slight_smile: ). When I use the master branch of DataFrames.j, it takes somewhere between 7-15 seconds to load. All of these numbers are for FeatherFiles.jl.

So I suspect (or hope :wink: ) that @iwelch’s numbers from above are with the released DataFrames.jl, in which case we might actually have something very competitive once we get a new DataFrames.jl release out.

Caveat is that I haven’t tried a column with missing values yet.

2 Likes