I just tried loading a Feather file that seems roughly similar to the CRSP file described here: 90 million rows, with a mix of Int and Float64 columns. The file is about 8.5 GB on disk.
With the currently released version of DataFrames.jl, it takes about 80 seconds to load on my system (I do have a very fast system). When I use the master branch of DataFrames.jl, it takes somewhere between 7 and 15 seconds to load. All of these numbers are for FeatherFiles.jl.
So I suspect (or hope) that @iwelch's numbers from above are with the released DataFrames.jl, in which case we might actually have something very competitive once we get a new DataFrames.jl release out.
The caveat is that I haven't tried a column with missing values yet.