Benchmarking ways to write/load DataFrames IndexedTables to disk

xiaodai · February 19, 2018, 11:54am

Updated with @zhangliye’s code. R’s feather implementation is quite a bit faster than Julia’s, so there is probably room to improve the Julia side. data.table’s fwrite is also very fast and is competitive with fst.

@davidanthoff Looks like CSV.jl has a reasonably fast reader, on par with Pandas and data.table, at least in this case, which reads 1m rows with 9 columns of mixed string, float and integer types. I’d be interested to test it on a largish real-world dataset, e.g. Fannie Mae, to see how it stacks up. Last time I tried it, it didn’t compare so favourably.
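
For anyone who wants to reproduce the shape of this test, here is a rough sketch of the setup described above: a CSV with 9 columns of mixed string, float and integer types, read back and timed. It is not the code used for the numbers in this thread. It uses only the Python standard library so it runs anywhere, and the function names (`make_mixed_csv`, `read_mixed_csv`, `time_read`), the 3/3/3 column split and the column names are all made up for illustration. The demo uses 10,000 rows to stay quick; raise `N_ROWS` to 1,000,000 to match the size mentioned above.

```python
import csv
import os
import random
import tempfile
import time

N_ROWS = 10_000  # assumption for a quick demo; the post mentions 1m rows


def make_mixed_csv(path, n_rows, seed=0):
    """Write n_rows rows with 9 columns: 3 string, 3 float, 3 integer."""
    rng = random.Random(seed)
    header = ["s1", "s2", "s3", "f1", "f2", "f3", "i1", "i2", "i3"]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for _ in range(n_rows):
            strings = ["id%06d" % rng.randrange(1000) for _ in range(3)]
            floats = [rng.random() for _ in range(3)]
            ints = [rng.randrange(100) for _ in range(3)]
            writer.writerow(strings + floats + ints)
    return header


def read_mixed_csv(path):
    """Read the file back, converting each column to its intended type."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        return [
            row[:3] + [float(x) for x in row[3:6]] + [int(x) for x in row[6:]]
            for row in reader
        ]


def time_read(path, repeats=3):
    """Return the best wall-clock time in seconds over several reads."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_mixed_csv(path)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    csv_path = os.path.join(tempfile.mkdtemp(), "mixed.csv")
    make_mixed_csv(csv_path, N_ROWS)
    print("best read time: %.3f s for %d rows" % (time_read(csv_path), N_ROWS))
```

Taking the best of several runs keeps one-off disk cache effects from dominating the result. To compare readers, swap the body of `read_mixed_csv` for the one you want to test and keep the generated file the same.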
