Here is a Pluto.jl notebook that benchmarks the read and write performance, as well as the file sizes, of various formats for tabular data.
The following formats / packages are compared:
- CSV via https://github.com/JuliaData/CSV.jl
- JSON via https://github.com/JuliaData/JSONTables.jl
- Zipped CSV via https://github.com/fhs/ZipFile.jl
- JDF via https://github.com/xiaodaigh/JDF.jl
- Parquet via https://github.com/JuliaIO/Parquet.jl
- Apache Arrow via https://github.com/JuliaData/Arrow.jl
- Excel (xlsx) via https://github.com/felipenoris/XLSX.jl
- SQLite via https://github.com/JuliaDatabases/SQLite.jl
I always used the default configuration of each package, i.e. multithreading or compression is only used if it is switched on by default.
On my machine, Arrow is fastest, followed by JDF (note that Arrow is not compressed by default, but JDF is).
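
For readers who want to try a single measurement outside the notebook, here is a minimal sketch of the idea for two of the formats. It is not the notebook's actual code: the test table `df`, the helper `measure`, and the file names are made up for illustration, and it assumes DataFrames.jl, CSV.jl and Arrow.jl are installed. Every package call uses its default settings.

```julia
using DataFrames, CSV, Arrow

# Hypothetical test table; the data used in the notebook may look different.
n = 100_000
df = DataFrame(id = 1:n, x = rand(n), label = rand(["a", "b", "c"], n))

# Hypothetical helper: time one write and one read with default settings
# and record the resulting file size in bytes.
function measure(write_fn, read_fn, table, path)
    # Warm-up run so that compilation time is not part of the measurement.
    write_fn(path, table)
    read_fn(path)
    t_write = @elapsed write_fn(path, table)
    t_read  = @elapsed read_fn(path)
    return (write_s = t_write, read_s = t_read, size_bytes = filesize(path))
end

dir = mktempdir()  # temporary directory for the benchmark files
results = Dict(
    "CSV"   => measure(CSV.write, p -> CSV.read(p, DataFrame), df,
                       joinpath(dir, "data.csv")),
    "Arrow" => measure(Arrow.write, p -> DataFrame(Arrow.Table(p)), df,
                       joinpath(dir, "data.arrow")),
)

for (name, r) in results
    println(name, ": write ", r.write_s, " s, read ", r.read_s, " s, ",
            r.size_bytes, " bytes")
end
```

A single `@elapsed` call is noisy, so for numbers you want to compare it is probably better to repeat each measurement several times, for example with BenchmarkTools.jl.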