Hi,
here is a Pluto.jl notebook that benchmarks the read and write performance, as well as the file sizes, of various formats for tabular data.
The following formats / packages are compared:
- CSV via CSV.jl (JuliaData/CSV.jl)
- JSON via JSONTables.jl (JuliaData/JSONTables.jl)
- Zipped CSV via ZipFile.jl (fhs/ZipFile.jl)
- JDF via JDF.jl (xiaodaigh/JDF.jl)
- Parquet via Parquet.jl (JuliaIO/Parquet.jl)
- Apache Arrow via Arrow.jl (apache/arrow-julia)
- Excel (xlsx) via XLSX.jl (felipenoris/XLSX.jl)
- SQLite via SQLite.jl (JuliaDatabases/SQLite.jl)
I always used the default configuration of each package, i.e. multithreading and compression are used only if they are switched on by default.
On my machine, Arrow is fastest, followed by JDF (note that Arrow is not compressed by default, but JDF is).
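
For anyone who wants to try the idea without opening the notebook, here is a minimal sketch of this kind of comparison for two of the formats. It is not the notebook's code: the test table, the `bench` helper and the file names are made up for illustration, and only CSV.jl and Arrow.jl are covered. The third entry turns on Arrow's optional `compress = :lz4` setting, which is not the default, to show the effect of compression mentioned above.

```julia
using DataFrames, CSV, Arrow

# Hypothetical test table; the data used in the notebook may look different.
n = 100_000
df = DataFrame(id = 1:n, x = rand(n), label = rand(["a", "b", "c"], n))

# Time one write and one read of `tbl` at `path` and report the file size.
# Each operation is run once beforehand so that compilation is not measured.
function bench(write_fn, read_fn, tbl, path)
    write_fn(path, tbl)                      # warm-up write
    t_write = @elapsed write_fn(path, tbl)   # timed write
    read_fn(path)                            # warm-up read
    t_read = @elapsed read_fn(path)          # timed read
    return (write_s = t_write, read_s = t_read, bytes = filesize(path))
end

dir = mktempdir()
read_arrow(path) = DataFrame(Arrow.Table(path))

results = Dict(
    # Package defaults, as in the comparison above.
    "CSV" => bench(CSV.write, p -> CSV.read(p, DataFrame), df,
                   joinpath(dir, "t.csv")),
    "Arrow" => bench(Arrow.write, read_arrow, df,
                     joinpath(dir, "t.arrow")),
    # Not a default: Arrow with LZ4 compression switched on explicitly.
    "Arrow (lz4)" => bench((p, t) -> Arrow.write(p, t; compress = :lz4),
                           read_arrow, df, joinpath(dir, "t_lz4.arrow")),
)

for (name, r) in results
    println(rpad(name, 12), " write: ", round(r.write_s, digits = 4), " s",
            "  read: ", round(r.read_s, digits = 4), " s",
            "  size: ", r.bytes, " bytes")
end
```

A single `@elapsed` call is quite noisy, so for numbers you want to rely on, something like BenchmarkTools.jl would be a better choice. Note also that `Arrow.Table` memory-maps the file, so the Arrow read time mostly reflects setting up the table rather than touching every value.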