A new 0.3 release of the Arrow.jl package has just been made.
This is a from-scratch rewrite of the entire package, which now lives under the JuliaData organization. With this release, Arrow.jl fully implements version 1.0 of the Apache Arrow format in native Julia. In more detail, it now supports:
- All primitive data types
- All nested data types
- Dictionary encodings and messages
- Extension types
- Streaming and file formats, record batch messages, and replacement and isdelta dictionary messages
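To give a feel for the basics, here is a minimal round-trip sketch. It assumes Arrow.jl and Tables.jl are installed; the column names and the scratch file path are made up for illustration, and the exact behavior may differ slightly between releases.

```julia
using Arrow, Tables

# Any Tables.jl-compatible source works; a NamedTuple of vectors is the simplest one.
# The column names here are hypothetical.
tbl = (id = [1, 2, 3],
       score = [1.5, missing, 3.0],
       name = ["a", "b", "c"])

path = tempname() * ".arrow"   # hypothetical scratch file
Arrow.write(path, tbl)         # serialize the table to disk as arrow data

loaded = Arrow.Table(path)     # read it back as an Arrow.Table

# Pull the columns out through the generic Tables.jl interface.
ids = collect(Tables.getcolumn(loaded, :id))
scores = collect(Tables.getcolumn(loaded, :score))
names_ = collect(Tables.getcolumn(loaded, :name))
```

The `collect` calls are only there to get plain Julia vectors for inspection; in normal use you can hand the `Arrow.Table` straight to anything that accepts a Tables.jl source.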
It currently doesn’t include support for:
- Tensors or sparse tensors
- Flight RPC
- C data interface
Third-party data formats:
- CSV and Parquet support via the existing CSV.jl and Parquet.jl packages
- Other Tables.jl-compatible packages automatically supported (DataFrames.jl, JSONTables.jl, JuliaDB.jl, SQLite.jl, MySQL.jl, JDBC.jl, ODBC.jl, XLSX.jl, etc.)
- No current Julia packages support ORC or Avro data formats
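Since everything goes through Tables.jl, converting from another format is mostly a matter of passing one package's table to `Arrow.write`. A rough sketch with CSV.jl, assuming Arrow.jl, CSV.jl, and Tables.jl are installed (the inline CSV text and the output path are invented for the example):

```julia
using Arrow, CSV, Tables

# Hypothetical CSV content, parsed from memory instead of a real file.
csv_text = "city,population\nOslo,700000\nBergen,285000\n"
csv_file = CSV.File(IOBuffer(csv_text))   # a Tables.jl-compatible source

path = tempname() * ".arrow"              # hypothetical output path
Arrow.write(path, csv_file)               # CSV rows in, arrow data out

arrow_tbl = Arrow.Table(path)
cities = collect(Tables.getcolumn(arrow_tbl, :city))
population = collect(Tables.getcolumn(arrow_tbl, :population))
```

The same pattern should apply to the other Tables.jl-compatible packages listed above: swap `CSV.File(...)` for whatever table object that package produces.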
This 0.3 release is meant as a “beta” release of the rewritten code, and we invite everyone to give it a try and report any issues you run into. Also feel free to post questions or issues in the #data Slack channel.
The plan is to let the 0.3 release help shake out any glaring issues in the rewritten code before doing an official 1.0 release. In the meantime, I’ll also be working on integrating the Julia implementation into the official Apache Arrow repository.
For the really adventurous among you, I recorded a 90-minute video deep-dive into the Arrow.jl implementation of the Arrow format; it walks through the code and also covers some high-level ideas and uses for Arrow data in general.
Cheers!
-Jacob