I’m excited to make our first public announcement about Banyan Julia - a suite of packages that let you use popular Julia APIs to process massive datasets on and off the cloud (via sampling):
- BanyanDataFrames.jl for DataFrames.jl
- BanyanImages.jl for Images.jl
- BanyanONNXRunTime.jl for ONNXRunTime.jl (for PyTorch/TensorFlow models)
- BanyanHDF5.jl for HDF5.jl
- BanyanArrays.jl for Array
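
To give a sense of the "familiar APIs" claim, here is a small, plain DataFrames.jl example of the kind of code BanyanDataFrames.jl is meant to mirror for much larger datasets. This is a local DataFrames.jl sketch rather than Banyan-specific code, and the file path and column names are made up for illustration:

```julia
using CSV, DataFrames

# Plain DataFrames.jl example of the API style the announcement refers to.
# The file path and column names are purely illustrative.
df = CSV.read("transactions.csv", DataFrame)

# Keep rows with a positive amount, group by region, and total the amounts.
filtered = filter(:amount => >(0), df)
totals = combine(groupby(filtered, :region), :amount => sum => :total)

println(first(totals, 5))
```

The idea is that this same filter/groupby/combine-style code can be pointed at datasets too large for one machine, with Banyan handling the cloud execution or sampling behind the scenes.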
Most recently, we’ve:
- achieved performance comparable to Dask (running on Coiled) in a preliminary benchmark of a common data analytics task
- put together a getting-started walk-through video
- developed automatic instant big data sampling to reduce data teams’ reliance on expensive and energy-intensive cloud data centers
TL;DR: we’re building a platform for eco-friendly, large-scale data science with familiar Julia APIs. More details are on our website - BanyanComputing.com. (PS - it’s a cloud product, so if you want something on-prem, look at Dagger.jl, Distributed, or MPI.jl.)
PPS - I want to thank the friendly and helpful Julia community, including the contributors to DataFrames.jl, Images.jl, ONNXRunTime.jl, and more. Without them, this project would not be possible.