Banyan Julia - large-scale Julia data frames, images, arrays, ML models, and more

I’m excited to make our first public announcement about Banyan Julia - a suite of packages that let you use popular Julia APIs to process massive datasets on the cloud, or off the cloud by working with samples.

Most recently, we’ve:

  1. achieved performance comparable to Dask (Coiled) in a preliminary benchmark for a common data analytics task
  2. put together a getting started walk-through video
  3. developed automatic instant big data sampling to reduce data teams’ reliance on expensive and energy-intensive cloud data centers

TLDR: we’re building a platform for eco-friendly large-scale data science with familiar Julia APIs. More details are on our website - BanyanComputing.com. (PS - it’s a cloud product, so if you want something on-prem, look at Dagger.jl, Distributed, or MPI.jl.)

PPS - I want to thank the friendly and helpful Julia community including contributors to DataFrames.jl, Images.jl, ONNXRunTime.jl, etc. Without them, this project would not be possible. :heart:
