[ANN] MichiBoost.jl — Native gradient boosting in Julia

Hi everyone,

I’d like to share a project I’ve been working on: MichiBoost.jl, a gradient boosting library written 100% in Julia.

The story starts with another project I maintain for JuliaAI, MLFlowClient.jl, where I had the idea of building a wrapper using PythonCall.jl to avoid the hassle of keeping up with REST API updates. While looking for other packages in the organization using this approach, I came across CatBoost.jl, a wrapper around the well-known Python package. Reading through the code, a question kept nagging at me: why, in a language designed for high-performance computing, do we still rely on wrappers over C++ or Python libraries when we need a GBDT? MichiBoost.jl is an attempt to answer that with a native implementation.

Features

  • Symmetric (oblivious) trees, the same structure CatBoost uses, enabling fast and predictable inference
  • Support for regression, binary classification, and multiclass classification
  • Native handling of categorical variables via ordered target encoding (no one-hot encoding required)
  • Numerical feature quantization into bins to speed up tree construction
  • Built-in cross-validation and early stopping
  • SHAP values for model interpretability
  • Feature importance based on split frequency
  • Simple scikit-learn-style API: `fit!`, `predict`, `predict_proba`, `predict_classes`
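
To give a feel for the API, here is a hypothetical usage sketch. Only `fit!`, `predict_proba`, and `predict_classes` are named in the feature list above; the constructor name and keyword arguments (`MichiBoostClassifier`, `iterations`, `learning_rate`, `depth`) are illustrative assumptions, so check the repository README for the actual signatures.

```julia
# Hypothetical sketch: constructor and keyword names below are assumptions;
# only fit!/predict_proba/predict_classes come from the announcement itself.
using MichiBoost

X = rand(1000, 10)        # numeric feature matrix
y = rand(Bool, 1000)      # binary targets

# Assumed constructor and hyperparameter names (illustrative only)
model = MichiBoostClassifier(; iterations = 200, learning_rate = 0.1, depth = 6)

fit!(model, X, y)                    # scikit-learn-style in-place training

probs = predict_proba(model, X)      # per-class probabilities
preds = predict_classes(model, X)    # hard class labels
```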

The package leverages Julia’s multi-threading during both training and inference, and ships with a benchmark suite that compares it directly against CatBoost.jl (outperforming it in certain scenarios).

The project is under active development with performance as an ongoing focus. There are still several optimizations on the roadmap (histogram subtraction, better memory layout in the split-finding hot path, among others), so contributions in that area are especially welcome.

Repository: https://github.com/pebeto/MichiBoost.jl (pure Julia gradient boosting with native categorical feature handling and symmetric decision trees)


I believe this is why EvoTrees.jl exists :wink: