EvoTrees.jl just went through a significant refurbishment for v0.15.0:
It’s now possible to train directly from Tables-like structures, most notably DataFrames and named tuples, which are natural ways in which tabular data presents itself:
```julia
using EvoTrees, DataFrames

# synthetic data (same shapes as the Matrix/Vector example below)
x_train, y_train = rand(1_000, 10), rand(1_000)

config = EvoTreeRegressor()
dtrain = DataFrame(x_train, :auto)
dtrain.y .= y_train
m = fit_evotree(config, dtrain; target_name="y")
pred = m(dtrain)
```
Support for the original Matrix/Vector-based data remains:
```julia
x_train, y_train = rand(1_000, 10), rand(1_000)
m = fit_evotree(config; x_train, y_train)
pred = m(x_train)
```
When using a Tables-compatible data input, features with `Categorical` element types are automatically recognized as input features. Alternatively, the `fnames` kwarg can be used to explicitly specify the feature variables:
```julia
m = fit_evotree(config, dtrain; target_name="y", fnames=["x1", "x3"]);
```
Categorical features are treated accordingly by the algorithm: ordered variables are treated as numerical features, using a `≤` split rule, while unordered variables use an `==` split rule. Support is currently limited to a maximum of 255 levels. `Bool` variables are treated as unordered, 2-level categorical variables.
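As an illustrative sketch (the column names and data here are made up), a DataFrame mixing numerical, ordered, unordered, and Bool features can be passed directly, with CategoricalArrays.jl marking the categorical columns:

```julia
using EvoTrees, DataFrames, CategoricalArrays

df = DataFrame(
    x1 = rand(1_000),                                                    # numerical: ≤ splits
    x2 = categorical(rand(["low", "mid", "high"], 1_000); ordered=true), # ordered cat: ≤ splits
    x3 = categorical(rand(["red", "green", "blue"], 1_000)),             # unordered cat: == splits
    x4 = rand(Bool, 1_000),                                              # treated as 2-level unordered cat
)
df.y = rand(1_000)

config = EvoTreeRegressor()
m = fit_evotree(config, df; target_name="y")
```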
GPU memory footprint has been significantly reduced thanks to a single histogram being kept in GPU RAM, instead of three for every node of a tree.
Training on “cpu” or “gpu” is now controlled through the `device` kwarg passed to `fit_evotree` (it is no longer part of the model constructor, as in `EvoTreeRegressor(device="gpu")`).
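A minimal sketch of the new call pattern, assuming `device` takes "cpu" (the default) or "gpu":

```julia
m_cpu = fit_evotree(config, dtrain; target_name="y")                # defaults to "cpu"
m_gpu = fit_evotree(config, dtrain; target_name="y", device="gpu")  # train on GPU
```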
All GPU-specific structs have been removed; common CPU-based structs are now used for both CPU and GPU training (GPU-specific objects are kept in the cache).
EvoTree model constructors used to support the `T` kwarg to specify either `Float32` or `Float64` as the basis for computation, e.g. `EvoTreeRegressor(T=Float64)`. This has been dropped in v0.15: calculations at the observation level are now handled as `Float32`, while accumulations are done with `Float64`. This provides the best of both worlds: it solves some numerical instabilities observed with `Float32` on some larger datasets, while keeping performance similar to full `Float32` computation.
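In other words (a sketch of the API change, with the old kwarg shown for contrast):

```julia
# v0.14 and earlier: precision was chosen at model construction
# config = EvoTreeRegressor(T=Float64)

# v0.15: no T kwarg; Float32 is used internally for observation-level
# calculations and Float64 for accumulations
config = EvoTreeRegressor()
```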