EvoTrees.jl v0.15.0

EvoTrees.jl just went through a significant refurbishment for v0.15.0:

https://evovest.github.io/EvoTrees.jl/stable/

Direct handling of Tables-compatible data

It’s now possible to train directly from Tables.jl-compatible structures, most notably DataFrames and named tuples, which are natural ways in which tabular data presents itself:

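using EvoTrees, DataFrames
x_train, y_train = rand(1_000, 10), rand(1_000)
config = EvoTreeRegressor()
dtrain = DataFrame(x_train, :auto)   # columns named x1, x2, ..., x10
dtrain.y = y_train                   # add the target as column "y"
m = fit_evotree(config, dtrain; target_name="y")
pred = m(dtrain)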

Support for the original Matrix/Vector based data remains:

x_train, y_train = rand(1_000, 10), rand(1_000)
m = fit_evotree(config; x_train, y_train)
pred = m(x_train)

Handling of Categorical and Bool types

When using Tables-compatible input data, columns with element type Real (including Bool) or Categorical are automatically recognized as input features. Alternatively, the fnames kwarg can be used to explicitly specify the feature columns:

m = fit_evotree(config, dtrain; target_name="y", fnames=["x1", "x3"]);
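To make the automatic detection rule concrete, here is a small dependency-free sketch of "pick the columns whose element type is Real" on a named tuple of vectors (a Tables.jl column table). The helper real_features is hypothetical and not EvoTrees' internal code; Categorical columns are left out so the snippet only needs Base Julia. Note that Bool <: Integer <: Real in Julia, which is why Bool columns are picked up by a Real check.

```julia
# A NamedTuple of equal-length vectors is a Tables.jl "column table".
tbl = (
    x1 = rand(5),                       # Float64 -> Real
    x2 = rand(1:3, 5),                  # Int -> Real
    flag = rand(Bool, 5),               # Bool -> Real (Bool <: Integer <: Real)
    name = ["a", "b", "c", "d", "e"],   # String -> not a feature
    y = rand(5),                        # target column
)

# Hypothetical helper (not part of EvoTrees): names of the columns whose
# element type is a subtype of Real, excluding the target column.
function real_features(t::NamedTuple, target::Symbol)
    return [k for k in keys(t) if k != target && eltype(t[k]) <: Real]
end

feats = real_features(tbl, :y)   # [:x1, :x2, :flag]
```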

Categorical features get dedicated treatment in the algorithm. Ordered variables are treated like numerical features, using the same threshold split rule, while unordered variables are split using ==. Support is currently limited to a maximum of 255 levels. Bool variables are treated as unordered, 2-level categorical variables.
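The two split rules can be illustrated with a minimal sketch, assuming levels are first mapped to small integer codes (stored here as UInt8, which fits the 255-level cap). The functions encode_levels, goes_left_ordered and goes_left_unordered are hypothetical names for illustration only, not EvoTrees functions: an ordered feature sends observations left when code <= threshold, an unordered one only when code == level, and a Bool is just a 2-level unordered case.

```julia
# Hypothetical encoding: map each value to its position in `levels`, as UInt8.
function encode_levels(x::AbstractVector, levels::AbstractVector)
    length(levels) <= 255 || throw(ArgumentError("at most 255 levels are supported"))
    code = Dict(l => UInt8(i) for (i, l) in enumerate(levels))
    return UInt8[code[v] for v in x]
end

# Ordered feature: threshold rule, like a numerical feature.
goes_left_ordered(c::Integer, threshold::Integer) = c <= threshold
# Unordered feature: equality rule, a single level goes left.
goes_left_unordered(c::Integer, level::Integer) = c == level

sizes = ["S", "L", "M", "S"]
size_codes = encode_levels(sizes, ["S", "M", "L"])   # ordered: S < M < L
left_ordered = goes_left_ordered.(size_codes, 2)     # S and M go left

flags = [true, false, false, true]
flag_codes = encode_levels(flags, [false, true])     # Bool as 2-level unordered
left_flags = goes_left_unordered.(flag_codes, 2)     # only `true` goes left
```

With the equality rule, the level order carries no meaning, which is exactly what "unordered" requires; with the threshold rule, the order given in `levels` matters.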

Improved handling of devices (CPU/GPU)

The GPU memory footprint has been significantly reduced, thanks to keeping a single histogram in GPU RAM instead of 3 for every node of a tree.
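For readers less familiar with histogram-based boosting, here is a minimal CPU sketch of what "a histogram" means in this context: per-bin sums of gradients and hessians for one binned feature at one node. The 2 × nbins Float64 layout and the name build_histogram are illustrative assumptions, not EvoTrees' actual (GPU) data structures.

```julia
# Hypothetical inputs: one feature already binned into UInt8 bin ids,
# plus per-observation gradient and hessian stored as Float32.
bins = UInt8[1, 2, 2, 3, 1, 3, 3]
g = Float32[0.5, -1.0, 0.25, 2.0, -0.5, 1.0, 0.5]
h = ones(Float32, length(g))

# Accumulate per-bin sums: row 1 holds sum(g), row 2 holds sum(h).
function build_histogram(bins, g, h, nbins)
    hist = zeros(Float64, 2, nbins)
    @inbounds for i in eachindex(bins, g, h)
        b = bins[i]
        hist[1, b] += g[i]
        hist[2, b] += h[i]
    end
    return hist
end

hist = build_histogram(bins, g, h, 3)
```

Candidate splits on this feature can then be scored from cumulative sums over the bins (e.g. cumsum(hist[1, :])), without revisiting the individual observations.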

Training on “cpu” or “gpu” is now controlled by a kwarg passed to fit_evotree; it is no longer part of the model constructor (such as EvoTreeRegressor).

All GPU-specific structs have been removed: common CPU-based structs are used for both CPU and GPU training (GPU-specific objects are kept in cache).

Fixed numerical instabilities

EvoTrees model constructors used to support the T kwarg to specify either Float32 or Float64 as the basis for computation, e.g. EvoTreeRegressor(T=Float64). This has been dropped in v0.15: calculations at the observation level are now handled in Float32, while accumulations are done in Float64. This provides the best of both worlds: it solves some numerical instabilities observed with Float32 on some larger datasets, while keeping performance similar to full Float32 precision.
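One simple way Float32 accumulation goes wrong on large datasets can be shown in a few lines. This is an illustration of the general effect, not EvoTrees' code: the gradients (all equal to 1) are hypothetical, and running_sum is a plain sequential loop written for the demo. Float32 has a 24-bit significand, so once a Float32 running sum reaches 2^24, adding 1 no longer changes it, while a Float64 accumulator over the same Float32 values stays exact.

```julia
# Hypothetical per-observation gradients, stored in Float32 (all equal to 1).
n = 2^24 + 1_000
grads = ones(Float32, n)

# Plain left-to-right sum with an accumulator of type T.
# (Base.sum uses pairwise summation, which would hide the effect.)
function running_sum(::Type{T}, x) where {T<:AbstractFloat}
    acc = zero(T)
    for v in x
        acc += T(v)
    end
    return acc
end

acc32 = running_sum(Float32, grads)   # stalls at 2^24 = 16_777_216
acc64 = running_sum(Float64, grads)   # exact: 16_778_216
```

Keeping the per-observation vectors in Float32 preserves memory and bandwidth, while only the (much smaller) accumulators pay for Float64.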


This package is amazing! The docs are fabulous.

Out of curiosity – is there a particular reason it’s not been registered?

Thanks for the kind words!

EvoTrees.jl is indeed registered: https://github.com/JuliaRegistries/General/tree/master/E/EvoTrees.
Have you encountered an issue installing from the General registry?


Oh, great! I didn’t try, just saw that the docs pointed to the GitHub url. Thanks!

Edit: just read the “Latest” header. Dunno how I missed that 🤦