Computations are now almost entirely done through KernelAbstractions.jl. The objective is to eventually have full support for AMD / ROCm devices in addition to the currently supported NVIDIA / CUDA ones.
Significant performance increase, notably for larger max depth. Training time now increases close to linearly with depth.
Breaking change: improved reproducibility
Training returns exactly the same fitted model for a given learner (e.g. EvoTreeRegressor).
Reproducibility is respected for both CPU and GPU. However, results may differ between CPU and GPU; i.e., reproducibility is guaranteed only within the same device type.
The learner / model constructor (e.g. EvoTreeRegressor) now has a seed::Int argument to set the random seed. The legacy rng kwarg is now ignored.
The internal random number generator is now Xoshiro (previously MersenneTwister, seeded via rng::Int).
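A minimal sketch of the new seeding behavior, assuming the standard fit_evotree entry point and a regression setup; the data here is synthetic and purely illustrative:

```julia
using EvoTrees

# Construct a learner with an explicit seed (the legacy `rng` kwarg is ignored).
config = EvoTreeRegressor(nrounds=100, max_depth=5, seed=123)

# Synthetic data for illustration only.
x_train = randn(1_000, 10)
y_train = randn(1_000)

# Two runs of the same learner are expected to return identical fitted models
# (within the same device type).
m1 = fit_evotree(config; x_train, y_train)
m2 = fit_evotree(config; x_train, y_train)
```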
Added node weight information in fitted trees
The train weight reaching each split/leaf node is now stored in the fitted trees. It is accessible via model.trees[i].w for the i-th tree of the fitted model. This is notably intended to support SHAP value computations.
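For example, continuing from the fitted model in the sketch above, the per-node weights of a given tree can be read directly:

```julia
# Train weight reaching each split/leaf node of the first tree,
# via the `model.trees[i].w` accessor described above.
w1 = m1.trees[1].w
```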
Very cool! It looks like EvoTrees now consistently beats XGBoost. Do you know how it compares to CatBoost for speed/OOTB accuracy? My impression is that these days CatBoost is the gold standard for boosted decision trees.
I’m hoping I’ll soon be able to actually follow through on my project that could benefit from EvoTrees. The improved reproducibility will be a help there, and TreeSHAP would be very nice to have.
I’ve maintained some basic tabular benchmarks here: Evovest/MLBenchmarks.jl (ML model benchmarks on public datasets).
While I’m aware of the praise CatBoost gets, I haven’t seen it outperform on my problems of interest. It can also depend on the extent to which the hyper-params were properly tuned. XGBoost, LightGBM, and CatBoost can all be of interest, though they remain very similar algorithms.
Note that oblivious trees are supported, but I’ve only seen them underperform compared to the default binary mode; a sketch of the option follows below.
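For reference, opting into oblivious trees would look something like this; treat the exact keyword name as an assumption on my part:

```julia
# Sketch: oblivious trees instead of the default binary splits.
# The `tree_type` keyword name is assumed here.
config_obl = EvoTreeRegressor(nrounds=100, tree_type="oblivious")
```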
The timing for TreeSHAP remains to be seen. An external contributor has been looking at it; we may push to complete that feature in case he isn’t able to finish it.
Interesting. The thing I find most striking about https://arxiv.org/pdf/2506.16791 isn’t actually the CatBoost performance (which is reported favorably), but how close the untuned and tuned performance is:
This does line up with the impression I have of CatBoost doing a particularly good job with defaults.
Interesting. It does seem rather problem-dependent. I notice that in your benchmarks they’re broadly equivalent, with the exception of Boston, where the MSE seems markedly better for CatBoost.
Interesting, this seems similar to the question I just asked over on GitHub (about ordered boosting).
I would not put too much emphasis on untuned performance, given that it can depend heavily on somewhat arbitrary default choices.
For instance, XGBoost’s defaults use a limited number of iterations (100) along with a high learning rate (0.3), whereas CatBoost uses a large number of trees (1,000) along with a lower learning rate.
For EvoTrees, I initially had a very trivial 10 iterations with a high learning rate. It would not provide a great fit by default, but the intent then was not to have a good default but a minimal one: it was assumed that usage of such models would involve hyper-param tuning.
In light of actual usage, and of how some papers compare algorithms using default arguments as evidence of performance, I may reconsider and opt for stronger defaults: larger nrounds, lower eta, and some rowsample.
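To make that concrete, something along these lines; the specific values here are illustrative, not settled defaults:

```julia
# Hypothetical "stronger default" configuration: more rounds, lower
# learning rate, and row subsampling. Values are illustrative only.
config = EvoTreeRegressor(
    nrounds   = 1_000,  # larger number of boosting iterations
    eta       = 0.05,   # lower learning rate
    rowsample = 0.8,    # subsample a fraction of rows per iteration
)
```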
I think it very much highlights how tricky benchmarking can be: even performing an honest hyper-parameter tuning is non-trivial, and some knowledge of an algorithm’s hyper-param behavior is useful in setting up an efficient search.
As a user, I do look at untuned performance when available, because if it’s good across a variety of datasets, or at least on a dataset close to my domain, that’s reassuring that I’ll probably get good results without too much work. I’m not trying to scientifically evaluate which system is best; I’m just trying to find something that gets the job done easily.