[ANN] EvoTrees.jl: experimental GPU support for gradient boosting trees

Starting with v0.5.0, it’s now possible to build GBT models on the GPU, thanks to the underlying CUDA.jl package, which provides nimble tools for handling kernels. Speedups are modest compared to those observed with XGBoost’s gpu_hist approach, likely because there is room left for optimization in the CPU-GPU traffic and for more efficient kernels.

A model can be trained on the GPU using fit_evotree_gpu(params1, X, Y) instead of the usual fit_evotree(params1, X, Y).
For predictions: predict_gpu(model, X_train).
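
A minimal usage sketch is below; only fit_evotree_gpu and predict_gpu come from the announcement, while the data, the EvoTreeRegressor constructor and its keyword names are assumptions based on EvoTrees’ CPU API and may differ by version:

```julia
using EvoTrees

# Toy data; the input stays a plain Matrix / Vector, not a CuArray
X = randn(Float32, 10_000, 20)
Y = randn(Float32, 10_000)

# Hypothetical hyperparameters, mirroring the CPU workflow
params1 = EvoTreeRegressor(nrounds = 100, max_depth = 5, nbins = 64)

# GPU training and prediction
model = fit_evotree_gpu(params1, X, Y)
pred  = predict_gpu(model, X)
```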


Do I have to convert X to a CuArray first?

No, the input should still be a regular Matrix. GPU operations are performed for the gradient update and the histogram accumulation, all happening in the background.
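
A minimal sketch of that pattern, assuming a hypothetical internal training step (not EvoTrees’ actual code): the caller passes a plain Matrix, and the conversion to CuArray happens inside the routine.

```julia
using CUDA

# Hypothetical internal step: the caller supplies plain host arrays;
# conversion to CuArray and the GPU work happen inside the routine.
function train_step_gpu(X::Matrix{Float32}, grad::Vector{Float32})
    X_gpu    = CuArray(X)     # internal conversion, invisible to the caller
    grad_gpu = CuArray(grad)
    grad_gpu .+= vec(sum(X_gpu; dims = 2))  # stand-in for the real gradient/histogram kernels
    return Array(grad_gpu)    # results brought back to the host
end
```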


Would it be better to run on the GPU only if the user has converted the input to a CuArray? Then you could benchmark just the computation and not the data transfer.

I think it would indeed make sense to run straight on the GPU if the input is a CuArray. However, as there’s no equivalent to RAPIDS/cuDF in Julia as far as I know, I wouldn’t necessarily expect input data to be naturally brought in as a CuArray. Also, the current conversion to CuArray within the algorithm routine is aligned with the xgboost/lightgbm approach, where the input is also converted to GPU within the training routine, so I think the benchmark between EvoTrees and the others is fair with the current approach. Anyhow, from what I’ve observed, this step doesn’t add much to the overall training time.
I think the main bottleneck in the current implementation is the back-and-forth between CPU and GPU, as the histogram is built on the GPU but the best-split scan is done on the CPU.
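
For illustration, here is a conceptual sketch of that round trip, with hypothetical names and a simplified gain (not EvoTrees’ actual internals): the histogram is accumulated by a GPU kernel, copied back to the host, and the split scan runs on the CPU.

```julia
using CUDA

# GPU kernel: accumulate gradients into per-bin buckets (atomic adds because
# several observations can fall into the same bin)
function hist_kernel!(hist, grad, bin_idx)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(grad)
        CUDA.@atomic hist[bin_idx[i]] += grad[i]
    end
    return nothing
end

function best_split(grad::Vector{Float32}, bin_idx::Vector{Int32}, nbins::Int)
    grad_gpu = CuArray(grad)
    idx_gpu  = CuArray(bin_idx)
    hist     = CUDA.zeros(Float32, nbins)   # histogram lives on the GPU

    threads = 256
    blocks  = cld(length(grad), threads)
    @cuda threads=threads blocks=blocks hist_kernel!(hist, grad_gpu, idx_gpu)

    hist_cpu = Array(hist)                  # the CPU-GPU round trip in question
    # CPU-side scan; a placeholder gain — the real scan uses gradient/hessian
    # sums and regularization terms
    cum   = cumsum(hist_cpu)
    gains = abs.(2 .* cum[1:end-1] .- cum[end])
    return argmax(gains)                    # best bin boundary
end
```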
