Does anyone have an implementation of generic boosting?

Dear All,

I would like to ask, as the title suggests, whether anyone has a general implementation of a boosting algorithm. I did some searching and found that implementations are usually tightly coupled with decision trees as the base learner. But boosting is a general meta-algorithm: it only assumes that the underlying base learner can fit (possibly weighted) samples and make predictions on them.
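To make that point concrete, here is a minimal sketch of AdaBoost.M1 in Python with the base learner fully pluggable. The `make_learner` factory and the `Stump` weak learner are my own illustrative names, not from any particular library; the only contract boosting needs is `fit(X, y, sample_weight)` and `predict(X)`:

```python
import numpy as np

def adaboost(X, y, make_learner, rounds=50):
    """Generic AdaBoost.M1 for labels y in {-1, +1}.

    make_learner() must return an object exposing
    fit(X, y, sample_weight) and predict(X); the boosting loop
    never looks inside the base learner.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)              # uniform sample weights to start
    learners, alphas = [], []
    for _ in range(rounds):
        h = make_learner()
        h.fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = np.sum(w * (pred != y))    # weighted training error
        if err >= 0.5:                   # no better than chance: stop
            break
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)   # up-weight the mistakes
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)

    def predict(Xq):
        score = sum(a * h.predict(Xq) for a, h in zip(alphas, learners))
        return np.sign(score)
    return predict

class Stump:
    """One-feature threshold classifier: the classic weak learner."""
    def fit(self, X, y, sample_weight):
        best = (np.inf, 0, 0.0, 1)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= t, sign, -sign)
                    err = np.sum(sample_weight * (pred != y))
                    if err < best[0]:
                        best = (err, j, t, sign)
        _, self.j, self.t, self.sign = best
        return self
    def predict(self, X):
        return np.where(X[:, self.j] <= self.t, self.sign, -self.sign)
```

Swapping `Stump` for any other learner honoring the same two-method contract (a weighted linear model, a shallow NN, ...) changes nothing in the boosting loop, which is exactly the genericity the question is after.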

I have recently started to wonder why people generally believe that NNs perform poorly on tabular datasets while boosted decision trees shine. I came to the conclusion that boosting might be the key ingredient, since a single tree performs poorly as well. I would like to test whether I am right or wrong (and would also like to be right).

Due to my lack of time (you can also read that as laziness), I would ideally like to hook into an already existing implementation (my six-year-old implementation is in Matlab).

Well, thanks in advance for answers and opinions on the matter of learning from tabular data.


There are two unmaintained libs that might provide you with a decent starting point:

I don’t have experience with either, so YMMV.

PS: I think your assertion that “NNs suck on tabular datasets” is not up to date. Packages like AutoGluonTabular seem to suggest otherwise (though it blends NNs with other models).


Thanks for the links and for correcting my knowledge. I was hoping someone would point me to the current state of the art.

So I read up on AutoGluonTabular, and it is not a model based purely on neural networks: it uses whatever models scikit-learn offers, and an ensembling strategy seems to be a very important part of the solution.

Thanks a lot @tlienart for pointing me in this direction (more pointers are welcome).

I have fixed GradientBoost such that the tests (almost) pass on Julia 1.6. The only remaining trouble is a name clash involving fit! and predict, and I do not know where they are defined.
The fixed library is here

I will try to contact the owner.