Dear All,
I would like to ask, as the title suggests, whether anyone has a general implementation of a boosting algorithm. I searched a bit and found that boosting is usually tightly coupled with decision trees as the base learner. But boosting is a general meta-algorithm, which only assumes that the underlying base learner can be fitted to (possibly weighted) samples and can make predictions on them.
I have recently started to wonder why people generally believe that NNs suck on tabular datasets while boosted decision trees shine. I came to the conclusion that boosting might be the culprit, since a single tree sucks as well. I would like to know whether I am right or wrong, so I would like to test it (and I would also like to be right).
Due to my lack of time (you can also read it as laziness), I would ideally like to hook into an already existing implementation (my six-year-old implementation is in MATLAB).
Thanks in advance for any answers and opinions on the matter of learning from tabular data.
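To make that contract concrete, here is a minimal sketch of AdaBoost written against an arbitrary base learner. It is in Python only for brevity and is not taken from any existing library: the `Stump` class, the `fit(xs, ys, ws)` / `predict(xs)` interface, and the function names are all assumptions made up for illustration. Labels are assumed to be in {-1, +1}.

```python
import math


class Stump:
    """Hypothetical base learner: a weighted 1-D threshold classifier."""

    def fit(self, xs, ys, ws):
        best = None
        for t in sorted(set(xs)):
            for sign in (1, -1):
                # Weighted error of the rule "predict sign if x >= t, else -sign".
                err = sum(w for x, y, w in zip(xs, ys, ws)
                          if (sign if x >= t else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        _, self.t, self.sign = best
        return self

    def predict(self, xs):
        return [self.sign if x >= self.t else -self.sign for x in xs]


def adaboost(make_learner, xs, ys, rounds=10):
    """Boost any learner that offers fit(xs, ys, ws) and predict(xs)."""
    n = len(xs)
    ws = [1.0 / n] * n          # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        h = make_learner().fit(xs, ys, ws)
        preds = h.predict(xs)
        err = sum(w for p, y, w in zip(preds, ys, ws) if p != y)
        if err >= 0.5:          # no better than chance: stop adding learners
            break
        err = max(err, 1e-12)   # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        # Increase the weight of misclassified samples, then renormalise.
        ws = [w * math.exp(-alpha * y * p) for w, y, p in zip(ws, ys, preds)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble


def adaboost_predict(ensemble, xs):
    scores = [0.0] * len(xs)
    for alpha, h in ensemble:
        for i, p in enumerate(h.predict(xs)):
            scores[i] += alpha * p
    return [1 if s >= 0 else -1 for s in scores]


# Toy data that no single stump can fit, but three boosted stumps can.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(Stump, xs, ys, rounds=3)
print(adaboost_predict(model, xs))
```

Nothing in `adaboost` knows about trees. Any object with the same two methods, for example a small neural network trained with a weighted loss, could be passed as `make_learner` in place of `Stump`.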
Tomas
There are two unmaintained libs that might provide you with a decent starting point:
I don’t have experience with either, so YMMV.
PS: I think your assertion that “NNs suck on tabular datasets” is not up to date. Packages such as AutoGluonTabular seem to suggest otherwise (though it blends NNs with other things).
Thanks for the links and for correcting my knowledge. I was hoping that someone would point me to the current state of the art.
So I read about AutoGluonTabular, and it is not a model based purely on neural networks: they use whatever model scikit-learn offers, and the ensembling strategy seems to be a very important part of the solution.
Thanks a lot @tlienart for pointing me in this direction (more pointers are welcome).
I have fixed GradientBoost so that the tests (almost) pass on 1.6. The only trouble is a clash of `fit!` and `predict`, and I do not know where they are defined.
The fixed library is here
https://github.com/pevnak/GradientBoost.jl
I will try to contact the owner.
I cloned your repo at d7fe4df to see if I could help with `fit` and `predict`, but when trying to run the tests, most do not pass (with errors like `Util`, `GBBaseLearner`, or `ML` not defined). Are you working on a separate branch?
fwiw I’m on 1.7 but I doubt that changes much here.
(GradientBoost) pkg> status
Project GradientBoost v0.1.0
Status `~/Desktop/tjd/GradientBoost.jl/Project.toml`
[864edb3b] DataStructures v0.18.12
[38e38edf] GLM v1.7.0
[7f8f8fb0] LearnBase v0.4.1
[30fc2ffe] LossFunctions v0.7.2
[9920b226] MLDataPattern v0.5.5
[872c559c] NNlib v0.8.5
[429524aa] Optim v1.7.0
[f2b01f46] Roots v2.0.1
[a759f4b9] TimerOutputs v0.5.19
[9a3f8284] Random
[10745b16] Statistics
Hi guys, @tlienart, @Tomas_Pevny.
How far did you get with resurrecting the GBDT?
DecisionTree.jl is great, but I’m very surprised not to find a general implementation of boosted trees in Julia.
I sort of hacked the original implementation for my purposes, but always got tired of it. To be honest, I gutted it, because I do not care about using GBDT; I wanted to boost neural networks.
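For reference, the learner-agnostic part of gradient boosting is a fairly small loop. The following is a minimal sketch under stated assumptions, not the code from GradientBoost.jl: it is written in Python for brevity, it assumes squared loss (so the negative gradient is simply the residual), and `RegressionStump`, `gradient_boost`, and `boosted_predict` are hypothetical names. The stump is only a stand-in for any regressor with `fit(xs, targets)` and `predict(xs)`, such as a small neural network.

```python
class RegressionStump:
    """Hypothetical stand-in base learner: a 1-D, one-split regressor.

    Needs at least two distinct x values.
    """

    def fit(self, xs, targets):
        best = None
        for t in sorted(set(xs))[1:]:   # skip the smallest x so the left side is non-empty
            left = [r for x, r in zip(xs, targets) if x < t]
            right = [r for x, r in zip(xs, targets) if x >= t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, t, lm, rm)
        _, self.t, self.left, self.right = best
        return self

    def predict(self, xs):
        return [self.left if x < self.t else self.right for x in xs]


def gradient_boost(make_learner, xs, ys, rounds=50, lr=0.5):
    """Gradient boosting for squared loss with an arbitrary base regressor."""
    f0 = sum(ys) / len(ys)              # initial constant prediction
    preds = [f0] * len(xs)
    learners = []
    for _ in range(rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = make_learner().fit(xs, residuals)
        learners.append(h)
        preds = [p + lr * d for p, d in zip(preds, h.predict(xs))]
    return f0, lr, learners


def boosted_predict(model, xs):
    f0, lr, learners = model
    out = [f0] * len(xs)
    for h in learners:
        out = [o + lr * d for o, d in zip(out, h.predict(xs))]
    return out


xs = [0, 1, 2, 3, 4, 5]
ys = [x * x for x in xs]
model = gradient_boost(RegressionStump, xs, ys)
print(boosted_predict(model, xs))
```

Swapping the loss only changes how `residuals` is computed; swapping the base learner only changes what is passed as `make_learner`.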