New machine learning package, Julia implementation of XGBoost

I am implementing a new one. I will push up the code once it’s more mature.

3 Likes

Have you thought about integrating it with DecisionTree.jl to bundle all tree-based ML algorithms?

3 Likes

+1 for integrating with DecisionTree.jl. For me decision trees are a pretty common tool, and it would be really nice to have one well-developed package for them (and related things) in Julia.

2 Likes

I am not sure if my implementation is compatible with the DecisionTree.jl package.

It probably isn’t, but there has been some discussion about revamping the internal structure anyway, so that’s probably fine as long as it has benefits and the current API doesn’t need to change.

1 Like

JuML is now updated to Julia v1.1.0.

I am now in Sydney and have some time to play with JuML on big data (e.g. Fannie Mae). Anyone interested in an informal hackathon? If you want to learn more about functional-style Julia with lazy sequences, folds, and applications to big-data out-of-core processing, then this is your opportunity :)
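To give a flavour of the lazy-sequence/fold style mentioned above: the idea is to stream a dataset in chunks and fold an accumulator over them, so the whole file never has to fit in memory. A minimal sketch in Python (my own illustration with hypothetical names, not JuML's actual Julia API):

```python
from functools import reduce
from itertools import islice

def chunks(path, size=100_000):
    # Lazily yield blocks of lines; only one block is ever
    # materialized at a time, which is the essence of
    # out-of-core processing.
    with open(path) as f:
        while True:
            block = list(islice(f, size))
            if not block:
                return
            yield block

# Fold over the lazy sequence, e.g. to count rows out of core:
# total = reduce(lambda acc, block: acc + len(block), chunks("fannie_mae.csv"), 0)
```

The same fold can compute sums, histograms, or gradient statistics without ever loading the full dataset.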

Adam

9 Likes

I hope JuML.jl picks up again. In the meantime, I got JLBoost.jl going, which is an XGBoost implementation in pure Julia that plays nicely with DataFrames.jl and CategoricalArrays.jl.

6 Likes

Probably worth mentioning EvoTrees.jl as well.

4 Likes

I was curious whether I could reimplement JuML in Python with Numba. It turns out it is easy, and the performance is comparable:

The code bases are practically equivalent, and both run ca. 3x faster than single-threaded C++ XGBoost.
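To illustrate the Numba approach (a hedged sketch of the kind of hot loop involved in histogram-based gradient boosting, not cortado's actual code), the per-bin gradient accumulation can be jitted like this:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fall back to plain Python if Numba is absent; the code is
    # identical either way, which is the point of the comparison.
    njit = lambda f: f

@njit
def bin_sums(binned, grad, nbins):
    # Accumulate gradient sums per feature bin -- the inner loop
    # of histogram-based split finding in gradient boosting.
    out = np.zeros(nbins)
    for i in range(binned.shape[0]):
        out[binned[i]] += grad[i]
    return out
```

With `@njit`, loops like this compile to machine code, so the Python and Julia versions of the algorithm can end up practically line-for-line equivalent.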

You can run the Python version in a Kaggle notebook:

Adam

4 Likes

NumPy is 50%+ C code, so I wouldn’t call it pure Python, but it’s pretty cool.

How does cortado compare to JuML?

I meant that the package source code is free from any C or C++ :slight_smile:

Both packages use the same data “slicing” technique, which minimizes memory usage, so in terms of memory allocations/deallocations etc. they are very similar.

The interesting part was trying to cherry-pick functions for Numba to swallow and seeing whether the performance matched Julia, which handles this kind of thing out of the box.

I observe very consistent run times in cortado. JuML takes a big hit at startup (ca. 16 sec on my laptop), and after that it is a bit inconsistent: sometimes faster, sometimes slower than cortado.

C++ XGBoost seems much slower, which shows that it is not simply a language-vs-language comparison.

1 Like