Comparison between Julia and Python Random Forest Regression

Hi,
I am comparing Julia and Python by porting a Python ML code to Julia and documenting the time each version takes for different sections. I am using the Random Forest algorithm; in Julia, I use the DecisionTree.jl package for this. Its documentation says that the Python-based ScikitLearn is used, and yet my results show that, for training the Random Forest model, Julia is 150% faster than Python.

Can someone explain why this is so, given that DecisionTree.jl also uses the Python-based ScikitLearn to train the model? Or is the package written completely in Julia, which would explain the faster training?

Any help would be appreciated.
Thanks

I suspect there’s a bit of confusion about the README saying that DecisionTree.jl supports scikit-learn’s API. That does not mean it calls scikit-learn; it means it can be called from ScikitLearn.jl, which is a wrapper for the Python library scikit-learn as well as for a number of other models. (Apologies if I misunderstood your question.)

And yes, DecisionTree.jl is written entirely in Julia. As for performance, can you clarify what you’re comparing with? Are you comparing with scikit-learn’s random forest? If so, then IIRC scikit-learn’s ensemble models are a bit slow, so seeing a 1.5x performance improvement would not be too surprising.
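
Note that "150% faster" and "1.5x" are not necessarily the same figure, depending on how the percentage was computed. A minimal sketch of the arithmetic, assuming "N% faster" means the speedup ratio minus one; the function names and the timings below are hypothetical and not taken from the original benchmark:

```python
def speedup(t_baseline: float, t_candidate: float) -> float:
    """Ratio of baseline time to candidate time (2.0 means twice as fast)."""
    return t_baseline / t_candidate


def percent_faster(t_baseline: float, t_candidate: float) -> float:
    """Speedup expressed as 'N% faster', i.e. (ratio - 1) * 100."""
    return (speedup(t_baseline, t_candidate) - 1.0) * 100.0


# Hypothetical timings in seconds, chosen only to illustrate the two readings.
print(speedup(6.0, 4.0), percent_faster(6.0, 4.0))    # a 1.5x speedup is 50% faster
print(speedup(10.0, 4.0), percent_faster(10.0, 4.0))  # a 2.5x speedup is 150% faster
```

Reporting the raw times for both languages alongside the ratio avoids the ambiguity.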

Is there an easier way to know which of the models in the scikit-learn framework are implemented completely in Julia and which are taken from Python’s scikit-learn library?

Thanks.

I think the easiest way is just to look at the code in the package?

If you look at the ScikitLearn.jl docs, they list reasonably clearly which models come from where. At the time of writing, GaussianMixtures, GaussianProcesses, DecisionTree and LowRankModels are Julia libraries and, unless I’m mistaken, all are written fully in Julia. (GaussianProcesses may look different on GitHub because it has a ton of notebooks, but the code is all in Julia.)

Everything else is ported from the original scikit-learn and called via PyCall.

Thanks a lot. That was really helpful.

Hello, I found this post interesting.

I am comparing build_forest from DecisionTree.jl (Julia) with RandomForestClassifier from sklearn.ensemble (Python), and I am getting faster classification results with Python.

I am using @belapsed from BenchmarkTools, similar hyperparameters so that the models are comparable between the two languages, and a dataset with 19 features and up to 353000 observations.
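
For the Python side of such a comparison, it helps to time the call in a way that mirrors @belapsed, which reports the minimum time over its samples. Below is a minimal sketch using only the standard library; `min_elapsed` and `fit_stand_in` are hypothetical names, and the workload is a placeholder rather than an actual random forest fit:

```python
import timeit


def min_elapsed(fn, repeats: int = 5) -> float:
    """Return the minimum wall-clock time in seconds over `repeats` single calls of fn."""
    # timeit.repeat runs fn `number` times per sample and returns one total per sample.
    times = timeit.repeat(fn, repeat=repeats, number=1)
    return min(times)


def fit_stand_in() -> int:
    # Placeholder workload (an assumption for illustration only); in a real
    # comparison this would be the model-fitting call being benchmarked.
    return sum(i * i for i in range(100_000))


best = min_elapsed(fit_stand_in)
print(f"best of 5 runs: {best:.4f} s")
```

In an actual benchmark, the callable passed to `min_elapsed` might be something like `lambda: RandomForestClassifier(n_estimators=100).fit(X, y)`, with `X` and `y` loaded beforehand so that data loading is excluded from the timing on both sides. The number of parallel workers (for example scikit-learn's `n_jobs` parameter) should also be matched between the two languages before drawing conclusions.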

I know this post is 3 years old, so maybe a lot has changed, or maybe I am doing something wrong.