Parallel Random Forest



I am using the Random Forest algorithm for classification, with build_forest() followed by apply_forest().
Since these operations run on only one process, how can I parallelize them?
And how can I generate a graph for the same?


I’m not entirely sure what you want to do, but I guess you want to parallelize the training and prediction of a random forest. The easiest way, as far as I know, is Threads.@threads, which runs the iterations of a loop in parallel on multiple threads (Julia has to be started with more than one thread, e.g. by setting the JULIA_NUM_THREADS environment variable). Example code for training might look like this:

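function train_forest(X, Y, n_trees)
    trees = make_trees(n_trees)            # hypothetical helper: allocate untrained trees
    Threads.@threads for i in 1:n_trees
        train_tree!(trees[i], X, Y)        # hypothetical: train tree i in place
    end
    return RandomForest(trees)             # hypothetical forest wrapper type
end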
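To make the idea above concrete in a form that actually runs, here is a sketch with a toy bagged ensemble of decision stumps standing in for real trees. It is not how DecisionTree.jl works internally; `Stump`, `fit_stump`, `train_stump_forest` and `predict_forest` are hypothetical names. The points worth copying are that each loop iteration writes only its own slot of a preallocated vector and uses its own seeded RNG, so the threaded loop has no data races and gives the same forest for any number of threads.

```julia
using Random, Statistics

# Toy "tree": predicts `hi` when x[feature] > threshold, otherwise `lo` (labels are 0/1).
struct Stump
    feature::Int
    threshold::Float64
    lo::Int
    hi::Int
end

predict_stump(s::Stump, x::AbstractVector) = x[s.feature] > s.threshold ? s.hi : s.lo

# Fit one stump on a randomly chosen feature, trying every observed value as threshold.
function fit_stump(X::Matrix{Float64}, y::Vector{Int}, rng::AbstractRNG)
    f = rand(rng, 1:size(X, 2))
    best, best_acc = Stump(f, -Inf, 0, 1), -1.0
    for t in unique(X[:, f])
        for (lo, hi) in ((0, 1), (1, 0))
            s = Stump(f, t, lo, hi)
            acc = mean(predict_stump(s, X[i, :]) == y[i] for i in eachindex(y))
            if acc > best_acc
                best, best_acc = s, acc
            end
        end
    end
    return best
end

# Train the trees in parallel: each iteration draws its own bootstrap sample with
# its own seeded RNG and writes only trees[i], so there is no shared mutable state.
function train_stump_forest(X::Matrix{Float64}, y::Vector{Int}, n_trees::Int; seed::Int = 1)
    trees = Vector{Stump}(undef, n_trees)
    Threads.@threads for i in 1:n_trees
        rng = MersenneTwister(seed + i)
        idx = rand(rng, 1:length(y), length(y))   # bootstrap sample, with replacement
        trees[i] = fit_stump(X[idx, :], y[idx], rng)
    end
    return trees
end

# Predict in parallel over rows by majority vote (ties go to class 1).
function predict_forest(trees::Vector{Stump}, X::Matrix{Float64})
    out = Vector{Int}(undef, size(X, 1))
    Threads.@threads for r in 1:size(X, 1)
        votes = sum(predict_stump(t, X[r, :]) for t in trees)
        out[r] = 2 * votes >= length(trees) ? 1 : 0
    end
    return out
end
```

With a single thread both loops simply run serially, so the same code can be used to compare timings.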


Yes, what I am doing is building a forest model like this:

model = build_forest(yTrain, xTrain, 20, 50, 1.0)

where yTrain holds the labels and xTrain the features, and then applying the model:

predTest = apply_forest(model, xTest)

where xTest is the test feature matrix.

As all these operations run on a single process, what I want is to parallelize this task.
How can I do this?


I guess you are using the DecisionTree.jl package?

If so, it looks like the forest training is already parallelized through the @parallel macro, so you would only have to run addprocs() before training your model, and build_forest should then use multiple workers (see the package source code).


Please mention that this question is cross-listed.


Yes, I am using the DecisionTree package. I tried addprocs(4) in my code, but after reading the test data set I got an error like this:

It also reports an error at the build_forest() function call, with messages like:

ERROR (unhandled task failure): On worker 4:

and similarly for worker 3.


After addprocs you need to load your packages on each worker, like this:

import DecisionTree               # load the package on the master process first
@everywhere using DecisionTree    # then load it on every worker process
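In Julia 0.7/1.0 the @parallel macro mentioned above was replaced by @distributed, which lives in the Distributed standard library together with addprocs and @everywhere. The sketch below shows the pattern described in these answers (start workers, define or load everything on every process, then reduce over a loop). The function `bootstrap_mean` is a hypothetical stand-in for building one tree; this is not DecisionTree.jl's actual code.

```julia
using Distributed
addprocs(2)                       # start two local worker processes

# Anything the workers call must exist on every process: @everywhere
# evaluates the expression on the master and on all workers.
@everywhere using Random
@everywhere function bootstrap_mean(y::Vector{Float64}, seed::Int)
    rng = MersenneTwister(seed)
    idx = rand(rng, 1:length(y), length(y))   # resample with replacement
    return sum(y[idx]) / length(idx)
end

# Each iteration builds one independent "model"; the per-worker results
# are concatenated with vcat, in loop order.
function bagged_means(y::Vector{Float64}, n::Int)
    return @distributed (vcat) for i in 1:n
        bootstrap_mean(y, i)
    end
end
```

If `@everywhere` is dropped from the definition, the workers fail with an `UndefVarError`, which is the same kind of error reported in this thread.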


Thank you @bjarthur, it works!

But now I am getting lower accuracy than I previously got on one process. Is there any way to improve the accuracy and efficiency of the algorithm?

Also, is there any way to store the trained model so that I can load it and use it directly on the test data set? Right now the model is retrained every time.


That’s probably just a random fluctuation. You can improve accuracy by tweaking the hyperparameters (depth, number of trees, pruning threshold), but you have to be careful about overfitting. You can either set up cross-validation yourself and loop over different combinations of hyperparameter values, or use the ScikitLearn.jl interface along with GridSearchCV to do model selection.
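A minimal sketch of the do-it-yourself route: k-fold cross-validation plus a loop over a grid of hyperparameter values. The helper names (`kfold_indices`, `cv_accuracy`, `grid_search`) are hypothetical, and the model is passed in as two functions so the code does not depend on any particular package API.

```julia
using Random, Statistics

# Split the indices 1:n into k shuffled folds of (almost) equal size.
function kfold_indices(n::Int, k::Int; rng::AbstractRNG = MersenneTwister(123))
    perm = randperm(rng, n)
    return [perm[i:k:n] for i in 1:k]
end

# Mean held-out accuracy of one hyperparameter setting `params`.
# `fitfn(X, y, params)` returns a trained model; `predfn(model, X)` returns labels.
function cv_accuracy(fitfn, predfn, X, y, params; k::Int = 3)
    accs = Float64[]
    for test_idx in kfold_indices(length(y), k)   # same folds for every setting
        train_idx = setdiff(1:length(y), test_idx)
        model = fitfn(X[train_idx, :], y[train_idx], params)
        yhat = predfn(model, X[test_idx, :])
        push!(accs, mean(yhat .== y[test_idx]))
    end
    return mean(accs)
end

# Try every setting in `grid`; return (best_params, best_cv_accuracy).
function grid_search(fitfn, predfn, X, y, grid; k::Int = 3)
    scores = [(p, cv_accuracy(fitfn, predfn, X, y, p; k = k)) for p in grid]
    return scores[argmax(last.(scores))]
end
```

With DecisionTree.jl, assuming the third to fifth arguments of build_forest are the number of sub-features, the number of trees and the sampling fraction (as in the call earlier in this thread), the two functions could be `fitfn = (X, y, p) -> build_forest(y, X, p[1], p[2], 1.0)` and `predfn = apply_forest`, with a grid such as `[(s, t) for s in (10, 20) for t in (50, 100)]`. The best CV score is an optimistic estimate, so keep a separate test set for the final evaluation.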

JLD.jl should work for saving pure-Julia structures to disk.
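JLD.jl is an external package; as an alternative, here is a sketch using Julia's built-in Serialization standard library, which can store any Julia value, including a trained forest. `save_model` and `load_model` are hypothetical helper names. One caveat: the serialized format is tied to the Julia and package versions, so a file written this way may not load in a different setup; for longer-term storage a dedicated format such as JLD.jl is the safer choice.

```julia
using Serialization

# Write any Julia value (for example a trained forest) to `path`.
save_model(path::AbstractString, model) = open(io -> serialize(io, model), path, "w")

# Read it back later. The packages that define the model's types
# (e.g. DecisionTree) must already be loaded in that session.
load_model(path::AbstractString) = open(deserialize, path)
```

Typical use: `save_model("forest.jls", model)` once after training, then in later sessions `using DecisionTree`, `model = load_model("forest.jls")` and `apply_forest(model, xTest)`.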


How can I do parallel computing with modules imported using PyCall and @pyimport? I am trying to do something like the code below, but it does not work. If I set n_jobs > 1 and remove the third line (@everywhere (@pyimport lightgbm as lgb)), it still uses a single processor. I am using Julia 1.0 on Windows 10 64-bit.

using PyCall : @pyimport
@pyimport lightgbm as lgb
@everywhere (@pyimport lightgbm as lgb) 
model = lgb.LGBMClassifier(colsample_bytree=1.0,
            learning_rate=0.1, max_depth=-1, min_child_samples=20,
            min_child_weight=0.001, min_split_gain=0.0, n_estimators=250,
            n_jobs=1, num_leaves=31, objective="binary", random_state=123,
            reg_alpha=0.0, reg_lambda=0.0, subsample=1.0)
fit!(model, X, y)

The error displayed is:

On worker 6:
LoadError: UndefVarError: @pyimport not defined
top-level scope
eval at .\boot.jl:319
#116 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\process_messages.jl:276
run_work_thunk at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\process_messages.jl:56
run_work_thunk at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\process_messages.jl:65
#102 at .\task.jl:259
in expression starting at In[37]:13
#remotecall_wait#154(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Distributed.Worker, ::Module, ::Vararg{Any,N} where N) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\remotecall.jl:407
remotecall_wait(::Function, ::Distributed.Worker, ::Module, ::Vararg{Any,N} where N) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\remotecall.jl:398
#remotecall_wait#157(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Int64, ::Module, ::Vararg{Any,N} where N) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\remotecall.jl:419
remotecall_wait(::Function, ::Int64, ::Module, ::Vararg{Any,N} where N) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\remotecall.jl:419
(::getfield(Distributed, Symbol("##163#165")){Module,Expr})() at .\task.jl:259

...and 3 more exception(s).

 [1] sync_end(::Array{Any,1}) at .\task.jl:226
 [2] remotecall_eval(::Module, ::Array{Int64,1}, ::Expr) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\macros.jl:207
 [3] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\macros.jl:190
 [4] top-level scope at In[37]:5


Maybe it helps if you run
@everywhere using PyCall
so that the package is loaded on all processes.


Thanks, it does start parallel processing. However, I am not gaining any speed-up from it. My data is about 220K × 28. Without parallel processing a single run takes about 10.6 s; with parallel processing on, it takes about 16 s.

How can I get a speed improvement?


Well, parallelization is not always trivial. Not all problems benefit from it (it depends on cache, data size, …)
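A back-of-the-envelope model shows how a parallel run can end up slower than the serial one. This is Amdahl's law with an added overhead term, used here as an illustration rather than a measurement of the code in this thread: with serial time $T_1$, parallelizable fraction $f$, $p$ workers and a fixed overhead $T_o$ (process start-up, moving data to the workers),

```latex
T_p = T_1\left[(1 - f) + \frac{f}{p}\right] + T_o,
\qquad
S = \frac{T_1}{T_p} < 1 \iff T_o > f\,T_1\left(1 - \frac{1}{p}\right).
```

With the timings reported above, $S = 10.6 / 16 \approx 0.66$, so the overhead outweighs whatever work was actually spread across the processes.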

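However, I am not quite sure what exactly your code runs, because you are using PyCall.
Effectively the model is fitted in Python, right?
If so, I am not sure that any distributed Julia code (or additional Julia processes) will change anything at all, because Python is doing the work here. Also, in the snippet you posted, @everywhere only runs the import on each worker; fit! is still called once, on the master process, so no fitting work is actually distributed.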


I have recently moved to Julia and am trying to port my Python models to Julia. Since there is no LightGBM or XGBoost model in Julia, I am trying to use them through PyCall.

I agree that calling Python models from Julia may not be as efficient as using native Julia models, but I have no choice, hence my current exploration. In Python, with the help of Cython, I was able to run 10 iterations of 3-fold CV in approximately 95 seconds, and that on a single core.

One more question: if I were to wrap the whole model fit in a function, what changes would I need to make to the code in my first post?


Hm, I do not know whether there is a native XGBoost or GBM implementation in Julia; it does not seem to be the case. Maybe you can find something on
But I guess you already searched and opted for PyCall (an alternative would of course be RCall).

Have you tried this: ?
If your problem is binary classification, that might work, right?

I have an experimental Julia package that includes a regression boosting approach, but it is not well documented and probably won't fit your purpose ( )


Not sure what you mean by that
How about this:

function myfit(X, y)
    model = lgb.LGBMClassifier(colsample_bytree=1.0,
                learning_rate=0.1, max_depth=-1, min_child_samples=20,
                min_child_weight=0.001, min_split_gain=0.0, n_estimators=250,
                n_jobs=1, num_leaves=31, objective="binary", random_state=123,
                reg_alpha=0.0, reg_lambda=0.0, subsample=1.0)
    fit!(model, X, y)   # fit! as in the original post
    return model
end


Thanks for your quick reply.

Not sure what you mean by that

What I meant was: what changes would be required to use parallel computing inside the function that fits the model? You have already shared code for wrapping the model fit in a function. Where, and what code, do I need to put inside or outside that function to use parallel computing in this case?
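One way to approach this, as a sketch under assumptions: the fit function itself stays serial, and the parallelism goes outside it, by running many independent fits (for example the 10 repetitions of 3-fold CV mentioned above, i.e. 30 fits) on different workers with `pmap`. Everything the workers call has to be defined on every process with `@everywhere`; in the real code that is where `@everywhere using PyCall`, `@everywhere @pyimport lightgbm as lgb` and the `myfit` definition would go. To keep the example runnable, `myfit`, `mypredict`, `fold_accuracy` and `parallel_cv` below are hypothetical stand-ins built on a toy threshold model instead of LightGBM.

```julia
using Distributed
addprocs(2)   # e.g. addprocs() for one worker per CPU core

# Everything the workers call must be defined on every process.
@everywhere using Random, Statistics
@everywhere begin
    # Hypothetical stand-ins for the real `myfit` and LightGBM prediction:
    # the toy "model" is just a threshold at the mean of feature 1.
    myfit(X, y) = mean(X[:, 1])
    mypredict(model, X) = Int.(X[:, 1] .> model)

    # Train on all rows except `test_idx`, return the accuracy on `test_idx`.
    function fold_accuracy(X, y, test_idx)
        train_idx = setdiff(1:length(y), test_idx)
        model = myfit(X[train_idx, :], y[train_idx])
        return mean(mypredict(model, X[test_idx, :]) .== y[test_idx])
    end
end

# Repeated k-fold CV where every (repeat, fold) fit is an independent pmap job.
# CachingPool sends the closure (and the X, y it captures) to each worker once.
function parallel_cv(X, y; k::Int = 3, repeats::Int = 10)
    n = length(y)
    splits = [randperm(MersenneTwister(r), n)[j:k:n] for r in 1:repeats for j in 1:k]
    accs = pmap(test_idx -> fold_accuracy(X, y, test_idx), CachingPool(workers()), splits)
    return mean(accs)
end
```

Whether this beats a single process depends on how long each fit takes compared with the cost of shipping data to the workers; with very short fits the overhead can dominate.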



I have looked at it but was not able to compile the model.


Thanks, I will look into the models you suggested.