That’s interesting. Using a Python library from Julia should never be much slower, so I wonder if you did something wrong, and how you profiled it.
Note that what I wrote assumed using scikit-learn directly, e.g. with PyCall.jl or PythonCall.jl, which is an option for you. If you use the ScikitLearn.jl wrapper (or any thin wrapper), it shouldn’t add overhead.
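For reference, this is the kind of direct use I mean (a minimal sketch via PyCall.jl; the model and data here are just placeholders, and PythonCall.jl would look very similar):

```julia
# Minimal sketch, assuming PyCall.jl and scikit-learn are installed;
# PyCall converts the Julia arrays to NumPy arrays when calling into Python.
using PyCall

lm = pyimport("sklearn.linear_model")

X = rand(100, 3)          # placeholder data
y = rand(100)

model = lm.LinearRegression()
model.fit(X, y)
ŷ = model.predict(X)
```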
I also had simple/single-threaded in mind. I don’t know if this changes things:
I looked at all the code files and noticed: ScikitLearn.jl/grid_search.jl at e70bf7208306110d91f1cfe183cb27ccf88e9215 · cstjean/ScikitLearn.jl · GitHub
Is it something as simple as running:
julia --procs auto
It’s quite slow to start that way (at least in Julia 1.8-rc1, with my 16 cores), but after startup it could give up to a 16x speedup (for me), if actually exploited. Maybe you’re measuring that fixed startup overhead (which Python doesn’t have?). It seems way too excessive (11 sec for me, rather than the usual 0.2 sec startup), and should (and I believe could) be fixed in some Julia version.
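If you do start it that way, a quick sanity check that the worker processes are actually there is something like:

```julia
# After `julia --procs auto`: how many processes/workers did we actually get?
using Distributed
println("nprocs = ", nprocs(), ", nworkers = ", nworkers())
```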
I’m not sure Distributed is actually exploited in the package (i.e. shouldn’t I then also see e.g. @everywhere there?). Was that the plan, with the wrapper left incomplete?
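For comparison, exploiting Distributed would typically look something like this hypothetical sketch (not taken from ScikitLearn.jl; the function and grid are made up):

```julia
# Hypothetical sketch of exploiting Distributed, not code from the package:
using Distributed
addprocs(4)                                     # or start with `julia --procs auto`

@everywhere score(params) = sum(abs2, params)   # define the work function on all workers

param_grid = [rand(3) for _ in 1:100]
results = pmap(score, param_grid)               # evaluate the grid in parallel across workers
```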
What’s your timing, both with Julia and with pure Python code? Can you monitor and see if Python spawns many processes (and Julia does not)?
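On the Julia side, one way to see which OS processes it actually spawned (to compare against what your process monitor shows for Python) is something like:

```julia
# List the OS process IDs of the Julia master and any workers it started:
using Distributed
for p in procs()
    println("Julia proc ", p, " -> OS pid ", remotecall_fetch(getpid, p))
end
```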