That’s interesting. Using a Python library from Julia should never be much slower than using it from Python, so I wonder whether something went wrong, and how best to profile it.
Note, what I wrote assumed using it directly, e.g. with PyCall.jl or PythonCall.jl, which is an option for you. If you use the ScikitLearn.jl wrapper (or any thin wrapper), it shouldn’t add overhead.
I also had simple, single-threaded use in mind. I don’t know if this changes things:
I looked at all the code files and noticed: ScikitLearn.jl/grid_search.jl at e70bf7208306110d91f1cfe183cb27ccf88e9215 · cstjean/ScikitLearn.jl · GitHub
Is it something as simple as running Julia like this:
julia --procs auto
It’s quite slow to start that way (at least in Julia 1.8-rc1, with my 16 cores), but after startup it could give up to a 16x speedup (for me), if the workers are actually exploited. Maybe you’re measuring the fixed startup overhead (which Python doesn’t have?). That overhead seems far too high (11 sec for me, rather than the usual 0.2 sec startup), and it should (and I believe could) be fixed in some Julia version.
I’m not sure Distributed is exploited in the package (i.e. should I also see e.g. @everywhere there?). Was that the plan, and is the wrapper incomplete?
What’s your timing, both with Julia and with pure Python code? Can you monitor and see if Python spawns many processes (and Julia does not)?
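For the pure Python side, here is a minimal timing sketch using only the standard library. The names `measure` and `busy_work` are hypothetical, and `busy_work` is just a stand-in that you would replace with your actual fit call. It records wall-clock time and the CPU time of the current process for each run, so you can see whether the first run is an outlier and get a rough hint about where the work happens.

```python
import time


def measure(fn, repeats=3):
    """Run fn() `repeats` times; return a list of (wall_seconds, cpu_seconds)."""
    results = []
    for _ in range(repeats):
        wall_start = time.perf_counter()   # wall-clock time
        cpu_start = time.process_time()    # CPU time of this process only
        fn()
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        results.append((wall, cpu))
    return results


def busy_work(n=200_000):
    # Hypothetical stand-in for the real workload (e.g. a grid search fit).
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for i, (wall, cpu) in enumerate(measure(busy_work), start=1):
        print(f"run {i}: wall={wall:.4f}s  cpu(this process)={cpu:.4f}s")
```

As a rough heuristic only: if the wall time is much larger than this process’s CPU time, the work is probably being done in other processes (or the process is waiting on something); if the CPU time clearly exceeds the wall time, several threads in this process are busy. Checking a process monitor while it runs is still the more direct way to confirm whether workers were spawned.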