Optimize performance comparison - Optim.jl vs Scipy


I am just starting to learn about optimization. I picked up Optim.jl (great documentation, btw) and tried to do the same thing in Python.

It seems that Rosenbrock function is what everyone uses as an example. Perhaps not too surprisingly, Julia is a lot faster than Python (appox. 60x) but then I am curious where the performance difference come from. Has anyone done similar exercise before? What do you think?

I’ve uploaded my notebooks for reference.

  1. Julia (63 μs) http://nbviewer.jupyter.org/github/tk3369/julia-notebooks/blob/master/Rosenbrock_Julia.ipynb

  2. Python (3.85 ms, with numba jit 3.55 ms) http://nbviewer.jupyter.org/github/tk3369/julia-notebooks/blob/master/Rosenbrock_Python.ipynb

Performance of Optim.jl vs. Matlab's fminunc

The runtime of rosenbrock is totally negligble in Julia:

julia> @btime rosenbrock($x2) # result of running optimization
  1.354 ns (0 allocations: 0 bytes)

but it may take close to a microseond or per call in Python?

Also, FWIW, I once tried to lean up Optim for use on cheap functions in simulations. Things aren’t in a pretty state at the moment, but, setup (note this “once differentiable” object uses ForwardDiff):

using BenchmarkTools, DifferentiableObjects # https://github.com/chriselrod/DifferentiableObjects.jl
using TriangularMatrices 
# This should really be StaticArrays, but in a rush job I didn't make things as generic as I should have.
# I'll get around to making it appropriately generic within a couple months
# https://github.com/chriselrod/TriangularMatrices.jl
rosenbrock(x) =  (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x = RecursiveVector{Float64,2}() .= 0; # basically, an MVector
od = OnceDifferentiable(rosenbrock, deepcopy(x));


julia> optimize_light(od, x, BFGS())
([1.0, 1.0], 7.645688055381766e-21)

julia> x
2-element RecursiveVector{Float64,2}:

julia> @benchmark optimize_light($od, $x, BFGS())
  memory estimate:  14.08 KiB
  allocs estimate:  313
  minimum time:     10.747 μs (0.00% GC)
  median time:      11.165 μs (0.00% GC)
  mean time:        18.872 μs (37.76% GC)
  maximum time:     51.231 ms (99.95% GC)
  samples:          10000
  evals/sample:     1

Also, you should definitely give BackTracking a try:

julia> using LineSearches

julia> @benchmark optimize_light($od, $x, BFGS(;linesearch = BackTracking()))
  memory estimate:  3.02 KiB
  allocs estimate:  83
  minimum time:     5.105 μs (0.00% GC)
  median time:      5.244 μs (0.00% GC)
  mean time:        6.884 μs (20.84% GC)
  maximum time:     9.548 ms (99.93% GC)
  samples:          10000
  evals/sample:     6

I’ve found it to be much faster than the default line search HagerZhang in my problems (but slightly less reliable when the parameter space is poorly behaved).

Anyway, for most problems, the actual function calls dominate runtime. This I think is overall a major advantage of Julia – you’re optimizing Julia functions, not Python functions.
And it also means the performance benefits of leaning out Optim, like you see above, are negligible for most problems.

Final aside: would be nice to take advantage of https://github.com/yuyichao/FunctionWrappers.jl
to get type stability, without having to recompile for different objectives.


This is similar to the ODE example. DifferentialEquations.jl (called from Python) beats Numba-compiled SciPy ODE calls by about 10x even though it’s just using Fortran+Numba.

The reason is because function calls in Python are expensive. This is true whether or not a function call is JIT-compiled since you still need a Python interpreter in the middle. This dynamicness in the middle also inhibits all sorts of optimizations. So Numba is really just a trick for small scripts and microbenchmarks but it’s not something that will work well for packages. Its worse case scenario are higher order functions (functions which take in functions), and these show up all of the time in differential equation solvers, optimization problems, etc. (anything that handles user-defined nonlinearity).

I’ll be putting a blog post out which goes into this in some detail.


Do you know why this is so much faster? I just tried it on an "Optim vs. fminunc" horse race that I am working on, and it sped things up tremendously.


Follow up question, I’ve also made some benchmark here of Optim vs. scipy, and python algorithms are much slower. The issue is that I’m calling python from Julia and that my functions are defined in Julia, so presumably python has to callback into Julia. Is there any overhead involved in that or python is just that slow.