Optim.jl vs scipy.optimize once again

pkofod · May 26, 2021, 10:23am

It’s long been on my list of “problem with simple bounds” methods to implement, so I will take you up on this comment and ask for a review soon NLSolvers.jl already has something from that family (DF-SANE) of algorithms depending on the Grippo-Lampariello-Lucidi-style line search and spectral gradient “updates” so it’ll be a very welcome addition.

rht · May 26, 2021, 12:18pm

NLopt is certainly an option. I will try this and report. My option was very limited when I started with the Python implementation of my model. But today I discovered that NLopt also has a Python API.

I just upgraded my Julia from 1.5.4 to 1.6.1, after https://github.com/NixOS/nixpkgs/pull/123188 was merged. To my surprise, I observed a ~2x speedup in general:

Vanilla LBFGS using finite difference: 6.744 s → 3.188 s
Single function evaluation: 27.658 μs → 13.088 μs
LBFGS using finite difference, with same params as scipy.optimize except that it uses HagerZhang line search: 700.513 ms → 367.378 ms
LBFGS using autodiff = :forward, with same params as scipy.optimize except that it uses HagerZhang line search: 485.330 ms → 309.725 ms
@PharmCat’s Newton method with sigmoid: 144.142 ms (on 1.6.1)
SPGBox: 868.441 μs (on 1.6.1)
scipy.optimize using PyCall: 14.093 ms → 10.520 ms (on 1.6.1)

As you noticed, I also benchmarked SPGBox (thank you @lmiq for building this package), and it only did 4 function evaluations, and reached out the desired result. The code is at https://github.com/rht/climate_stress_test_benchmark/blob/main/pharmcat_v2_g.jl.

 SPGBOX RESULT: 

 Convergence achieved. 

 Final objective function value = -6.263329329419884
 Best solution found = [ 0.0, 1.0, 1.0, ..., 1.0]
 Projected gradient norm = 0.0

 Number of iterations = 3
 Number of function evaluations = 4

I need to understand why SPGBox is effective and efficient, but I think I need to generate more test cases to check that SPGBox is robust across parameter variations.

I also want to understand why 1.6.1 is 2x faster than 1.5.4. I thought the main improvement focus is on the precompilation time?

Additionally, I also made a C++ version, and had observed that its single function evaluation (75 μs) is slower than the Julia version (13.088 μs). I’m not sure why. The C++ version can be found at https://github.com/rht/climate_stress_test_benchmark/blob/main/climate_stress_test_simplified.cpp.

lmiq · May 26, 2021, 12:24pm

I would not expect any miraculous behaviour there. Except that I can guarantee that calls to spgbox are short in memory resources and can be completely allocation free if you preallocate the auxiliary vectors. Otherwise the method is good, but there is no fundamental reason for it be that better than all other alternatives.

@rht and others: since SPGBox raised some interest, I updated it now to follow the conventions concerning the solver call (functions come first) and upper and lower bounds are set by lower and upper arrays.

This will be in some minutes released as version 0.2.0 of the package, which will be a breaking change in the API. Thus, the tests must be updated. The example I posted above is already updated to follow the new interface.

pkofod · May 26, 2021, 1:57pm

I think this is indeed because you seem to have many variables at the boundary (let me actually ask, are they ever interior?).

rht · May 26, 2021, 2:04pm

For the simplified model I posted on my GitHub repo, the values are most of the time either 0 or 1, but this is a 1-firm model. When the number of firms are 3 or higher, the transition from 0 and 1 becomes more gradual. Additionally, in a more intricate version, it is not always the case that the variable starts from 0 and ends with 1. We intend to do at least 50 firms, but can’t, as it took us 7 hours (parallelized on a 64 cores AWS instance) just to run 4 firms with a DeltaT of 30 years, using Python’s scipy.optimize.

lmiq · May 26, 2021, 3:03pm

One of the spg original developers just told me: “You could have told him that in principle the behavior of SPG and L-BFGS-B, should be very similar, something we have demonstrated in this paper. Of course for particular problems that has to be tested.”

Just reinforcing that we should not expect such a huge difference between the methods, in general.

PharmCat · May 27, 2021, 9:57pm

I think the efficacy of the optimization algorithm of function when its arguments effect is like binary predictor or smooth can be very different. As you can see when you use a sigmoid link - it takes less iteration for line search (maybe why the function is more predictable in this case), so I want to say that maybe it will have more sense to make a comparison on data when behavior is near robust condition.

rht · June 9, 2021, 12:24pm

I will prepare a version of the model that doesn’t have binary optimum condition soon.

Update: I have migrated my model’s entire code to Julia using PyCall for scipy.optimize(method='L-BFGS-B'). Though I encountered problem in using Threads.@threads because Julia calling Python is not thread-safe.

I have also discovered GitHub - Gnimuc/LBFGSB.jl: Julia wrapper for L-BFGS-B Nonlinear Optimization Code, a Julia wrapper around the LBFGSB Fortran code. So finally we can do apple-to-apple comparisons with scipy.optimize and SPGBox. In my new benchmark, I have also provided a grad function that is a finite difference provided by Optim.jl, i.e.

promote_objtype(x, f) = Optim.OnceDifferentiable(f, x, real(zero(eltype(x))))
d = promote_objtype(AA.xs0, fn)
function fd_g!(z, x)
    z .= Optim.gradient!(d, x)
end

EDIT: this result is FLAWED. SPGBox mutated x0 into its optimized version, and so the subsequent @btime of various methods finished prematurely because x0 is already optimized. The updated result is at Optim.jl vs scipy.optimize once again - #52 by rht.
New result summary:

CG with Fminbox is the slowest.
SPGBox and LBFGSB.jl are almost the same, for both ForwardDiff and finite difference gradients. This is consistent with the paper linked in Optim.jl vs scipy.optimize once again - #46 by lmiq
Finite difference is faster than ForwardDiff – how??
SPGBox and LBFGSB.jl time with finite difference are almost the same as a single function evaluation – how?? I have made sure to run them in isolation, and they still return the intended result.
scipy.optimize(method='L-BFGS-B') (in Julia using PyCall) is 122x slower than LBFGSB.jl – I think I need a separate benchmark using a Rosenbrock function just to be sure.

Log (note: the -6.2633… is just the minimum, just to show that they are consistent):
Benchmarking single function execution
13.615 μs (1 allocation: 160 bytes)
Benchmarking PharmCat v2 (CG) using grad specified by ForwardDiff
407.140 ms (18888 allocations: 3.41 MiB)
-6.263329327214707
Benchmarking SPGBox using grad specified by ForwardDiff
880.418 μs (44 allocations: 9.30 KiB)
-6.263329329419884
Benchmarking PharmCat v2 (CG) using finite difference grad
398.354 ms (34072 allocations: 4.52 MiB)
-6.263329318909087
Benchmarking SPGBox using finite difference grad
15.633 μs (14 allocations: 2.22 KiB)
-6.263329329419884
Benchmarking LBFGSB.jl using grad specified by ForwardDiff
884.683 μs (35 allocations: 7.80 KiB)
-6.263329329419884
Benchmarking LBFGSB.jl using finite difference grad
18.015 μs (5 allocations: 784 bytes)
-6.263329329419884
Benchmarking scipy.optimize(method=‘L-BFGS-B’) which uses finite difference grad
2.203 ms (1757 allocations: 84.33 KiB)
-6.263329329419884

Code to reproduce: climate_stress_test_benchmark/pharmcat_v2_g.jl at 0c4b266b111218f826bd75fdd1938ab13eeb9806 · rht/climate_stress_test_benchmark · GitHub.

rht · June 9, 2021, 12:38pm

I forgot to mention that I have made sure that the parameters I specified for LBFGSB.jl match the default parameters used in scipy.optimize(method='L-BFGS-B').

lmiq · June 9, 2021, 12:40pm

Be careful in these benchmarks in not providing the solution as the initial point. At least in the case of SPGBox, I know that it mutates the initial point given to the solution. Thus, to benchmark it properly, you need to reset it, with something like:

x0 = something
@btime spgbox!($fn, $g!, $lower, $upper, x) setup=(x=copy(x0)) evals=1

edit: now (v0.3.1) there is also a non-mutating version of the solver, which can be run with:

@btime spgbox($fn, $g!, $lower, $upper, $x0)

(without the ! - but for that it (deep)copies x on input, thus for a non-allocating alternative use the first one with auxiliary array preallocation.

rht · June 9, 2021, 12:47pm

This is the benchmark result for Rosenbrock function, based on the README.md in GitHub - Gnimuc/LBFGSB.jl: Julia wrapper for L-BFGS-B Nonlinear Optimization Code.

I have made sure to run LBFGSB.jl first, and to ensure that x0 is a const.
The result is that LBFGSB.jl is 67x faster.

Benchmarking single function execution
  34.225 ns (0 allocations: 0 bytes)
Benchmarking LBFGSB.jl
  444.786 μs (133 allocations: 8.28 KiB)
6.689900791974536e-10
Benchmarking scipy.optimize(method='L-BFGS-B')
  29.816 ms (30116 allocations: 1.33 MiB)
6.677651771226814e-10

Code to reproduce: climate_stress_test_benchmark/rosenbrock.jl at 6920133a8f387bc3378c43b6f7d880f28552435a · rht/climate_stress_test_benchmark · GitHub.

rht · June 9, 2021, 1:04pm

Indeed. SPGBox mutated the AA.xs0! I have checked that LBFGSB.jl doesn’t mutate the x0.

ForwardDiff and finite difference have almost the same time, consistent across optimization methods
SPGBox is 40% faster than LBFGSB.jl, even with copy(xs0) inside the @btime loop (I use copy(xs0) so that the code is simpler)
LBFGSB.jl is about 2x faster than scipy.optimize(method=‘L-BFGS-B’)

Log:

Benchmarking single function execution
  12.977 μs (1 allocation: 160 bytes)
Benchmarking PharmCat v2 (CG) using grad specified by ForwardDiff
  383.874 ms (18888 allocations: 3.41 MiB)
-6.263329327214707
Benchmarking SPGBox using grad specified by ForwardDiff
  3.363 ms (138 allocations: 31.19 KiB)
-6.263329329419884
Benchmarking PharmCat v2 (CG) using finite difference grad
  388.767 ms (34072 allocations: 4.52 MiB)
-6.26332930693809
Benchmarking SPGBox using finite difference grad
  3.330 ms (270 allocations: 40.83 KiB)
-6.263329329419884
Benchmarking LBFGSB.jl using grad specified by ForwardDiff
  5.198 ms (203 allocations: 44.95 KiB)
-6.263329329419884
Benchmarking LBFGSB.jl using finite difference grad
  5.105 ms (401 allocations: 59.48 KiB)
-6.263329329419884
Benchmarking scipy.optimize(method='L-BFGS-B') which uses finite difference grad
  10.605 ms (9197 allocations: 447.61 KiB)
-6.263329329419884

Fixed in [BUGFIX] Prevent SPGBox from mutating xs0 · rht/climate_stress_test_benchmark@caea4dd · GitHub.

Sukera · June 9, 2021, 2:06pm

const does not mean what you think it means. It only means that the type can’t change and it’s a promise that your usercode doesn’t change the contents of mutable containers (it’s a “true” const for immutable structs, though that again doesn’t extend to mutable containers within that struct). You will still need the setup part.

help?> const                                                                                   
search: const isconst MathConstants contains codeunits ncodeunits countlines count_ones

 [...]

  Note that "constant-ness" does not extend into mutable containers; only the association      
  between a variable and its value is constant. If x is an array or dictionary (for example)   
  you can still modify, add, or remove elements.

rht · June 10, 2021, 12:01am

I see. I have additionally checked the x0 after a single optimization for both LBFGSB.jl and scipy.optimize, and in both x0 remains unchanged.

gbaraldi · June 10, 2021, 12:23am

I was reading the thread and realized how small the academic world is. My first programming teacher in university was him. His class was so good.

Topic		Replies	Views
Optimize performance comparison - Optim.jl vs Scipy Performance	15	5754	February 12, 2020
Making LBFGS in `Optim.jl` as fast as in `NLopt.jl` Optimization (Mathematical)	13	2069	December 6, 2022
SymPy vs Sympy.jl and/or symengine.jl Modelling & Simulations symbolic	14	5156	September 2, 2020
Performance of Optim.jl vs. Matlab's fminunc Optimization (Mathematical) performance	12	3994	August 6, 2018
Comparison of derivative-free / black-box optimization algorithms in Julia Optimization (Mathematical) question , comparison	12	2670	February 14, 2024

Optim.jl vs scipy.optimize once again

Related topics