NLopt is certainly an option. I will try it and report back. My options were very limited when I started with the Python implementation of my model, but today I discovered that NLopt also has a Python API.
I just upgraded my Julia from 1.5.4 to 1.6.1, after https://github.com/NixOS/nixpkgs/pull/123188 was merged. To my surprise, I observed a ~2x speedup in general:
- Vanilla LBFGS using finite difference: 6.744 s → 3.188 s
- Single function evaluation: 27.658 μs → 13.088 μs
- LBFGS using finite difference, with same params as scipy.optimize except that it uses HagerZhang line search: 700.513 ms → 367.378 ms
- LBFGS using autodiff = :forward, with same params as scipy.optimize except that it uses HagerZhang line search: 485.330 ms → 309.725 ms
- @PharmCat’s Newton method with sigmoid: 144.142 ms (on 1.6.1)
- SPGBox: 868.441 μs (on 1.6.1)
- scipy.optimize using PyCall: 14.093 ms → 10.520 ms (on 1.6.1)
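One likely contributor to the gap between the finite-difference and autodiff timings above is the number of objective calls per gradient: a forward-difference gradient needs one base evaluation plus one per coordinate. The following is a minimal sketch of that counting argument in plain Python. The objective `toy_objective` and the dimension 30 are hypothetical stand-ins, not the actual model, and whether this fully accounts for the measured difference is an assumption.

```python
def make_counted(f):
    """Wrap f so that every call increments a counter on the wrapper."""
    def wrapped(x):
        wrapped.calls += 1
        return f(x)
    wrapped.calls = 0
    return wrapped


def forward_difference_gradient(f, x, h=1e-7):
    """Forward-difference gradient: one base evaluation plus one per coordinate."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((f(shifted) - fx) / h)
    return grad


def toy_objective(x):
    # Hypothetical stand-in for the model's objective, not the real one.
    return sum((xi - 0.5) ** 2 for xi in x)


counted = make_counted(toy_objective)
x0 = [0.0] * 30  # hypothetical problem dimension
grad = forward_difference_gradient(counted, x0)
print("objective calls for one gradient:", counted.calls)
```

With n parameters, each gradient costs n + 1 objective calls, so the per-evaluation time is multiplied accordingly at every LBFGS iteration.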
As you may have noticed, I also benchmarked SPGBox (thank you @lmiq for building this package). It needed only 4 function evaluations to reach the desired result. The code is at https://github.com/rht/climate_stress_test_benchmark/blob/main/pharmcat_v2_g.jl.
 SPGBOX RESULT: 
 Convergence achieved. 
 Final objective function value = -6.263329329419884
 Best solution found = [ 0.0, 1.0, 1.0, ..., 1.0]
 Projected gradient norm = 0.0
 Number of iterations = 3
 Number of function evaluations = 4
I need to understand why SPGBox is effective and efficient, but I think I need to generate more test cases to check that SPGBox is robust across parameter variations.
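One possible reason for the low evaluation count is that the reported solution sits at a corner of the box (every component is 0 or 1). A projected-gradient step can land exactly on such a corner, after which the projected gradient is zero. The sketch below is a simplified spectral projected gradient loop in plain Python. It is not SPGBox's implementation: it uses a plain Armijo backtracking search instead of the nonmonotone search of the published SPG method, and the linear objective with cost vector `c` is a hypothetical example chosen so that the minimizer is a corner.

```python
def project(x, lower, upper):
    """Clamp every coordinate into the box [lower, upper]."""
    return [min(max(xi, lower), upper) for xi in x]


def projected_gradient(x, g, lower, upper):
    """P(x - g) - x; it is zero exactly at a stationary point of the box problem."""
    moved = project([xi - gi for xi, gi in zip(x, g)], lower, upper)
    return [mi - xi for mi, xi in zip(moved, x)]


def spg_box(f, grad, x0, lower=0.0, upper=1.0, tol=1e-8, max_iter=100):
    """Simplified SPG: Barzilai-Borwein step length plus Armijo backtracking."""
    x = project(x0, lower, upper)
    fx, g = f(x), grad(x)
    n_evals = 1
    step = 1.0  # initial spectral step length
    for iteration in range(max_iter):
        pg = projected_gradient(x, g, lower, upper)
        if max(abs(p) for p in pg) <= tol:
            return x, fx, iteration, n_evals
        # Search direction: projection of a scaled gradient step, minus x.
        trial = project([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
        d = [ti - xi for ti, xi in zip(trial, x)]
        slope = sum(gi * di for gi, di in zip(g, d))
        t = 1.0
        while True:
            x_new = [xi + t * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            n_evals += 1
            if f_new <= fx + 1e-4 * t * slope or t < 1e-12:
                break
            t *= 0.5
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        ss = sum(si * si for si in s)
        # Safeguarded Barzilai-Borwein step length for the next iteration.
        step = min(max(ss / sy, 1e-10), 1e10) if sy > 0 else 1e10
        x, fx, g = x_new, f_new, g_new
    return x, fx, max_iter, n_evals


# Hypothetical linear objective whose box minimizer is the corner [0, 1, 1, 1].
c = [1.0, -1.0, -1.0, -1.0]
f = lambda x: sum(ci * xi for ci, xi in zip(c, x))
grad = lambda x: list(c)

x_best, f_best, iterations, evaluations = spg_box(f, grad, [0.5] * 4)
print(x_best, f_best, iterations, evaluations)
```

On this toy problem the first projected step already hits the corner, so the loop stops after a single iteration. Objectives whose minimizer lies in the interior of the box, or that are poorly scaled, would not benefit in the same way, which is a reason to test across parameter variations.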
I also want to understand why 1.6.1 is 2x faster than 1.5.4. I thought the main focus of the improvements was precompilation time?
Additionally, I made a C++ version and observed that its single function evaluation (75 μs) is slower than that of the Julia version (13.088 μs). I’m not sure why. The C++ version can be found at https://github.com/rht/climate_stress_test_benchmark/blob/main/climate_stress_test_simplified.cpp.