I have a (somewhat expensive to calculate) loss function f(x) that I want to minimise, and for which I can compute exact gradients using Zygote.jl.
So far I have been using the LBFGS implementation in NLopt.jl to do that, but there appear to be some problems with the loss function that cause NLopt to abort the optimisation in some cases and return the code :FORCED_STOP (this happens in roughly one third of cases).
Because NLopt.jl calls into C code internally, it is very hard to debug what exactly is going wrong, so I wanted to switch to the pure-Julia LBFGS implementation in Optim.jl.
However, I found that this was about three times slower than the NLopt implementation:
using NLopt

# nparams is the number of parameters, x0 the initial parameter vector
opt = NLopt.Opt(:LD_LBFGS, nparams)
# loss_fun.for_nlopt(x, grad) writes the gradient at x into grad and returns f(x)
# (a rough sketch of this wrapper is shown after the output below)
opt.min_objective = loss_fun.for_nlopt
opt.xtol_rel = 1e-8
opt.ftol_rel = 1e-8
opt.maxeval = 5000
@time ret_nlopt = NLopt.optimize(opt, x0)
@show opt.numevals
@show ret_nlopt[1]
@show ret_nlopt[3]
#= output
1.534183 seconds (2.96 M allocations: 975.899 MiB, 4.01% gc time)
opt.numevals = 493
ret_nlopt[1] = -16.68266898954586
ret_nlopt[3] = :FTOL_REACHED
=#
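For context, loss_fun.for_nlopt is just a thin wrapper around the Zygote gradient. A rough sketch of what it does, using a hypothetical toy loss f in place of my real, much more expensive one:

using Zygote

# toy stand-in for my actual loss function
f(x) = sum(abs2, x .- 1)

# NLopt-style objective: fill grad in place (when requested) and return the loss value
function nlopt_objective(x, grad)
    val, back = Zygote.pullback(f, x)
    if length(grad) > 0
        grad .= back(one(val))[1]
    end
    return val
end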
The corresponding run in Optim.jl looks like this:
using Optim, LineSearches

options = Optim.Options(x_reltol=1e-8, f_reltol=1e-8, iterations=5000)
method = Optim.LBFGS(linesearch=LineSearches.BackTracking())
# loss_fun.g!(grad, x) writes the gradient at x into grad
# loss_fun.fg!(grad, x) writes the gradient at x into grad and returns f(x)
# (rough sketches of these wrappers are shown after the output below)
od = Optim.OnceDifferentiable(loss_fun, loss_fun.g!, loss_fun.fg!, x0)
@time ret_optim = Optim.optimize(od, x0, method, options)
@show ret_optim
#= output
3.373001 seconds (7.98 M allocations: 2.412 GiB, 3.59% gc time)
ret_optim = * Status: success
* Candidate solution
Final objective value: -1.677771e+01
* Found with
Algorithm: L-BFGS
* Convergence measures
|x - x'| = 3.12e-04 ≰ 0.0e+00
|x - x'|/|x'| = 7.54e-05 ≰ 1.0e-08
|f(x) - f(x')| = 1.55e-07 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 9.21e-09 ≤ 1.0e-08
|g(x)| = 3.24e-04 ≰ 1.0e-08
* Work counters
Seconds run: 3 (vs limit Inf)
Iterations: 967
f(x) calls: 998
∇f(x) calls: 968
=#
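For completeness, loss_fun.g! and loss_fun.fg! are analogous thin wrappers around the Zygote gradient; roughly, with the same toy f as above:

# in-place gradient, matching Optim's g!(grad, x) convention
function g!(grad, x)
    grad .= Zygote.gradient(f, x)[1]
    return grad
end

# loss value plus in-place gradient in a single backward pass, matching fg!(grad, x)
function fg!(grad, x)
    val, back = Zygote.pullback(f, x)
    grad .= back(one(val))[1]
    return val
end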
Is there a way to use exactly the same settings for LBFGS in Optim.jl as in NLopt?
I have already looked online for a solution, but none of the tips I found helped.