Small followup:
I tried LBFGSB directly. If I record the “fidelity” for all evaluations of the loss function, I get this plot:
That’s remarkably similar to the first plot in my previous post, obtained with Optimization.jl and NLopt.LD_LBFGS().
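For concreteness, recording on every evaluation just means wrapping the loss so that each call logs the fidelity; a minimal sketch (the toy loss stands in for my actual problem):

```julia
# Minimal sketch: log the fidelity (1 - loss) on *every* call of the loss,
# including calls made inside the line search. The toy loss is a placeholder.
fidelity_trace = Float64[]

loss(x) = sum(abs2, x .- 1)         # placeholder for the real loss

function traced_loss(x)
    J = loss(x)
    push!(fidelity_trace, 1 - J)    # recorded for every single evaluation
    return J
end
```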
However, this includes evaluations of the loss function during the line search! If I pull out the actual values of the loss function from LBFGSB’s internals (the “iterate information” with iprint=100), I get this plot for the fidelity (1 - loss):
which is nice and monotonic and exactly what I was expecting to see.
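For reference, the direct run looked roughly like the sketch below. I’m assuming the `lbfgsb(f, g!, x0; ...)` convenience wrapper from LBFGSB.jl here, and the keyword names (including `iprint`) follow the underlying Fortran driver, so treat the exact signature as an assumption:

```julia
# Rough sketch of the direct L-BFGS-B call (toy problem; keyword names and
# the (fout, xout) return value assume LBFGSB.jl's convenience wrapper).
using LBFGSB

loss(x) = sum(abs2, x .- 1)          # placeholder for the real loss
grad!(g, x) = (g .= 2 .* (x .- 1))   # matching in-place gradient

n = 4
x0 = zeros(n)
# iprint=100 makes the Fortran code print detailed per-iterate information,
# which is where the monotonic loss values above were read from.
fout, xout = lbfgsb(loss, grad!, x0; lb=fill(-10.0, n), ub=fill(10.0, n),
                    m=10, iprint=100, maxiter=100)
```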
So this leads me to conclude that the callback function in Optimization.jl is called inside of line-search iterations, not just after each iteration of the optimizer. That totally explains the non-monotonic results, but it’s not generally what I would expect from a callback function (or at least, it would be nice to have the option to decide whether I want a callback on all evaluations of the loss function, or only on the “iterate values”).

I’ve often used the callback to check for monotonic convergence, because if I’m not seeing monotonic convergence (of the iterate values), that’s usually an indicator that something is wrong in my numerics. That’s not going to work if Optimization.jl calls the callback inside the line search.
(This could be NLopt’s fault, too, not necessarily Optimization.jl’s.)
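For what it’s worth, the kind of monotonicity check I have in mind looks roughly like this; the exact callback signature depends on the Optimization.jl version, so the `(u, loss)` form below is an assumption:

```julia
# Sketch of a monotonicity check in a callback. Assumed signature:
# (u, loss) -> Bool, where returning true halts the optimization; adapt the
# first argument to whatever your Optimization.jl version actually passes.
loss_history = Float64[]

function check_monotonic(u, loss)
    if !isempty(loss_history) && loss > last(loss_history) + 1e-12
        @warn "loss increased between callback calls" previous = last(loss_history) current = loss
    end
    push!(loss_history, loss)
    return false   # keep going; we only record and warn
end

# e.g. solve(prob, NLopt.LD_LBFGS(); callback = check_monotonic)
```

With the callback also firing inside the line search, this kind of check reports spurious increases even when the optimizer itself is converging monotonically.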
So, the tentative conclusion seems to be that the gradients from Zygote are probably okay, but there are some rough edges in the frameworks.
Anyway, thanks! This has been instructive!
P.S.: I’ve opened an issue: “The `callback` appears to be called for linesearch iterations” (SciML/Optimization.jl#724).