I’m using Optim.jl
to solve an unconstrained minimization problem.
In this particular problem I have a black-box objective function, and a single function evaluation can take a long time. I don’t have access to gradient information; I have tried automatic differentiation, but there are parts of the code that the differentiator cannot handle, so it throws errors.
Nevertheless, I’m using the L-BFGS implementation from Optim.jl
with a finite-difference approximation of the gradient, and it seems to work fine. Unfortunately, I cannot show the full code here because it is not mine to share, but I can at least show the snippet that makes the optimization call:
closure(x) = sqerror_model(x; phi=phi, rho=rho, model=model) # The objective function, a black-box
initial_x = randn(dim)
optim_res = Optim.optimize(
    closure, initial_x, Optim.LBFGS(; m=15), Optim.Options(iterations=200, show_trace=true)
)
With this snippet I’m trying to point out that I request 200 iterations and that I want to see the trace for each iteration. I also increased the m
parameter (the L-BFGS memory length) of the optimization method to get a little more precision.
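Since I cannot share `sqerror_model`, here is a minimal self-contained sketch that reproduces the same call pattern with a dummy quadratic objective standing in for my black-box function. The objective, the target vector, and the dimension below are made up purely for illustration; as in my real code, no gradient function is passed, so Optim.jl approximates it with finite differences.

```julia
using Optim

# Dummy stand-in for the real black-box objective: sum of squared errors
# against a fixed random target vector (made up for illustration).
dim = 3001
target = randn(dim)
dummy_sqerror(x) = sum(abs2, x .- target)

initial_x = randn(dim)

# Same call pattern as in my real code; no gradient is supplied,
# so a finite-difference approximation is used.
optim_res = Optim.optimize(
    dummy_sqerror,
    initial_x,
    Optim.LBFGS(; m=15),
    Optim.Options(iterations=200, show_trace=true),
)
```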
What I get in return from my actual run is the following:
Here, dimension=3001
refers to the size of my parameter vector, which is the expected length of my minimizer.
I have some questions about this output.
- What does it mean that the gradient norm equals zero?
- I specified 200 iterations in the optimization call; why do I only see one iteration, namely the zeroth one?
- Why do I see zero for all the convergence measures? And what does the NaN in these convergence measures mean?
- At the end, it says that only one call to the function and one call to the gradient were made. Does this mean that only one iteration was performed? (See the snippet below for how I would read these counts from the result object.)
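For completeness, this is roughly how I would query the same information programmatically instead of reading the printed trace, using the result accessors from the Optim.jl results API on the `optim_res` object from the snippet above:

```julia
# Inspect the optimization result object directly.
println("iterations:  ", Optim.iterations(optim_res))
println("f_calls:     ", Optim.f_calls(optim_res))
println("g_calls:     ", Optim.g_calls(optim_res))
println("converged:   ", Optim.converged(optim_res))
println("x_converged: ", Optim.x_converged(optim_res))
println("f_converged: ", Optim.f_converged(optim_res))
println("g_converged: ", Optim.g_converged(optim_res))
println("minimum:     ", Optim.minimum(optim_res))
```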