L-BFGS returns a NaN gradient when an analytical gradient is provided; BFGS shows no issues


  • When I provide an analytical gradient for optimization using LBFGS with box constraints, the optimizer occasionally fails because the gradient becomes NaN, i.e. g(x) = [NaN, ..., NaN].
  • The analytical gradient I provide never returns NaN.
  • With BFGS, the issue does not occur.
  • Without an analytical gradient, i.e. letting the optimizer estimate the gradient numerically, the issue does not occur either.

Setup of optimizer

  • Objective function is strictly positive f > 0
  • Line search algorithm: BackTracking (order 2 and order 3)

Code backbone

### Limits, seed and objective function
upper_limit = b*ones(number_parameters) #b >>1
lower_limit = d*ones(number_parameters) #b >> d >0
x0 = c*ones(number_parameters) #b >> c > d > 0

f = sum((elements_of_a_matrix_A)^2) #A is a function of x

### Gradient
function analytical_gradient(x, i)
    # some code using adjoint methods to calculate the gradient of a matrix A, which is a function of x
    return gradient_at_diagonal_element_i_of_A # i.e. \nabla A[i,i]
end

function g!(G, x)
    for i in 1:length(x)
        G[i] = analytical_gradient(x, i)
        # report immediately if the user-supplied gradient is ever NaN
        isnan(G[i]) && println("Somehow the gradient became NaN")
    end
end

### Callback
xs = []; gs = [];
cb = tr -> begin
    push!(xs, tr[end].metadata["x"])
    push!(gs, tr[end].metadata["g(x)"])
    false # returning true would stop the optimization
end

solution = Optim.optimize(f, g!, lower_limit, upper_limit, x0, Fminbox(LBFGS(linesearch=BackTracking())), Optim.Options(store_trace=true, extended_trace=true, callback=cb))

Through the callback I can observe that the gradient stored in the trace becomes NaN at some point during the optimization, even though g! always returns finite values.
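To double-check the claim that g! itself never produces a non-finite value, one option is to wrap it and record every point at which it does. The following is only a minimal sketch: `checked_gradient` and `bad_points` are hypothetical names introduced here, not part of Optim, and it assumes x is a Vector{Float64}.

```julia
# Hypothetical helper (not part of Optim): wraps an in-place gradient g!(G, x)
# and records every x at which the wrapped gradient is not finite.
function checked_gradient(g!)
    bad_points = Vector{Float64}[]          # assumes x is a Vector{Float64}
    function wrapped!(G, x)
        g!(G, x)                            # call the user-supplied gradient
        if !all(isfinite, G)                # catches NaN as well as ±Inf
            push!(bad_points, copy(x))      # copy, since x may be reused by the caller
        end
        return G
    end
    return wrapped!, bad_points
end
```

Passing `wrapped!` to the optimizer in place of `g!` and inspecting `bad_points` afterwards would show whether a NaN ever originates in the user code. If `bad_points` stays empty while the trace still contains NaN, the NaN is introduced somewhere after g! returns.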

Finally, I am running Julia 1.5.2 and

pkg> st --manifest Optim LineSearches
Status `~/.julia/environments/v1.5/Manifest.toml`
  [d3d80556] LineSearches v7.1.1
  [429524aa] Optim v1.2.0

I wonder if the issue arises during the approximation of the Hessian of the objective function. However, I have not found a way to extract this Hessian from the trace to debug further.

Fortunately, BFGS works equally well for my problem, since I am working with only ~50 parameters. But it is still puzzling why LBFGS fails.



A self-contained MWE would help a lot in investigating this.