Gradient rise obtained when optimizing with the Optim.jl package

You’re right, optimizations sometimes get stuck in a local optimum, but I don’t think that’s a problem as long as the objective descends like a staircase. The Optim.jl optimization process has two different levels of iteration: one is the outer iterative step, whose count is controlled by the iterations parameter, and the other is the search process of LBFGS inside each step. With store_trace = true, only the value of the last LBFGS search step is stored.

What really confused me was the intermediate search step of LBFGS; you can see what I described in this issue. During one LBFGS search, a value clearly appears that is smaller than the value from the last step of that search, yet the larger last-step value is what gets returned.
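To make the two levels visible, here is a minimal sketch that records every objective call and compares it with the stored trace. It assumes a stand-in objective (the Rosenbrock function with a hand-written gradient, so that no finite-difference calls are mixed into the log); the names rosenbrock, rosenbrock_grad!, logged and all_evals are hypothetical and not from my actual problem.

```julia
using Optim

# Hypothetical stand-in objective: the Rosenbrock function.
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Analytic gradient, written in place into g.
function rosenbrock_grad!(g, x)
    g[1] = -2.0 * (1.0 - x[1]) - 400.0 * x[1] * (x[2] - x[1]^2)
    g[2] = 200.0 * (x[2] - x[1]^2)
    return g
end

# Log every objective evaluation, including trial points tried during the search.
all_evals = Float64[]
function logged(x)
    fx = rosenbrock(x)
    push!(all_evals, fx)
    return fx
end

res = optimize(logged, rosenbrock_grad!, [-1.2, 1.0], LBFGS(),
               Optim.Options(iterations = 50, store_trace = true))

# The stored trace has one entry per outer iteration (plus the starting point).
per_iteration = Optim.f_trace(res)

println("values stored in the trace:       ", length(per_iteration))
println("total objective calls:            ", length(all_evals))
println("smallest value seen in any call:  ", minimum(all_evals))
println("value reported by the result:     ", Optim.minimum(res))
```

If the number of objective calls is larger than the number of trace entries, the difference is the intermediate search evaluations that the trace does not keep. Comparing the last two printed lines shows whether any of those intermediate values was smaller than the one that was finally reported.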