Objective function increases during LBFGS optimization with Optim.jl

While running optimization calculations with the Optim.jl package, I found that the objective function sometimes unexpectedly increases. I don't know why this happens and would appreciate an explanation, thank you very much. The details are as follows:

The call is:

res = optimize(Optim.only_fg!(fg_Function!),
               InitialF, LBFGS(), inplace = false,
               Optim.Options(x_abstol = 0.0, x_reltol = 0.0, f_abstol = 0.0, f_reltol = 0.0,
                             g_abstol = 0.0, g_reltol = 0.0, show_trace = true,
                             iterations = OptIter, store_trace = true))

fg_Function! contains the objective function and its derivatives; both have been tested and work correctly. The output is as follows:

****** Start iteration ******
F= -4.5092727954675236 
Iter     Function value   Gradient norm 
     0    -4.509273e+00     3.392928e-05
 * time: 5.602836608886719e-5
F= -4.509272777421474
F= -4.5092727929444236
     1    -4.509273e+00     2.752020e-05
 * time: 2270.425837993622
F= -4.509272792529507 
F= -4.5092727938377273 
F= -4.50927279939761 
F= -4.5092727621328796 
F= -4.509272799025066 
     2    -4.509273e+00     7.634763e-05
 * time: 8109.941900014877
F= -4.509272810926075 
F= -4.5092726943995975 
F= -4.509272808015269 
     3    -4.509273e+00     4.701028e-05
 * time: 11601.862957000732
F= -4.5092728164806513 
F= -4.5092728412304868 
F= -4.5092726582391403 
F= -4.5092728448386152 
     4    -4.509273e+00     1.918509e-04
 * time: 16263.132133960724
F= -4.5092728623030904 
F= -4.5092727831721384 
F= -4.50927286394497 
     5    -4.509273e+00     6.076972e-05
 * time: 19799.04483485222
F= -4.509272865883659 
F= -4.509272869318157 
F= -4.509272869016567 
     6    -4.509273e+00     6.941216e-05
 * time: 23349.618561029434
F= -4.50927287416470 
F= -4.5092728917610874 
F= -4.5092728164927656 
F= -4.5092728980259014 
     7    -4.509273e+00     3.006336e-05
 * time: 28027.132161855698
F= -4.5092729002958176 
F= -4.50927288461689 
F= -4.50927289877002 
     8    -4.509273e+00     2.579953e-05
 * time: 31555.314399957657
F= -4.509272899666974 
F= -4.50927290675508 
F= -4.5092729403745038
F= -4.509272894742775
F= -4.5092729591768324 
     9    -4.509273e+00     4.023887e-05
 * time: 37346.814374923706
F= -4.5092729849628834
F= -4.509272952205056
F= -4.509272993048358 
    10    -4.509273e+00     3.574861e-05
 * time: 40652.16582298279
F= -4.5092730050285605
F= -4.509273043435568
F= -4.509273053946075
F= -4.509273080837682 
    11    -4.509273e+00     4.467094e-05
 * time: 44999.02312397957
F= -4.5092732196861642
F= -4.5092735683321523
F= -4.509273523863599 
    12    -4.509274e+00     9.423961e-05
 * time: 48244.51411008835
F= -4.509273791374507
F= -4.5092748461061034
F= -4.5092609228550677
F= -4.5092734668365786
F= -4.509273443446354
F= -4.509273322647026
F= -4.5092732752980393
F= -4.509273251845994
F= -4.5092732247850065
F= -4.509273189111617
F= -4.509273157333895
F= -4.509273130782579
F= -4.509273108386304
F= -4.509273089086743
F= -4.5092730721028644
F= -4.509273057023425
F= -4.5092730433268424
F= -4.509273031069279
F= -4.5092730200095716
F= -4.509273009942818
F= -4.509273000506798
F= -4.509272991747495
F= -4.509272983531698
F= -4.509272975819828
F= -4.5092729686011324
F= -4.5092729616603844
F= -4.5092729552624724
F= -4.50927294920335
F= -4.5092729433565895
F= -4.5092729378717173
F= -4.509272932668717
F= -4.509272927672122
F= -4.5092729228816064
F= -4.5092729182949816
F= -4.509272913904435
F= -4.5092729096995674
F= -4.5092729056604166
F= -4.5092729017945956
F= -4.5092728980402716
F= -4.509272894404665
F= -4.50927289093888
F= -4.5092728875175236
F= -4.5092728842502288
F= -4.5092728810681892
F= -4.509272878017274
F= -4.5092728751101006
F= -4.509272872315126
ERROR: AssertionError: B > A
 [1] (::LineSearches.HagerZhang{Float64,Base.RefValue{Bool}})(::Function, ::LineSearches.var"#ϕdϕ#6"{Optim.ManifoldObjective{OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}},Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Float64, ::Float64, ::Float64) at /public3/home/sc55305/.julia/packages/LineSearches/Ki4c5/src/hagerzhang.jl:276
 [2] HagerZhang at /public3/home/sc55305/.julia/packages/LineSearches/Ki4c5/src/hagerzhang.jl:101 [inlined]
 [3] perform_linesearch!(::Optim.LBFGSState{Array{Float64,1},Array{Array{Float64,1},1},Array{Array{Float64,1},1},Float64,Array{Float64,1}}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#18#20"}, ::Optim.ManifoldObjective{OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}}) at /public3/home/sc55305/.julia/packages/Optim/uwNqi/src/utilities/perform_linesearch.jl:59
 [4] update_state!(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Optim.LBFGSState{Array{Float64,1},Array{Array{Float64,1},1},Array{Array{Float64,1},1},Float64,Array{Float64,1}}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#18#20"}) at /public3/home/sc55305/.julia/packages/Optim/uwNqi/src/multivariate/solvers/first_order/l_bfgs.jl:204
 [5] optimize(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#18#20"}, ::Optim.Options{Float64,Nothing}, ::Optim.LBFGSState{Array{Float64,1},Array{Array{Float64,1},1},Array{Array{Float64,1},1},Float64,Array{Float64,1}}) at /public3/home/sc55305/.julia/packages/Optim/uwNqi/src/multivariate/optimize/optimize.jl:57
 [6] optimize(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#18#20"}, ::Optim.Options{Float64,Nothing}) at /public3/home/sc55305/.julia/packages/Optim/uwNqi/src/multivariate/optimize/optimize.jl:35
 [7] #optimize#87 at /public3/home/sc55305/.julia/packages/Optim/uwNqi/src/multivariate/optimize/interface.jl:142 [inlined]
 [8] main() at ./none:45
 [9] top-level scope at ./timing.jl:174

Here the iteration limit is OptIter = 20, and F is the objective function.

You can see that the LBFGS iteration actually produces an increase in the objective function.

My preliminary guess is that the LBFGS step size needs adjusting, but I don't know which parameters to change so that the search keeps moving in a descent direction. The defaults are:

LBFGS(; m = 10,
        alphaguess = LineSearches.InitialStatic(),
        linesearch = LineSearches.HagerZhang(),
        P = nothing,
        precondprep = (P, x) -> nothing,
        manifold = Flat(),
        scaleinvH0::Bool = true && (typeof(P) <: Nothing))
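One knob worth trying is the line search itself. As a sketch (not tested on this problem), HagerZhang could be swapped for BackTracking, which only accepts steps satisfying an Armijo sufficient-decrease condition, together with a smaller initial step via InitialStatic:

```julia
using Optim, LineSearches

# Assumption: a more conservative configuration than the defaults.
# BackTracking enforces sufficient decrease; alpha = 0.1 shrinks the
# initial trial step (the default is 1.0).
algo = LBFGS(alphaguess = LineSearches.InitialStatic(alpha = 0.1),
             linesearch = LineSearches.BackTracking())

# res = optimize(Optim.only_fg!(fg_Function!), InitialF, algo, inplace = false)
```

Whether this helps depends on why HagerZhang's bracketing assertion (`B > A`) fails, which can itself be a symptom of inconsistent or noisy gradient values.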

Are there any suggestions about how best to tackle the problem?

From here

With Optim.jl optimizers, you can set allow_f_increases=true in order to let increases in the loss function not cause an automatic halt of the optimization process. Using a method like BFGS or NewtonTrustRegion is not guaranteed to have monotonic convergence and so this can stop early exits which can result in local minima.
So I think this is a feature of (L)BFGS methods, and you need to make use of the allow_f_increases = true option.
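Concretely, the option goes into Optim.Options; a minimal sketch reusing the names from the question:

```julia
# Same call as in the question, with allow_f_increases = true so that an
# uphill step does not halt the optimization immediately.
res = optimize(Optim.only_fg!(fg_Function!), InitialF, LBFGS(), inplace = false,
               Optim.Options(allow_f_increases = true,
                             show_trace = true, store_trace = true,
                             iterations = OptIter))
```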

Thank you for your suggestion. I’ll think about it

I don't think this suggestion solves my problem. From the docs: allow_f_increases: Allow steps that increase the objective value. Defaults to false. Note that, when setting this to true, the last iterate will be returned as the minimizer even if the objective increased.

So the new problem is that LBFGS cannot return the minimum value, only the value of the last iteration; moreover, the increase also occurs when I set the parameter to false. It looks like I need to submit a new question to make that distinction.

So what you are saying is your objective is increasing although allow_f_increases is false? That sounds fishy to me and I’d probably file an issue against the package.

Yes, perhaps (L)BFGS is ill-suited to your problem? In my experience, the allow_f_increases = true option just allows the method to escape some number of local minima before finding a steady optimum. In concert with your store_trace = true option, you are always able to find the best solution from those explored after the optimization. If a different optimization algorithm is viable for your problem, then most of those in Optim.jl should be drop-in replacements. Alternatively, de rigueur appears to be chaining optimization algorithms (global into local, or fast into slow); I have had success with Nelder-Mead into BFGS for some mid-size problems.
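For example, with store_trace = true the function values visited can be inspected after the run, and the best one recovered; a sketch (recovering the corresponding iterate x additionally requires extended_trace = true):

```julia
# Assumes `res` comes from optimize(...) with store_trace = true.
fs = Optim.f_trace(res)    # objective value at each stored iteration
best_iter = argmin(fs)     # index of the lowest value seen
best_f = fs[best_iter]

# To also recover the matching point, rerun with extended_trace = true:
# xs = Optim.x_trace(res); best_x = xs[best_iter]
```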

You are right, the objective function goes up even though allow_f_increases is false.

I don't think this approach works very well. If I can't make the objective function decrease monotonically with Optim.jl, what is the point of the optimization process?

You're right that optimizations sometimes fall into local optima, but I don't think that's a problem as long as the value descends like a staircase. The Optim.jl optimization process has two different levels of iteration: the outer step controlled by the iterations parameter, and the inner line search of LBFGS; store_trace = true only stores the value at the end of each outer LBFGS step.

What really confuses me is the intermediate line-search steps of LBFGS, as you can see in the trace above: within a given LBFGS line search a value smaller than the final one clearly appears, yet the step still returns the larger value.

It's also possible that my function definition has some unknown error, so I'll check my code.

The point of non-monotonic methods is, among others, to:

  • have a more flexible way to accept trial steps;
  • allow full Newton steps (even though you temporarily degrade the objective and the constraint violation), which amounts to fast convergence.

It is a bit counterintuitive, but several techniques were devised this way (e.g.