How to get the minimum value found during LBFGS optimization, instead of the value from the last iteration

When I was doing optimization calculations with the Optim package (Optim.jl), I found that the objective function sometimes unexpectedly went up. I think it might have something to do with the LBFGS method; the output is as follows:

> ****** Start iteration ******
> F = -4.81788433555533
> Iter     Function value   Gradient norm 
>     12    -4.817884e+00     1.395425e-03
> * time: 10357.130357980728
> F = -4.817883652318229
> F = -4.8178842598090287
> F = -4.8178844431739205
> F = -4.817884441149004
>     13    -4.817884e+00     2.386978e-03
>  * time: 11284.003106832504
> F = -4.8178851080462163
> F = -3.5069671587620275
> F = -4.8178851185143867
>     14    -4.817885e+00     1.503471e-03
>  * time: 11956.861983776093
> F = -3.5049408088443412
> F = -4.8178852312846225
>     15    -4.817885e+00     2.457133e-03
>  * time: 12399.532339811325
> F = -4.8171348208231243
> F = -4.8178851007907943
> F = -4.817605598209048
> F = -4.8178853540998454
> F = -4.8178861103626924
> F = -4.8178860283335676
>     16    -4.817886e+00     5.654796e-03
>  * time: 13748.866719961166
> F = -4.8174211193119327
> F = -4.8178863688052015
> F = -4.817616539717973
> F = -4.8178866170924488
> F = -4.8177082906701635
> F = -4.8178867340185376
> F = -4.817885282079685
> F = -4.8178862655610124
> F = -4.817886223499005
> F = -4.817886200545940
> F = -4.817886231839386
>     17    -4.817886e+00     9.369874e-03
>  * time: 16195.583123922348
> F = -4.817734141839478
> F = -4.817886721853432
> F = -4.817886804908878
> F = -4.817886741268174
>     18    -4.817887e+00     2.352987e-02
>  * time: 17070.98549580574
> F = -4.8179764454908386
> F = -4.81766569337557
> F = -4.8176652922814415
> F = -4.817681067165163
> F = -4.8176817791081063
> F = -4.81769547362258
> F = -4.8176966061470634
> F = -4.817706332897073
> F = -4.8177074199523324
> F = -4.817713206770256
> F = -4.817713968728021
> F = -4.8177169972285214
> F = -4.817717435737219
> F = -4.817718902975148
> F = -4.8177191422553198
> F = -4.8177198309903297
> F = -4.8177199691171357
> F = -4.817720307283978
> F = -4.817720394838664
> F = -4.8177205648666814
> F = -4.817720624840671
> F = -4.8177207145334364
> F = -4.8177207599171345
> F = -4.8177208179113404
> F = -4.8177208544833187
> F = -4.817720897133743
> F = -4.8177209292867852
> F = -4.817720961728725
> F = -4.817720990923807
> F = -4.817721021332298
> F = -4.8177210490378183
> F = -4.8177210748543863
> F = -4.8177210994101344
> F = -4.817721123687319
> F = -4.8177211442480927
> F = -4.8177211658054166
> F = -4.8177211854674992
> F = -4.8177212054286906
> F = -4.8177212258705806
> F = -4.817721244745889
> F = -4.8177212633004238
> F = -4.817721280736686
> F = -4.8177212976987484
> F = -4.8177213166930084
> F = -4.8177213338690254
> F = -4.8177213496580706
> F = -4.8177213662410314
> F = -4.817721383443776
> F = -4.8177213983089304
> F = -4.817721413337718
> F = -4.817721427798411
> F = -4.8177214419574534
> F = -4.8177214556244742
> F = -4.8177214695759302
> F = -4.8177214823252945
> F = -4.8177214956964906
> F = -4.8177215082179154
> F = -4.817721520920773
> F = -4.8177215329464266
> F = -4.817721546360258
> F = -4.817721557102018
> F = -4.8177215706022873
> F = -4.817721580049021
> F = -4.817721589683824
> F = -4.817721598684452
> F = -4.817721607136909
> F = -4.8177216147542667
> F = -4.817721622146373
> F = -4.8177216295100273
> F = -4.81772163643716
> F = -4.8177216431906897
> F = -4.817721649831073
> F = -4.817721655756141
> F = -4.817721660933518
> F = -4.8177216660798882
> F = -4.8177216713892985
> F = -4.817721675519504
> F = -4.81772167973229
> F = -4.817721683120893
> F = -4.817721686850274
> F = -4.817721689946943
> F = -4.8177216919708367
> F = -4.8177216950074614
> F = -4.8177216964975944
> F = -4.8177216989234885
> F = -4.817721699849128
> F = -4.817721698626862
> F = -4.81772169795953
>     19    -4.817722e+00     1.278123e-01
>  * time: 29953.599843978882
> F = -4.817752508464134
> F = -4.8177310945489346
> F = -4.8177691966325846
>     20    -4.817769e+00     3.265473e-02
>  * time: 30326.188237905502
> F = -4.817886177056045
> F = -4.817670698882429
> F = -4.81788095974910
>     21    -4.817881e+00     4.256618e-03
>  * time: 30704.19914484024
> F = -4.8178743434998305
> F = -4.8178857617004804
>     22    -4.817886e+00     6.601777e-04
>  * time: 30953.2505300045
> F = -4.817885797395228
> F = -4.81788515290503
> F = -4.817885595933557
>     23    -4.817886e+00     2.752117e-03
>  * time: 31329.490315914154
> F = -4.8178858131902853
> F = -4.8178868219502857
> F = -4.8178883488247685
> F = -4.81788770546057
>     24    -4.817888e+00     8.906169e-03
>  * time: 31836.535990953445

It can be seen that during the LBFGS iterations the minimum value is not necessarily the value produced by the last iteration, yet the final output is the value of the last iteration; this is especially clear between steps 18 and 19. So how do I get the minimum value found over the whole LBFGS run, instead of the value obtained in the last step?

As shown here, you can use the store_trace option to store the intermediate result. Afterwards you can search within the trace for the smallest objective function value.
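Something along these lines should work (an untested sketch assuming a recent Optim.jl; `f` and `x0` stand in for your own objective and starting point):

```julia
using Optim

res = optimize(f, x0, LBFGS(),
               Optim.Options(store_trace = true,      # record one trace entry per outer iteration
                             extended_trace = true))  # also record the iterate x in each entry

fs = Optim.f_trace(res)    # objective value at each stored iteration
xs = Optim.x_trace(res)    # corresponding iterates (available because extended_trace = true)
i  = argmin(fs)            # index of the smallest recorded objective value
x_best, f_best = xs[i], fs[i]
```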

I don’t think this method works very well. If I can’t make my objective function decrease monotonically with Optim.jl, then what is the point of this optimization process? In addition, store_trace = true only stores the value at the end of each LBFGS iteration, not the intermediate values evaluated within an iteration.
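If I understand correctly, the only way to capture those intermediate evaluations would be to wrap the objective myself and record the best point seen so far, something like this (just a rough, untested sketch; `f` and `x0` are my own objective and starting point):

```julia
using Optim

best_f = Ref(Inf)          # smallest value seen in any function evaluation
best_x = copy(x0)          # point at which it was seen

function tracked_f(x)
    fx = f(x)              # the original objective
    if fx < best_f[]
        best_f[] = fx
        best_x .= x
    end
    return fx
end

res = optimize(tracked_f, x0, LBFGS(), Optim.Options(show_trace = true))
@show best_f[] res.minimum  # best_f[] can be smaller than the reported minimum
```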

What is all the output in between? Is every function evaluation between the iterates an Armijo check for the step size? Can you maybe provide a little more code and output of the step size?

For example: what is your memory storage size in LBFGS? Keep in mind that (1) Newton only converges locally and (2) with LBFGS you have a quasi-Newton method, i.e. you only approximate the Hessian (more memory here yields a better approximation). But I am even more surprised about the first two function evaluations after iteration 18, so it would be cool to get more details on those (step size, parameters for Armijo, are you maybe also using Wolfe…)?

edit: I am not using Optim myself, but I have implemented LBFGS for another package.

I don’t know how to view more details, and I think I just used the default parameters:

LBFGS(; m = 10,                                  # memory size (number of stored correction pairs)
        alphaguess = LineSearches.InitialStatic(),
        linesearch = LineSearches.HagerZhang(),  # line search algorithm
        P = nothing,                             # preconditioner
        precondprep = (P, x) -> nothing,
        manifold = Flat(),
        scaleinvH0::Bool = true && (typeof(P) <: Nothing))

I changed the linesearch parameter from LineSearches.HagerZhang() to StrongWolfe and to BackTracking. BackTracking gives results similar to HagerZhang, but I get an error when I use StrongWolfe. I am sorry, but I don’t know how to get more details.
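For reference, this is roughly how I swapped the line search (a sketch of what I ran; the objective `f` and starting point `x0` are my own):

```julia
using Optim, LineSearches

algo = LBFGS(linesearch = LineSearches.BackTracking())   # also tried LineSearches.StrongWolfe()
res  = optimize(f, x0, algo, Optim.Options(show_trace = true))
```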

Hm, ok. I am not familiar enough with Optim to say directly what you could check, but I am surprised that StrongWolfe yields an error.

Another interesting point is of course whether your function is convex. If it is not convex and has multiple minima, then in your step 18 you might have just jumped into another valley. This can be avoided by an Armijo step size rule, and I hope it is avoided by HagerZhang as well, but sometimes rounding errors also hit.