How to get the minimum value found during LBFGS optimization, instead of the value from the last iteration

When I was doing optimization calculations with the Optim package (Optim.jl), I found that the objective function sometimes unexpectedly went up. I think it might have something to do with the LBFGS method; the output is as follows:

> ****** Start iteration ******
> F = -4.81788433555533
> Iter     Function value   Gradient norm 
>     12    -4.817884e+00     1.395425e-03
> * time: 10357.130357980728
> F = -4.817883652318229
> F = -4.8178842598090287
> F = -4.8178844431739205
> F = -4.817884441149004
>     13    -4.817884e+00     2.386978e-03
>  * time: 11284.003106832504
> F = -4.8178851080462163
> F = -3.5069671587620275
> F = -4.8178851185143867
>     14    -4.817885e+00     1.503471e-03
>  * time: 11956.861983776093
> F = -3.5049408088443412
> F = -4.8178852312846225
>     15    -4.817885e+00     2.457133e-03
>  * time: 12399.532339811325
> F = -4.8171348208231243
> F = -4.8178851007907943
> F = -4.817605598209048
> F = -4.8178853540998454
> F = -4.8178861103626924
> F = -4.8178860283335676
>     16    -4.817886e+00     5.654796e-03
>  * time: 13748.866719961166
> F = -4.8174211193119327
> F = -4.8178863688052015
> F = -4.817616539717973
> F = -4.8178866170924488
> F = -4.8177082906701635
> F = -4.8178867340185376
> F = -4.817885282079685
> F = -4.8178862655610124
> F = -4.817886223499005
> F = -4.817886200545940
> F = -4.817886231839386
>     17    -4.817886e+00     9.369874e-03
>  * time: 16195.583123922348
> F = -4.817734141839478
> F = -4.817886721853432
> F = -4.817886804908878
> F = -4.817886741268174
>     18    -4.817887e+00     2.352987e-02
>  * time: 17070.98549580574
> F = -4.8179764454908386
> F = -4.81766569337557
> F = -4.8176652922814415
> F = -4.817681067165163
> F = -4.8176817791081063
> F = -4.81769547362258
> F = -4.8176966061470634
> F = -4.817706332897073
> F = -4.8177074199523324
> F = -4.817713206770256
> F = -4.817713968728021
> F = -4.8177169972285214
> F = -4.817717435737219
> F = -4.817718902975148
> F = -4.8177191422553198
> F = -4.8177198309903297
> F = -4.8177199691171357
> F = -4.817720307283978
> F = -4.817720394838664
> F = -4.8177205648666814
> F = -4.817720624840671
> F = -4.8177207145334364
> F = -4.8177207599171345
> F = -4.8177208179113404
> F = -4.8177208544833187
> F = -4.817720897133743
> F = -4.8177209292867852
> F = -4.817720961728725
> F = -4.817720990923807
> F = -4.817721021332298
> F = -4.8177210490378183
> F = -4.8177210748543863
> F = -4.8177210994101344
> F = -4.817721123687319
> F = -4.8177211442480927
> F = -4.8177211658054166
> F = -4.8177211854674992
> F = -4.8177212054286906
> F = -4.8177212258705806
> F = -4.817721244745889
> F = -4.8177212633004238
> F = -4.817721280736686
> F = -4.8177212976987484
> F = -4.8177213166930084
> F = -4.8177213338690254
> F = -4.8177213496580706
> F = -4.8177213662410314
> F = -4.817721383443776
> F = -4.8177213983089304
> F = -4.817721413337718
> F = -4.817721427798411
> F = -4.8177214419574534
> F = -4.8177214556244742
> F = -4.8177214695759302
> F = -4.8177214823252945
> F = -4.8177214956964906
> F = -4.8177215082179154
> F = -4.817721520920773
> F = -4.8177215329464266
> F = -4.817721546360258
> F = -4.817721557102018
> F = -4.8177215706022873
> F = -4.817721580049021
> F = -4.817721589683824
> F = -4.817721598684452
> F = -4.817721607136909
> F = -4.8177216147542667
> F = -4.817721622146373
> F = -4.8177216295100273
> F = -4.81772163643716
> F = -4.8177216431906897
> F = -4.817721649831073
> F = -4.817721655756141
> F = -4.817721660933518
> F = -4.8177216660798882
> F = -4.8177216713892985
> F = -4.817721675519504
> F = -4.81772167973229
> F = -4.817721683120893
> F = -4.817721686850274
> F = -4.817721689946943
> F = -4.8177216919708367
> F = -4.8177216950074614
> F = -4.8177216964975944
> F = -4.8177216989234885
> F = -4.817721699849128
> F = -4.817721698626862
> F = -4.81772169795953
>     19    -4.817722e+00     1.278123e-01
>  * time: 29953.599843978882
> F = -4.817752508464134
> F = -4.8177310945489346
> F = -4.8177691966325846
>     20    -4.817769e+00     3.265473e-02
>  * time: 30326.188237905502
> F = -4.817886177056045
> F = -4.817670698882429
> F = -4.81788095974910
>     21    -4.817881e+00     4.256618e-03
>  * time: 30704.19914484024
> F = -4.8178743434998305
> F = -4.8178857617004804
>     22    -4.817886e+00     6.601777e-04
>  * time: 30953.2505300045
> F = -4.817885797395228
> F = -4.81788515290503
> F = -4.817885595933557
>     23    -4.817886e+00     2.752117e-03
>  * time: 31329.490315914154
> F = -4.8178858131902853
> F = -4.8178868219502857
> F = -4.8178883488247685
> F = -4.81788770546057
>     24    -4.817888e+00     8.906169e-03
>  * time: 31836.535990953445

It can be seen that during the LBFGS iterations the minimum value is not necessarily the value produced by the last iteration, yet the final output is the value of the last iteration; this is especially clear between steps 18 and 19. So how do I get the minimum value found over the whole LBFGS run, instead of the value obtained in the last step?

As shown here, you can use the store_trace option to store the intermediate result. Afterwards you can search within the trace for the smallest objective function value.
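Something along these lines should work (an untested sketch assuming a recent Optim.jl; `f` and `x0` stand in for your own objective and starting point):

```julia
using Optim

res = optimize(f, x0, LBFGS(),
               Optim.Options(store_trace = true,      # record one trace entry per outer iteration
                             extended_trace = true))  # also record the iterate x in each entry

fs = Optim.f_trace(res)    # objective value at each stored iteration
xs = Optim.x_trace(res)    # corresponding iterates (available because extended_trace = true)
i  = argmin(fs)            # index of the smallest recorded objective value
x_best, f_best = xs[i], fs[i]
```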

I don’t think this method works very well. If I can’t make my objective function decrease monotonically with Optim.jl, then what is the point of this optimization process? In addition, store_trace = true only stores the value at the end of each LBFGS iteration, not the intermediate values evaluated within an iteration.
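If I understand correctly, the only way to capture those intermediate evaluations would be to wrap the objective myself and record the best point seen so far, something like this (just a rough, untested sketch; `f` and `x0` are my own objective and starting point):

```julia
using Optim

best_f = Ref(Inf)          # smallest value seen in any function evaluation
best_x = copy(x0)          # point at which it was seen

function tracked_f(x)
    fx = f(x)              # the original objective
    if fx < best_f[]
        best_f[] = fx
        best_x .= x
    end
    return fx
end

res = optimize(tracked_f, x0, LBFGS(), Optim.Options(show_trace = true))
@show best_f[] res.minimum  # best_f[] can be smaller than the reported minimum
```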

What is all the output in between? Is every function evaluation between the iterates an Armijo check for the step size? Can you maybe provide a little more code and output of the step size?

For example: what is your memory storage size in LBFGS? Keep in mind that (1) Newton only converges locally and (2) with LBFGS you have a quasi-Newton method, i.e. you only approximate the Hessian (more memory here yields a better approximation). But I am even more surprised about the first two function evaluations after iteration 18, so it would be cool to get more details on those (step size, parameters for Armijo, are you maybe also using Wolfe…)?

edit: I am not using Optim myself, but I have implemented LBFGS for another package.

I don’t know how to view more details, and I think I just used the default parameters:

LBFGS(; m = 10,                                  # memory size (number of stored correction pairs)
        alphaguess = LineSearches.InitialStatic(),
        linesearch = LineSearches.HagerZhang(),  # line search algorithm
        P = nothing,                             # preconditioner
        precondprep = (P, x) -> nothing,
        manifold = Flat(),
        scaleinvH0::Bool = true && (typeof(P) <: Nothing))

I changed the linesearch parameter from LineSearches.HagerZhang() to StrongWolfe and to BackTracking. BackTracking gives results similar to HagerZhang, but I get an error when I use StrongWolfe. I am sorry, but I don’t know how to get more details.
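For reference, this is roughly how I swapped the line search (a sketch of what I ran; the objective `f` and starting point `x0` are my own):

```julia
using Optim, LineSearches

algo = LBFGS(linesearch = LineSearches.BackTracking())   # also tried LineSearches.StrongWolfe()
res  = optimize(f, x0, algo, Optim.Options(show_trace = true))
```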

Hm, ok. I am not familiar enough with Optim to say directly what you could check, but I am surprised that StrongWolfe yields an error.

Another interesting point is of course whether your function is convex. If it is not convex and has multiple minima, then in your step 18 you might have just jumped into another valley. This can be avoided by an Armijo step size rule, and I hope it is avoided by HagerZhang as well, but sometimes rounding errors also hit.