Tuning the gamma parameter in FrankWolfe.jl

Hi,
I found out about the FrankWolfe.jl package and it looks great! I am dealing with an optimization problem for which I have a lot of intuition about a good warm start and also about the order of magnitude of a good step size. As far as I understand, the latter can be given to the algorithm via the gamma_max parameter in the code. The rationale behind this is explained in this paper (equation 1).

This gamma_max does appear in the package in this file; however, I don’t understand how to pass this parameter when calling the FrankWolfe.frank_wolfe function. This function has a gamma_0 argument, but that seems to be a different parameter, used in the line search loop.

Can someone explain to me how to set gamma_max? Or, if I am not supposed to tune it, how can I take advantage of knowing the order of magnitude of good step sizes to accelerate the algorithm?

cc @mbesancon

Otherwise you might be better off opening an issue: https://github.com/ZIB-IOL/FrankWolfe.jl/issues

Hi @dridz, thanks for reaching out and trying out the package.

It indeed makes sense to reduce \gamma if you know a valid bound on your step size (for example, if you know your solution lies in the interior of the feasible set).

We haven’t built an interface for tuning gamma_max dynamically from within the algorithm. The best option might be to define your own custom step-size rule that keeps the current gamma_max as part of its state.

For this, I’d advise using the breaking-02 branch, which contains this PR simplifying and improving the interface.
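As a rough illustration (not tested code, and the exact perform_line_search argument list below is my reading of the 0.2 interface, so double-check it against the branch), a capped step-size rule could look like:

```julia
using FrankWolfe

# Hypothetical rule: the open-loop 2/(t + 2) step, capped by a mutable
# gamma_max that you can update between solves or from a callback.
mutable struct CappedAgnostic <: FrankWolfe.LineSearchMethod
    gamma_max::Float64
end

# Assumed 0.2-style signature: the algorithm passes the iteration counter t,
# the objective f, the in-place gradient grad!, the current gradient value,
# the iterate x, the direction d, the feasibility bound gamma_max, a
# workspace, and the memory mode.
function FrankWolfe.perform_line_search(
    ls::CappedAgnostic, t, f, grad!, gradient, x, d, gamma_max, workspace, memory_mode,
)
    return min(2 / (t + 2), ls.gamma_max, gamma_max)
end
```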

Thanks @mbesancon for your reply,

In fact I am dealing with a case where FW takes steps that are too tiny. My first thought was to increase gamma_max, but I deduce from what you say that the feasible space is somehow rescaled so that gamma lies in [0,1], which explains the default value gamma_max = 1 in the package.

Do you have any advice to help FW make larger steps?

To give a bit of context, I was trying FrankWolfe.frank_wolfe with the default parameters, except that I chose the FrankWolfe.Backtracking line search because gradient evaluation is costly in my case.
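In case it helps, the call looks roughly like this (a sketch, not my exact script; f, grad!, lmo, and x0 stand for my actual objective, in-place gradient, linear minimization oracle, and warm-start point):

```julia
using FrankWolfe

# f, grad!, lmo, x0 are problem-specific placeholders (objective, in-place
# gradient, linear minimization oracle over the polytope, warm start).
result = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Backtracking(),  # chosen to avoid extra gradient evaluations
    max_iteration=10,
    verbose=true,
)
```

Here is the result that I get for 10 steps: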

-------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
-------------------------------------------------------------------------------------------------
     I             0  -1.493085e+03  -1.778473e+03   2.853878e+02   0.000000e+00            NaN
    FW             1  -1.494139e+03  -2.012238e+03   5.180988e+02   1.260088e+02   7.935951e-03
    FW             2  -1.496860e+03  -1.972483e+03   4.756228e+02   2.661040e+02   7.515857e-03
    FW             3  -1.498606e+03  -2.005371e+03   5.067651e+02   3.925943e+02   7.641475e-03
    FW             4  -1.499370e+03  -1.856503e+03   3.571328e+02   5.131031e+02   7.795704e-03
    FW             5  -1.502375e+03  -1.671001e+03   1.686262e+02   6.398012e+02   7.814928e-03
    FW             6  -1.502717e+03  -1.799314e+03   2.965965e+02   7.624646e+02   7.869218e-03
    FW             7  -1.507768e+03  -1.930737e+03   4.229692e+02   8.622760e+02   8.118050e-03
    FW             8  -1.507768e+03  -1.930737e+03   4.229692e+02   9.979712e+02   8.016263e-03
    FW             9  -1.507768e+03  -1.930737e+03   4.229692e+02   1.133724e+03   7.938439e-03
    FW            10  -1.507768e+03  -1.930737e+03   4.229692e+02   1.280840e+03   7.807375e-03
  Last            10  -1.507768e+03  -1.930737e+03   4.229692e+02   1.449241e+03   7.590180e-03
-------------------------------------------------------------------------------------------------

Comparatively, I ran 10 steps of projected gradient with step size s_k = 5000/k, and the objective decreases faster (the loop is sketched below the values):

-1493.3797363103401
-1498.9514927344608
-1498.7787351591646
-1511.156509969127
-1515.8570471429125
-1518.1484484660875
-1522.6107165472215
-1523.9138023039504
-1523.9113305755095
-1524.2817735760082
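For completeness, the projected-gradient loop is essentially the following (a sketch; project stands for my problem-specific Euclidean projection onto the feasible set, and f, grad!, x0 are as above):

```julia
# Projected gradient with diminishing step size s_k = 5000 / k.
x = copy(x0)
g = similar(x)
for k in 1:10
    grad!(g, x)                       # gradient at the current iterate
    x = project(x - (5000 / k) * g)   # gradient step, then project back
    println(f(x))                     # objective value, as listed above
end
```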

In particular, is there a way to leverage that information (i.e., the knowledge of the order of magnitude of a good step size) in FW?

I deduce from what you say that the feasible space is somehow rescaled so that gamma lies in [0,1], which explains the default value gamma_max = 1 in the package.

Yes exactly, gamma corresponds to the step from your current iterate toward the computed extreme point of the feasible set. \gamma = 1 moves the next iterate all the way to that extreme point (on the boundary), and \gamma > 1 would make the next iterate infeasible.
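For reference, the standard FW update makes this explicit:

x_{t+1} = x_t + \gamma_t (v_t - x_t) = (1 - \gamma_t) x_t + \gamma_t v_t, \qquad \gamma_t \in [0, 1]

so the next iterate is a convex combination of x_t and the extreme point v_t, and it stays in the (convex) feasible set exactly when \gamma_t \in [0, 1].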

What may happen is that the Frank-Wolfe direction is misaligned with your gradient, hence the small steps. I assume your feasible set is polyhedral? It might be worth trying the other main algorithms (away-step FW, etc.); check out the docs here: Home · FrankWolfe.jl
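For instance, the away-step variant can be called with essentially the same arguments (a sketch; I’m reusing the placeholder names from your message, and the keyword names mirror the plain frank_wolfe call):

```julia
# Away-step FW keeps an active set of vertices and can take "away" steps,
# which often helps on polytopes when plain FW zigzags with tiny steps.
result = FrankWolfe.away_frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Adaptive(),
    max_iteration=10,
    verbose=true,
)
```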

Yes, the set is polyhedral. I didn’t know that the direction could be misaligned with the gradient (I think that in a basic implementation they coincide up to sign, right?). I’ll look at other FW-like algorithms then and let you know.

The direction v - x is obtained from the LMO, so it is in general not collinear with the negative gradient. The picture on the first page of the docs is helpful for visualizing this. You can also look at the paper in CITATION.bib.
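Concretely, the LMO returns an extreme point minimizing the linearized objective over the feasible set C:

v_t \in \arg\min_{v \in C} \langle \nabla f(x_t), v \rangle

so the direction v_t - x_t points toward a vertex of C, which is generally not collinear with -\nabla f(x_t).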


Thanks @mbesancon, indeed looking at the other FW variants helped speed up the algorithm, and the role of gamma is now clear.
