Tuning the gamma parameter in FrankWolfe.jl

Hi,
I found out about the FrankWolfe.jl package and it looks great! I am dealing with an optimization problem for which I have a lot of intuition about a good warm start and also about the order of magnitude of a good step size. As far as I understand, the latter can be given to the algorithm via the gamma_max parameter in the code. The rationale behind this is explained in this paper (equation 1).

This gamma_max does appear in the package in this file; however, I don’t understand how to pass this parameter when calling the FrankWolfe.frank_wolfe function. This function has a gamma_0 argument, but that seems to be a different parameter, used in the line search loop.

Can someone explain to me how to set gamma_max? Or, if I am not supposed to tune it, how can I take advantage of knowing the order of magnitude of good step sizes to accelerate the algorithm?

cc @mbesancon

Otherwise you might be better off opening an issue: https://github.com/ZIB-IOL/FrankWolfe.jl/issues

Hi @dridz, thanks for reaching out and trying out the package.

It indeed makes sense to reduce \gamma if you know a valid bound on your step size (for example, if you know your solution lies in the interior of the feasible set).

We haven’t built an interface for tuning gamma_max dynamically from within the algorithm. The best option might be to define your own custom step-size rule that keeps the current gamma_max as part of its state.

For this, I’d advise using the breaking-02 branch, which contains this PR simplifying and improving the interface.
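As a rough illustration (not tested code, and the exact perform_line_search argument list below is my reading of the 0.2 interface, so double-check it against the branch), a capped step-size rule could look like:

```julia
using FrankWolfe

# Hypothetical rule: the open-loop 2/(t + 2) step, capped by a mutable
# gamma_max that you can update between solves or from a callback.
mutable struct CappedAgnostic <: FrankWolfe.LineSearchMethod
    gamma_max::Float64
end

# Assumed 0.2-style signature: the algorithm passes the iteration counter t,
# the objective f, the in-place gradient grad!, the current gradient value,
# the iterate x, the direction d, the feasibility bound gamma_max, a
# workspace, and the memory mode.
function FrankWolfe.perform_line_search(
    ls::CappedAgnostic, t, f, grad!, gradient, x, d, gamma_max, workspace, memory_mode,
)
    return min(2 / (t + 2), ls.gamma_max, gamma_max)
end
```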

Thanks @mbesancon for your reply,

In fact I am dealing with a case where FW takes steps that are too tiny. My first thought was to increase gamma_max, but I deduce from what you say that the feasible space is somehow rescaled so that gamma lies in [0,1], which explains the default value gamma_max = 1 in the package.

Do you have any advice to help FW make larger steps?

To give a bit of context, I was trying FrankWolfe.frank_wolfe with the default parameters, except that I chose the FrankWolfe.Backtracking line search because gradient evaluation is costly in my case.
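In case it helps, the call looks roughly like this (a sketch, not my exact script; f, grad!, lmo, and x0 stand for my actual objective, in-place gradient, linear minimization oracle, and warm-start point):

```julia
using FrankWolfe

# f, grad!, lmo, x0 are problem-specific placeholders (objective, in-place
# gradient, linear minimization oracle over the polytope, warm start).
result = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Backtracking(),  # chosen to avoid extra gradient evaluations
    max_iteration=10,
    verbose=true,
)
```

Here is the result that I get for 10 steps: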

-------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
-------------------------------------------------------------------------------------------------
     I             0  -1.493085e+03  -1.778473e+03   2.853878e+02   0.000000e+00            NaN
    FW             1  -1.494139e+03  -2.012238e+03   5.180988e+02   1.260088e+02   7.935951e-03
    FW             2  -1.496860e+03  -1.972483e+03   4.756228e+02   2.661040e+02   7.515857e-03
    FW             3  -1.498606e+03  -2.005371e+03   5.067651e+02   3.925943e+02   7.641475e-03
    FW             4  -1.499370e+03  -1.856503e+03   3.571328e+02   5.131031e+02   7.795704e-03
    FW             5  -1.502375e+03  -1.671001e+03   1.686262e+02   6.398012e+02   7.814928e-03
    FW             6  -1.502717e+03  -1.799314e+03   2.965965e+02   7.624646e+02   7.869218e-03
    FW             7  -1.507768e+03  -1.930737e+03   4.229692e+02   8.622760e+02   8.118050e-03
    FW             8  -1.507768e+03  -1.930737e+03   4.229692e+02   9.979712e+02   8.016263e-03
    FW             9  -1.507768e+03  -1.930737e+03   4.229692e+02   1.133724e+03   7.938439e-03
    FW            10  -1.507768e+03  -1.930737e+03   4.229692e+02   1.280840e+03   7.807375e-03
  Last            10  -1.507768e+03  -1.930737e+03   4.229692e+02   1.449241e+03   7.590180e-03
-------------------------------------------------------------------------------------------------

Comparatively, I ran 10 steps of projected gradient with step size s_k = 5000/k, and the objective decreases faster (the loop is sketched below the values):

-1493.3797363103401
-1498.9514927344608
-1498.7787351591646
-1511.156509969127
-1515.8570471429125
-1518.1484484660875
-1522.6107165472215
-1523.9138023039504
-1523.9113305755095
-1524.2817735760082
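For completeness, the projected-gradient loop is essentially the following (a sketch; project stands for my problem-specific Euclidean projection onto the feasible set, and f, grad!, x0 are as above):

```julia
# Projected gradient with diminishing step size s_k = 5000 / k.
x = copy(x0)
g = similar(x)
for k in 1:10
    grad!(g, x)                       # gradient at the current iterate
    x = project(x - (5000 / k) * g)   # gradient step, then project back
    println(f(x))                     # objective value, as listed above
end
```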

In particular, is there a way to leverage that information (i.e., the knowledge of the order of magnitude of a good step size) in FW?

I deduce from what you say that the feasible space is somehow rescaled so that gamma lies in [0,1], which explains the default value gamma_max = 1 in the package.

Yes exactly, gamma corresponds to the step from your current iterate toward the computed extreme point of the feasible set. \gamma = 1 moves the next iterate all the way to that extreme point (on the boundary), and \gamma > 1 would make the next iterate infeasible.
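For reference, the standard FW update makes this explicit:

x_{t+1} = x_t + \gamma_t (v_t - x_t) = (1 - \gamma_t) x_t + \gamma_t v_t, \qquad \gamma_t \in [0, 1]

so the next iterate is a convex combination of x_t and the extreme point v_t, and it stays in the (convex) feasible set exactly when \gamma_t \in [0, 1].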

What may happen is that the Frank-Wolfe direction is misaligned with your gradient, hence the small steps. I assume your feasible set is polyhedral? It might be worth trying the other main algorithms (away-step FW, etc.); check out the docs here: Home · FrankWolfe.jl
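For instance, the away-step variant can be called with essentially the same arguments (a sketch; I’m reusing the placeholder names from your message, and the keyword names mirror the plain frank_wolfe call):

```julia
# Away-step FW keeps an active set of vertices and can take "away" steps,
# which often helps on polytopes when plain FW zigzags with tiny steps.
result = FrankWolfe.away_frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Adaptive(),
    max_iteration=10,
    verbose=true,
)
```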

Yes, the set is polyhedral. I didn’t know that the direction could be misaligned with the gradient (I think that in a basic implementation they coincide up to sign, right?). I’ll look at other FW-like algorithms then and let you know.

The direction v - x is obtained from the LMO, so it is in general not collinear with the negative gradient. The picture on the first page of the docs is helpful for visualizing this. You can also look at the paper in CITATION.bib.
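Concretely, the LMO returns an extreme point minimizing the linearized objective over the feasible set C:

v_t \in \arg\min_{v \in C} \langle \nabla f(x_t), v \rangle

so the direction v_t - x_t points toward a vertex of C, which is generally not collinear with -\nabla f(x_t).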


Thanks @mbesancon, indeed looking at the other FW variants helped speed up the algorithm, and the role of gamma is now clear.
