Function in Flux to estimate learning rate


#1

Is there a function in Flux to estimate the best learning rate for a good gradient descent before training a neural network?


#2

No.
Or yes, if you are willing to accept the defaults of the optimizers.
(Which TBH I normally am)
In general this is not possible; sorting it out is the job of hyperparameter optimization.
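To illustrate what "hyperparameter optimization" means at its simplest, here is a hedged sketch of a grid search over candidate learning rates, run on a toy quadratic loss with plain gradient descent rather than a real Flux model. The names `final_loss` and `candidate_etas` are made up for this example, not Flux API:

```julia
# Toy loss f(w) = (w - 3)^2 with minimum at w = 3, and its gradient.
f(w) = (w - 3.0)^2
grad(w) = 2.0 * (w - 3.0)

# Run plain gradient descent for a fixed budget and report the final loss.
function final_loss(eta; steps = 50, w0 = 0.0)
    w = w0
    for _ in 1:steps
        w -= eta * grad(w)
    end
    return f(w)
end

# Try each candidate rate and keep the one with the lowest final loss.
candidate_etas = [1.0, 0.5, 0.1, 0.01, 0.001]
losses = [final_loss(eta) for eta in candidate_etas]
best = candidate_etas[argmin(losses)]
println("best eta from the grid: ", best)
```

On this toy problem eta = 1.0 oscillates forever while eta = 0.5 lands on the minimum in one step, so the grid picks 0.5; on a real model you would compare validation losses instead.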


#3

@oxinabox It’s inaccurate to state “in general this is not possible”; of course it is!

There is a fast.ai package (https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10), developed by Jeremy Howard, that does exactly this, so I believe it would be in the best interest of Flux users to have such functionality in the Flux library.

I'm hoping the Flux developers are listening!


#4

That is not what I would call estimating the learning rate before training the network.
That is changing the learning rate during training.
i.e., learning rate scheduling (and smarter variants thereof, maybe)

Which is a different thing.
I assumed you were asking about determining the optimal (initial) learning rate.

Anyway, I am pretty sure Flux doesn’t have that yet,
but you can implement it by hand without too much trouble.
Like in this example

Anyway, I agree Flux should have convenience helpers for this.
Particularly for more complicated variants.


#5

@oxinabox I humbly stand corrected!
Thank you for the example!


#6

Doesn’t ADAM automatically work out the learning rate adaptively? So you don’t need to specify the learning rate like in SGD.


#7

@xiaodai You are right about Adam modifying the learning rate adaptively.

Yet, the code at this URL: Function in Flux to estimate learning rate
uses Adam as the optimizer, but changes the learning rate anyway here:

# If we haven't seen improvement in 5 epochs, drop our learning rate:
if epoch_idx - last_improvement >= 5 && opt.eta > 1e-6
    opt.eta /= 10.0
    @warn(" -> Haven't improved in a while, dropping learning rate to $(opt.eta)!")

    # After dropping learning rate, give it a few epochs to improve
    last_improvement = epoch_idx
end

The reasoning behind doing this can be found at this URL: https://stackoverflow.com/questions/39517431/should-we-do-learningrate-decay-for-adam-optimizer
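The short version of that reasoning: Adam's per-parameter moment estimates only rescale the gradient, while the global step size eta still multiplies the whole update, so decaying eta still shrinks every step. A minimal sketch of one scalar Adam update, following the defaults from the Adam paper rather than Flux's actual implementation:

```julia
# One Adam update for a single scalar parameter w with gradient g,
# carrying the moment estimates m, v and the step counter t.
function adam_step(w, g, m, v, t; eta = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8)
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g^2      # second-moment (variance) estimate
    mhat = m / (1 - beta1^t)               # bias corrections
    vhat = v / (1 - beta2^t)
    w -= eta * mhat / (sqrt(vhat) + eps)   # eta scales the entire step
    return w, m, v
end

# On the first step the update is roughly eta * sign(g), whatever |g| is:
w, m, v = adam_step(0.0, 5.0, 0.0, 0.0, 1)
println(w)   # ≈ -0.001
```

Because the bias-corrected ratio mhat / sqrt(vhat) is close to ±1 early on, the step size is essentially eta itself, which is why dropping eta on a plateau still has a direct effect under Adam.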


#8

Doesn’t ADAM automatically work out the learning rate adaptively? So you don’t need to specify the learning rate like in SGD.

My understanding is that this is less important for Adam, but some papers have shown that learning-rate decay does still help even with Adam.