# Function in Flux to estimate learning rate

Is there a function in Flux to estimate the best learning rate for a good gradient descent before training a neural network?

No.
Or yes, if you are willing to accept the defaults of the optimizers.
(Which TBH I normally am)
In general this is not possible, sorting this out is the job of hyperparameter optimization.

@oxinabox It's inaccurate to state "in general this is not possible"; of course it is!

There is a fast.ai learning-rate finder (https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10), developed by Jeremy Howard, that does exactly this, so I believe it would be in the best interest of Flux users to have such functionality in the Flux library.

I'm hoping the Flux developers are listening!

That is not what I would call estimating the learning rate before training the network.
That is changing the learning rate during training,
i.e., learning rate scheduling (and smarter variants thereof, maybe).

Which is a different thing.
I assumed you were asking about determining the optimal (initial) learning rate.

Anyway, I am pretty sure Flux doesn't have that yet,
but you can implement it by hand without too much trouble.
Like in this example

Anyway, I agree Flux should have convenience helpers for this.
Particularly for more complicated variants.
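For anyone who wants to implement it by hand in the meantime, here is a rough sketch of a fast.ai-style learning-rate range test in Flux. The `lr_find` helper is hypothetical (not part of Flux's API), and it assumes the implicit-parameters style of Flux (`Flux.params` / `gradient`); treat it as a starting point, not a reference implementation:

```julia
using Flux
using Flux: gradient, params

# Hypothetical helper: sweep the learning rate exponentially from lr_min to
# lr_max, taking one gradient step per rate and recording the loss. You then
# inspect losses vs. lrs and pick a rate just before the loss blows up.
function lr_find(model, loss, data; lr_min = 1e-6, lr_max = 1.0, steps = 100)
    ps = params(model)
    lrs = exp.(range(log(lr_min), log(lr_max), length = steps))
    losses = Float64[]
    for (lr, (x, y)) in zip(lrs, Iterators.cycle(data))
        push!(losses, loss(x, y))          # loss before this step
        gs = gradient(() -> loss(x, y), ps)
        Flux.Optimise.update!(Descent(lr), ps, gs)
    end
    return lrs, losses
end

# Hypothetical usage on a toy regression model:
model = Dense(10, 1)
loss(x, y) = Flux.mse(model(x), y)
data = [(rand(Float32, 10, 32), rand(Float32, 1, 32))]
lrs, losses = lr_find(model, loss, data; steps = 50)
```

You would then typically pick the learning rate in the region where the recorded loss is falling fastest, a bit below the point where it starts to diverge.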


@oxinabox I humbly stand corrected!
Thank you for the example!

Doesn't ADAM automatically work out the learning rate adaptively? So you don't need to specify the learning rate like in SGD.


Yet the code linked above (Function in Flux to estimate learning rate)
uses Adam as the optimizer, but changes the learning rate anyway, here:

```julia
# If we haven't seen improvement in 5 epochs, drop our learning rate:
if epoch_idx - last_improvement >= 5 && opt.eta > 1e-6
    opt.eta /= 10.0
    @warn(" -> Haven't improved in a while, dropping learning rate to $(opt.eta)!")

    # After dropping learning rate, give it a few epochs to improve
    last_improvement = epoch_idx
end
```

The reasoning behind doing this can be found at this URL: https://stackoverflow.com/questions/39517431/should-we-do-learningrate-decay-for-adam-optimizer

> Doesn't ADAM automatically work out the learning rate adaptively? So you don't need to specify the learning rate like in SGD.

My understanding is that this is less important for ADAM, but some papers have shown that it does still help.
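For concreteness (and hedging that this is the older `Flux.Optimise` API, which may differ in current releases): ADAM adapts per-parameter step sizes, but it still takes a global step size `eta`, and it can be composed with a decay schedule such as `ExpDecay` rather than mutating `opt.eta` by hand:

```julia
using Flux
using Flux.Optimise: ADAM, ExpDecay, Optimiser

# ADAM adapts per-parameter step sizes, but it still has a global step
# size `eta` (default 0.001) that you can set explicitly:
opt = ADAM(3e-4)

# ...and, if decay helps your problem, you can chain a schedule in front.
# ExpDecay multiplies the step by `decay` every `decay_step` updates,
# with a floor of `clip`:
decayed = Optimiser(ExpDecay(1.0, 0.1, 1000, 1e-6), ADAM(3e-4))
```

Either `opt` or `decayed` can then be passed to `Flux.train!` in place of a plain `ADAM()`.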