How to update learning rate during Flux training in a better manner?


I am trying to update the learning rate during training. I did it with a custom training loop like the one below:

using Flux
using Flux: @epochs
using Flux.Data: DataLoader

M = 10
N = 15
O = 2

X = repeat(1.0:10.0, outer=(1, N)) #input
Y = repeat(1.0:2.0, outer=(1, N))  #output 

data =  DataLoader(X,Y, batchsize=5, shuffle=true)

dims = [M, O]
layers = [Dense(dims[i], dims[i+1]) for i in 1:length(dims)-1]
m = Chain(layers...)

L(x, y) = Flux.Losses.mse(m(x), y) #cost function

ps = Flux.params(m) #model parameters

opa = ADAM #optimizer
lr = 0.95 #initial learning rate
function my_custom_train!(loss, ps, data, opa, lr)
  local training_loss
  for (index, d) in enumerate(data)
    gs = gradient(ps) do
      training_loss = loss(d...)
      return training_loss
    end
    @show training_loss
    opt = opa(lr/index) #updating learning rate during training iteration
    Flux.update!(opt, ps, gs)
  end
end


Is there a better way to do this?

You shouldn’t create the optimizer anew in each iteration, since some optimizers (like ADAM) keep internal state. Create the optimizer once, outside the loop, then update its learning rate inside the loop: opt.eta *= decay_rate for exponential decay, opt.eta = eta0 / t for inverse-time decay (with eta0 the initial learning rate and t the iteration number), and so on for other schedules.
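
As a rough illustration of that advice, here is a minimal sketch of a custom loop that builds the optimizer once and only mutates its learning rate. It assumes a Flux version from around the time of this thread (implicit parameters via Flux.params, and a mutable ADAM with an eta field); newer Flux releases use a different optimizer API, so treat this as a sketch rather than a reference implementation. The names eta0 and batches, the toy data, and the value 0.1 are made up for the example.

```julia
using Flux

# Toy data with the same shapes as in the question (10 inputs, 2 outputs, 15 samples).
X = rand(Float32, 10, 15)
Y = rand(Float32, 2, 15)

# Hand-made mini-batches: three batches of five samples each.
batches = [(X[:, i:i+4], Y[:, i:i+4]) for i in 1:5:15]

m = Dense(10, 2)
loss(x, y) = Flux.Losses.mse(m(x), y)
ps = Flux.params(m)

eta0 = 0.1         # initial learning rate (illustrative value)
opt = ADAM(eta0)   # constructed once, so ADAM's internal state persists across steps

for (t, (x, y)) in enumerate(batches)
    opt.eta = eta0 / t                     # inverse-time decay: eta0, eta0/2, eta0/3, ...
    gs = gradient(() -> loss(x, y), ps)    # gradients with respect to the implicit parameters
    Flux.update!(opt, ps, gs)              # one optimizer step using the current opt.eta
end
```

For exponential decay, the assignment inside the loop would instead be something like opt.eta *= decay_rate, with decay_rate a constant you choose.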


@CarloLucibello: Thanks for your reply. Could you please explain a little more?
For example, I have a simple program like below:

M = 10
N = 15
O = 2

X = rand(M,N) #input
Y = rand(O,N) #output 

data =  DataLoader(X,Y, shuffle=true)

m = Chain(Dense(M, O)) #model

L(x, y) = Flux.Losses.mse(m(x), y) #cost function

ps = Flux.params(m) #model parameters

opt = ADAM(0.1) #optimizer

callback() = @show(L(X,Y)) #callback function

Flux.train!(L, ps, data, opt, cb = () -> callback()) #training

At every iteration, I need to divide the learning rate by the current iteration number:

new_learning_rate = current_learning_rate/ iteration

How can I implement this concept?

Thanks and Regards,

Think of opa in your first code example as a constructor for an ADAM object. If you invoke opa repeatedly, you construct a new ADAM object every time, which is slow and, as noted above, throws away the optimizer's internal state. Instead, in your training loop do opt.eta = new_learning_rate (eta is the learning-rate field of Flux's ADAM; check the Flux documentation or source for other optimizers). This does not reconstruct the entire ADAM object; it only changes a single field value inside the existing one.

Note that this assumes ADAM as defined in Flux is a mutable struct. If it were not, you could not change the field like this. No worries though: I checked the Flux source code and it is.
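
If you would rather check this from the REPL than read the source, a quick sketch like the following should work. It relies only on Base Julia reflection (ismutable, fieldnames) and again assumes the older Flux API in which ADAM is a mutable struct with an eta field.

```julia
using Flux

opt = ADAM(0.1)

# Lists the field names of the optimizer type; the learning-rate field should be among them.
@show fieldnames(typeof(opt))

# true when the object is an instance of a mutable struct, so its fields can be reassigned.
@show ismutable(opt)

opt.eta = 0.05   # only valid because ADAM is mutable in this Flux version
```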

For your second question, just calculate the new learning rate in whichever way you want then update the learning rate field in your opt object like above.
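
One possible way to do this with Flux.train! is to update the field from the callback, which train! invokes after each batch. The sketch below makes the same assumptions about the Flux version as the thread's own code (implicit parameters, the cb keyword, a mutable ADAM with an eta field). The names eta0, iteration and decay_callback are made up for the example, and the schedule is the eta0 / iteration form used in the first code example.

```julia
using Flux

X = rand(Float32, 10, 15)
Y = rand(Float32, 2, 15)

# One sample per batch, so train! performs 15 updates in a single pass.
data = [(X[:, i:i], Y[:, i:i]) for i in 1:15]

m = Dense(10, 2)
L(x, y) = Flux.Losses.mse(m(x), y)
ps = Flux.params(m)

eta0 = 0.1           # initial learning rate (illustrative value)
opt = ADAM(eta0)     # created once and reused for every update
iteration = Ref(1)   # index of the update that is about to run

# Called by train! after each update: advance the counter, then set the
# learning rate that the next update will use.
function decay_callback()
    iteration[] += 1
    opt.eta = eta0 / iteration[]
end

Flux.train!(L, ps, data, opt, cb = decay_callback)
```

The first update uses eta0, the second eta0 / 2, and so on. If you instead want each step to divide the current rate by the iteration number, replace the assignment with opt.eta /= iteration[].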


@fengkehh @CarloLucibello: Thanks for your replies. Now everything is clear :)

This package might be useful in 2022: