How to update learning rate during Flux training in a better manner?

Hi,

I am trying to update the learning rate during training. I did it with a custom training loop like below:

using Flux
using Flux: @epochs
using Flux.Data: DataLoader

M = 10
N = 15
O = 2

X = repeat(1.0:10.0, outer=(1, N)) #input
Y = repeat(1.0:2.0, outer=(1, N))  #output 

data = DataLoader(X, Y, batchsize=5, shuffle=true)

dims = [M, O]
layers = [Dense(dims[i], dims[i+1]) for i in 1:length(dims)-1]
m = Chain(layers...)

L(x, y) = Flux.Losses.mse(m(x), y) #cost function


ps = Flux.params(m) #model parameters



opa = ADAM #optimizer constructor
lr = 0.95  #initial learning rate

function my_custom_train!(loss, ps, data, opa, lr)
  local training_loss
  for (index, d) in enumerate(data)
    gs = gradient(ps) do
      training_loss = loss(d...)
      return training_loss
    end
    @show training_loss
    opt = opa(lr / index) #build a new optimizer with the decayed learning rate on every iteration
    Flux.update!(opt, ps, gs)
  end
end

Is there a better way to do the same procedure?

You shouldn’t create the optimizer anew in each iteration, since some optimizers (like ADAM) keep internal state. Create the optimizer outside the loop, then inside the loop do opt.eta *= decay_rate for exponential decay, opt.eta = initial_eta / t for a 1/t decay, and so on.
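For example, a minimal sketch of that pattern applied to the loop above (eta is the learning-rate field of Flux's ADAM; the decay_rate value here is only illustrative):

opt = ADAM(0.95)  #create the optimizer once, outside the loop
decay_rate = 0.9  #illustrative exponential decay factor

function my_custom_train!(loss, ps, data, opt)
  local training_loss
  for d in data
    gs = gradient(ps) do
      training_loss = loss(d...)
      return training_loss
    end
    @show training_loss
    Flux.update!(opt, ps, gs)
    opt.eta *= decay_rate #mutate the learning rate in place; ADAM's internal state is preserved
  end
end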

2 Likes

@CarloLucibello: Thanks for your reply. Could you please explain a little more?
For example, I have a simple program like below:

using Flux
using Flux.Data: DataLoader

M = 10
N = 15
O = 2

X = rand(M, N) #input
Y = rand(O, N) #output

data = DataLoader(X, Y, shuffle=true)

m = Chain(Dense(M, O)) #model

L(x, y) = Flux.Losses.mse(m(x), y) #cost function

ps = Flux.params(m) #model parameters

opt = ADAM(0.1) #optimizer

callback() = @show(L(X, Y)) #callback function

Flux.train!(L, ps, data, opt, cb = callback) #training

I have to update the learning rate at every iteration by dividing it by the current iteration number:

new_learning_rate = current_learning_rate/ iteration

How can I implement this concept?

Thanks and Regards,
Manu

Think of opa in your first code example as a constructor for an ADAM object/data structure. If you invoke opa repeatedly you are constructing a new ADAM object every time, which is slow and throws away the optimizer's internal state. Instead, in your training loop do opt.eta = new_learning_rate (eta is the field Flux uses for the learning rate), which does not reconstruct the entire ADAM object but only changes a single field of the existing one.

Note that this assumes ADAM as defined in Flux is a mutable struct; if it were not, you could not mutate it like this. No worries though, because I checked the Flux source code and it is.

For your second question, just calculate the new learning rate however you want, then update the learning rate field of your opt object as above.
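For example, continuing your second snippet, here is a minimal sketch that replaces Flux.train! with a short loop so the iteration index is available (lr0 / iteration follows the lr / index scheme from your first example):

opt = ADAM(0.1) #create once; its internal state persists across iterations
lr0 = opt.eta   #remember the initial learning rate

for (iteration, (x, y)) in enumerate(data)
  opt.eta = lr0 / iteration #set this iteration's learning rate
  gs = gradient(ps) do
    L(x, y)
  end
  Flux.update!(opt, ps, gs)
  callback()
end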

1 Like

@fengkehh @CarloLucibello Thanks for your replies. Now everything is clear :slight_smile:

This package might be useful in 2022:

2 Likes

With Flux v0.14.7, I am trying the following and getting an error:

trainset = ...
model = ...
lossFunction = ...
modelOptimiser = Flux.Optimiser(Flux.ExpDecay(η, decay, decay_step, clip, start), Flux.Adam())
optimiserState = Flux.Train.setup(modelOptimiser, model)

Flux.train!(model, trainset, optimiserState) do m, X, y
        lossFunction(m(X), y)
end

The error is raised by the optimiserState line, just before Flux.train!:

Flux.setup does not know how to translate this old-style implicit rule to a new-style Optimisers.jl explicit rule

I would appreciate it if someone could provide an example of implementing decays with this setup. Thanks.

1 Like

The error comes from trying to mix old-style optimization rules (Optimiser and ExpDecay) with the new-style Optimisers.jl API that Flux now favours. Optimisation Rules · Flux has a warning about this.

If you’re fine with writing a couple more lines to turn that train! call into a custom training loop, then see my runnable example of how to schedule parameters using the new optimization interface + ParameterSchedulers.jl in Learning rate scheduler with the new interface of Flux - #5 by ToucheSir.
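For completeness, here is a rough sketch of one way to do it with the new interface (not the exact code from that post): create a plain Adam state with Flux.setup and change its learning rate between epochs with Flux.adjust!. The eta0, decay and epoch-count values are illustrative, and model, trainset and lossFunction are the placeholders from the snippet above.

optimiserState = Flux.setup(Flux.Adam(1e-3), model)

eta0, decay = 1e-3, 0.9 #illustrative schedule parameters
for epoch in 1:100
    Flux.adjust!(optimiserState, eta0 * decay^(epoch - 1)) #set this epoch's learning rate
    Flux.train!(model, trainset, optimiserState) do m, X, y
        lossFunction(m(X), y)
    end
end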

2 Likes