How to update learning rate during Flux training in a better manner?

Manu_Francis · January 10, 2021, 4:35am

Hi,

I am trying to update the learning rate during training. I did it with a custom training loop like the one below:

using Flux
using Flux: @epochs
using Flux.Data: DataLoader

M = 10
N = 15
O = 2

X = repeat(1.0:10.0, outer=(1, N)) #input
Y = repeat(1.0:2.0, outer=(1, N))  #output 

data =  DataLoader(X,Y, batchsize=5, shuffle=true)

dims = [M, O]
layers = [Dense(dims[i], dims[i+1]) for i in 1:length(dims)-1]
m = Chain(layers...)

L(x, y) = Flux.Losses.mse(m(x), y) #cost function


ps = Flux.params(m) #model parameters



opa = ADAM #optimizer
lr = 0.95 #initial learning rate
function my_custom_train!(loss, ps, data, opa, lr)
  local training_loss
  for (index, d) in enumerate(data)
    gs = gradient(ps) do
      training_loss = loss(d...)
      return training_loss
    end
    @show training_loss
    opt = opa(lr/index) #updating learning rate during training iteration
    Flux.update!(opt, ps, gs)
  end
end

Is there a better way to do this?

CarloLucibello · January 10, 2021, 7:50am

You shouldn’t create the optimizer anew in each iteration, since some optimizers (like Adam) keep an internal state. Create the optimizer once, outside the loop, then change its learning-rate field inside the loop: opt.eta *= decay_rate for exponential decay, opt.eta = eta0 / t for inverse-time decay (with eta0 the initial learning rate and t the iteration count), and so on for other decays.

Manu_Francis · January 10, 2021, 8:11am

@CarloLucibello: Thanks for your reply. Could you please explain a little more?
For example, I have a simple program like below:

M = 10
N = 15
O = 2

X = rand(M,N) #input
Y = rand(O,N) #output 

data =  DataLoader(X,Y, shuffle=true)

m = Chain(Dense(M, O)) #model

L(x, y) = Flux.Losses.mse(m(x), y) #cost function


ps = Flux.params(m) #model parameters

opt = ADAM(0.1) #optimizer

callback() = @show(L(X,Y)) #callback function

Flux.train!(L, ps, data, opt, cb = () -> callback()) #training

I have to update the learning rate at every iteration, dividing it by the current iteration number:

new_learning_rate = current_learning_rate / iteration

How can I implement this concept?

Thanks and Regards,
Manu

fengkehh · January 10, 2021, 1:54pm

Think of opa in your first code example as a constructor for an ADAM object/data structure. If you invoke opa repeatedly, you construct a new ADAM object every time, which is slow. Instead, in your training loop do opt.learning_rate = new_learning_rate (check the Flux documentation or source for the actual field name; for ADAM it is eta, as in the reply above). This does not reconstruct the entire ADAM object, it only changes a single field value inside the existing one.

Note that this assumes ADAM as defined in Flux is a mutable struct. If it were not, you could not do this. No worries though: I checked the Flux source code and it is.

For your second question, just calculate the new learning rate in whichever way you want then update the learning rate field in your opt object like above.
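
The two replies above can be put together roughly as follows. This is only a sketch, assuming a Flux release that still provides the legacy implicit-parameter API used in this thread (`Flux.params`, a mutable `ADAM` with an `eta` field); newer releases favour the Optimisers.jl-based API instead. The hand-sliced `batches` stand in for the `DataLoader`, and `eta0` is an illustrative variable name, not a field of the optimizer.

```julia
using Flux  # assumption: a release with the legacy implicit-parameter API

M, N, O = 10, 15, 2
X = rand(Float32, M, N)  # input
Y = rand(Float32, O, N)  # output

# Three mini-batches of 5 columns each (stand-in for the DataLoader)
batches = [(X[:, i:i+4], Y[:, i:i+4]) for i in 1:5:N]

m = Chain(Dense(M, O))
loss(x, y) = Flux.Losses.mse(m(x), y)
ps = Flux.params(m)

eta0 = 0.1        # initial learning rate (illustrative name)
opt = ADAM(eta0)  # built once, so Adam's internal state persists; named `Adam` from Flux v0.13 on

for (t, (x, y)) in enumerate(batches)
    opt.eta = eta0 / t  # mutate the existing optimizer instead of rebuilding it
    gs = gradient(() -> loss(x, y), ps)
    Flux.update!(opt, ps, gs)
end
```

Read literally, new_learning_rate = current_learning_rate / iteration would compound from one step to the next (`opt.eta /= t` gives `eta0 / factorial(t)`), which shrinks much faster than the `lr/index` schedule of the first example. The sketch follows the `lr/index` form.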

Manu_Francis · January 10, 2021, 3:38pm

@fengkehh @CarloLucibello: Thanks for your replies. Everything is clear now.

terasakisatoshi · June 3, 2022, 10:56pm

This package might be useful in 2022:

cirobr · December 10, 2023, 12:59pm

With Flux v0.14.7, I am trying the following, but it fails with an error:

trainset = ...
model = ...
lossFunction = ...
modelOptimiser = Flux.Optimiser(Flux.ExpDecay(η, decay, decay_step, clip, start), Flux.Adam())
optimiserState = Flux.Train.setup(modelOptimiser, model)

Flux.train!(model, trainset, optimiserState) do m, X, y
    lossFunction(m(X), y)
end

The error is raised by the optimiserState line, just before Flux.train!:

Flux.setup does not know how to translate this old-style implicit rule to a new-style Optimisers.jl explicit rule

I would appreciate it if someone could provide an example of implementing learning-rate decay. Thanks.

ToucheSir · December 23, 2023, 5:37am

The error comes from trying to mix old-style optimization rules (Optimiser and ExpDecay) with the new-style Optimisers.jl API that Flux favours now. The "Optimisation Rules" page of the Flux documentation has a warning about this.

If you’re fine with writing a couple more lines to turn that train! call into a custom training loop, then see my runnable example of how to schedule parameters using the new optimization interface + ParameterSchedulers.jl in the thread "Learning rate scheduler with the new interface of Flux" (post #5 by ToucheSir).
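
For readers without access to the linked post, here is a rough sketch of the same idea. It is not the linked example: it swaps ParameterSchedulers.jl for a hand-written exponential decay and assumes Flux v0.14 or later, where `Flux.setup` and `Flux.adjust!` are available. The names `eta0`, `decay` and `batches` are illustrative.

```julia
using Flux  # assumption: Flux ≥ 0.14 (explicit, Optimisers.jl-based API)

M, N, O = 10, 15, 2
X = rand(Float32, M, N)
Y = rand(Float32, O, N)

# Three mini-batches of 5 columns each (stand-in for a DataLoader)
batches = [(X[:, i:i+4], Y[:, i:i+4]) for i in 1:5:N]

model = Chain(Dense(M => O))
eta0 = 0.1    # initial learning rate (illustrative value)
decay = 0.9   # per-step decay factor (illustrative value)

# The optimiser state is created once; only its learning rate changes later
opt_state = Flux.setup(Adam(eta0), model)

for (t, (x, y)) in enumerate(batches)
    # Set the rate for this step: eta0 * decay^(t - 1)
    Flux.adjust!(opt_state, eta0 * decay^(t - 1))
    grads = Flux.gradient(mdl -> Flux.Losses.mse(mdl(x), y), model)
    Flux.update!(opt_state, model, grads[1])
end
```

Because the rate is recomputed from `t` on every step, any other schedule (for instance `eta0 / t`) can be dropped in by changing the single `Flux.adjust!` line.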
