I am trying to update the learning rate during training. I did it with a custom training loop like below:
using Flux
using Flux: @epochs
using Flux.Data: DataLoader

M = 10   # input dimension
N = 15   # number of samples
O = 2    # output dimension

X = repeat(1.0:10.0, outer=(1, N))   # input, M×N
Y = repeat(1.0:2.0, outer=(1, N))    # output, O×N
data = DataLoader((X, Y), batchsize=5, shuffle=true)

dims = [M, O]
layers = [Dense(dims[i], dims[i+1]) for i in 1:length(dims)-1]
m = Chain(layers...)

L(x, y) = Flux.Losses.mse(m(x), y)   # cost function
ps = Flux.params(m)                  # model parameters
opa = ADAM                           # optimizer constructor
lr = 0.95                            # initial learning rate

function my_custom_train!(loss, ps, data, opa, lr)
    local training_loss
    for (index, d) in enumerate(data)
        gs = gradient(ps) do
            training_loss = loss(d...)
            return training_loss
        end
        @show training_loss
        opt = opa(lr / index)        # updating the learning rate at every iteration
        Flux.update!(opt, ps, gs)
    end
end
Is there a better way to do the same procedure?
You shouldn't create the optimizer anew in each iteration, since some optimizers (like ADAM) keep internal state, and reconstructing them throws that state away. Create the optimizer once outside the loop, then mutate its learning rate inside the loop: opt.eta *= decay_rate for exponential decay, opt.eta = initial_lr / t for a 1/t decay, and so on for other schedules.
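A minimal sketch of that pattern, assuming the old-style Flux optimisers (Flux.Optimise), where ADAM is a mutable struct with its learning rate in the eta field; decay_rate is just an illustrative value, and lr, L, ps, data are the variables from your code above:

opt = ADAM(lr)          # construct the optimizer once, outside the loop
decay_rate = 0.99       # illustrative exponential-decay factor

function my_custom_train!(loss, ps, data, opt)
    for (index, d) in enumerate(data)
        gs = gradient(ps) do
            loss(d...)
        end
        Flux.update!(opt, ps, gs)
        opt.eta *= decay_rate    # exponential decay: mutate the field, keep ADAM's state
        # or, for a 1/t schedule: opt.eta = lr / index
    end
end

my_custom_train!(L, ps, data, opt)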
Think of opa in your first code example as a constructor for an ADAM object/data structure. If you invoke opa repeatedly, you construct a new ADAM object every time, which is slow and discards the optimizer's accumulated state. Instead, in your training loop do opt.eta = new_learning_rate (eta is the learning-rate field of Flux's old-style optimisers; check the Flux documentation or source if your version differs), which does not reconstruct the entire ADAM object but only changes a single field of the existing one.
Note that this assumes ADAM, as defined in Flux, is a mutable struct; if it were not, you could not mutate it like this. No worries though: I checked the Flux source code and it is.
For your second question: just calculate the new learning rate however you want, then update the learning-rate field of your opt object as above.
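If you want to verify the field name and mutability yourself before relying on it, a quick REPL check like this works (output omitted, since the exact fields can vary between Flux versions):

julia> opt = ADAM(0.95);

julia> ismutable(opt)            # true for the old-style ADAM, so in-place updates are allowed

julia> fieldnames(typeof(opt))   # the learning rate is the field called eta

julia> opt.eta = 0.01;           # e.g. a step-decayed value computed however you like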
The error comes from trying to mix old-style optimization rules (Optimiser and ExpDecay) with the new-style Optimisers.jl API that Flux now favours. The Optimisation Rules · Flux docs page has a warning about this.
If you're fine with writing a couple more lines to turn that train! call into a custom training loop, see my runnable example of scheduling parameters with the new optimization interface plus ParameterSchedulers.jl in Learning rate scheduler with the new interface of Flux - #5 by ToucheSir.
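For reference, and separate from the linked post, here is a minimal sketch of how learning-rate adjustment looks with the new explicit interface (Flux.setup / Flux.update! / Flux.adjust!); the 1/t rule is just a placeholder for whatever ParameterSchedulers.jl schedule you would plug in, and X, Y are the arrays from the original question:

using Flux

m = Chain(Dense(10 => 2))                    # same model shape as in the question
data = Flux.DataLoader((X, Y), batchsize=5, shuffle=true)
lr = 0.95

opt_state = Flux.setup(Adam(lr), m)          # optimizer state built once for the model

for (step, (x, y)) in enumerate(data)
    grads = Flux.gradient(m) do model
        Flux.Losses.mse(model(x), y)
    end
    Flux.update!(opt_state, m, grads[1])
    Flux.adjust!(opt_state, lr / step)       # set the new rate without rebuilding the state
end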