I am running a deep reinforcement learning (DRL) algorithm with Adam, and I want the learning rate to decay over time. As a minimal example, consider
using Flux, ParameterSchedulers
model = Chain(Dense(5, 10, gelu), Dense(10, 10, gelu), Dense(10, 1, softplus))
ps = Flux.params(model)
# gs is the gradient of the loss with respect to ps (see the sketch below)
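For reference, gs is obtained roughly like this; the loss, x, and y below are hypothetical stand-ins for my actual DRL loss and data:
x, y = rand(Float32, 5, 32), rand(Float32, 1, 32)  # dummy batch
loss(x, y) = Flux.Losses.mse(model(x), y)           # placeholder loss
gs = Flux.gradient(() -> loss(x, y), ps)            # Zygote implicit gradients (a Grads object)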
At the update step, I want the learning rate to decay, for example
sched = Sequence(Exp(λ = 1f-7, γ = 1000^(1/100)) => 100, Exp(λ = 1f-4, γ = 0.99) => 100)
optimiser = Scheduler(sched, ADAM())
Flux.update!(optimiser, ps, gs)
Unfortunately, this gives the error
ERROR: Optimisers.jl cannot be used with Zygote.jl's implicit gradients, `Params` & `Grads`
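For what it's worth, the schedule itself appears to give the decay I intend when evaluated on its own (assuming the callable-schedule interface from the ParameterSchedulers.jl docs); it is only the combination with Flux.update! that fails:
sched.(1:5)      # first few learning rates of the first Exp phase, starting from 1f-7
sched.(101:105)  # after 100 steps, the second Exp phase starting at 1f-4 takes over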
I can get Flux.update! to work only if I drop the scheduler and use a plain optimiser,
optimiser = Flux.Optimise.ADAM()
but then I am unable to use ParameterSchedulers.jl to vary the learning rate.
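In other words, the only loop that runs for me looks like the sketch below (reusing the hypothetical loss and data from above), with the learning rate stuck at a constant value:
optimiser = Flux.Optimise.ADAM(1f-4)
for step in 1:200
    gs = Flux.gradient(() -> loss(x, y), ps)
    Flux.update!(optimiser, ps, gs)  # runs, but the learning rate never decays
end
How can I make the ParameterSchedulers.jl schedule drive the learning rate of ADAM inside this kind of update loop?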