I haven’t been actively using Flux.jl for a while, and I’ve found that the interface of Flux has changed a bit.
Here are the Flux docs on “Scheduling Optimisers”.
It seems that using ParameterSchedulers.jl is now recommended.
However, I cannot find a way to set a bound on the gradient norm in ParameterSchedulers.jl, while previous versions of Flux provided such functionality out of the box.
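For reference, this is roughly what I did before (a sketch from memory of the old implicit-params API; exact names may differ between Flux versions, and the threshold values are just placeholders):

using Flux
# Compose a gradient-clipping rule with the optimiser
opt = Flux.Optimise.Optimiser(ClipValue(1f-3), Adam(0.01))   # or ClipNorm(1f0) to bound the gradient norm
# Flux.train!(loss, Flux.params(model), data, opt)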
I’m not quite sure I understand the question. Are you asking for a way to schedule the δ parameter of Optimisers.ClipGrad (which is equivalent to the legacy Flux.Optimise.ClipValue)? If so, use adjust! as shown here.
As mentioned at the top of that Optimisation Rules page, you should be looking at the Optimisers.jl docs rather than that page if you’re working with explicit params. If you consult the adjust! docs I linked above (https://fluxml.ai/Flux.jl/stable/training/reference/#Optimisers.adjust!), that should be enough to show how to do this.
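To make that concrete, here is a minimal sketch of adjusting the clipping threshold with the explicit-params API (the model, threshold, and learning-rate values below are placeholders I made up):

using Flux, Optimisers

model = Dense(2 => 2)   # any model
# Chain a clipping rule with the optimiser, then set up the state as usual
rule  = Optimisers.OptimiserChain(Optimisers.ClipGrad(1.0), Optimisers.Adam(1e-3))
opt_state = Flux.setup(rule, model)

# Later (e.g. inside the training loop), adjust whichever hyperparameter you like:
Optimisers.adjust!(opt_state; delta = 0.5)   # changes ClipGrad's δ
Optimisers.adjust!(opt_state, 1e-4)          # the positional form changes the learning rate eta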
Since this topic has come up again in a different thread, I took the liberty of writing out what this would look like. The following combines the Flux quickstart with the example from the ParameterSchedulers.jl docs.
using Flux, Optimisers, ParameterSchedulers

# Toy data from the Flux quickstart
noisy = rand(Float32, 2, 1000)                                      # 2×1000 Matrix{Float32}
truth = [xor(col[1] > 0.5, col[2] > 0.5) for col in eachcol(noisy)] # 1000-element Vector{Bool}

model = Chain(
    Dense(2 => 3, tanh),   # activation function inside the layer
    BatchNorm(3),
    Dense(3 => 2),
    softmax)

target = Flux.onehotbatch(truth, [true, false])   # 2×1000 OneHotMatrix
loader = Flux.DataLoader((noisy, target) |> gpu, batchsize=64, shuffle=true);  # move `model` to the GPU as well if you actually use one

const lr = 0.01
optim = Flux.setup(Flux.Adam(lr), model)   # set up the optimiser state as usual
sched = Stateful(Step(lr, 0.9, 100))       # set up a schedule of your choice (here: multiply the rate by 0.9 every 100 steps)

for epoch in 1:1_000
    for (x, y) in loader
        loss, grads = Flux.withgradient(model) do m
            y_hat = m(x)
            Flux.crossentropy(y_hat, y)
        end
        Flux.update!(optim, model, grads[1])
        # NEW
        nextlr = ParameterSchedulers.next!(sched)  # advance the schedule, once per batch
        Optimisers.adjust!(optim, nextlr)          # update the optimiser state; by default this changes the learning rate `eta`
    end
end
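To tie this back to the original question about bounding gradients: the same adjust! mechanism can be used to schedule the clipping threshold, since ClipGrad is just another rule in the chain. A minimal sketch building on the code above (the δ schedule values here are purely illustrative):

# Clip each gradient entry to [-δ, δ], then apply Adam, all in one rule chain
rule  = Optimisers.OptimiserChain(Optimisers.ClipGrad(1.0), Optimisers.Adam(lr))
optim = Flux.setup(rule, model)

sched_lr    = Stateful(Step(lr, 0.9, 100))   # schedule for the learning rate
sched_delta = Stateful(Step(1.0, 0.5, 500))  # illustrative schedule for the clipping threshold δ

for (x, y) in loader
    loss, grads = Flux.withgradient(m -> Flux.crossentropy(m(x), y), model)
    Flux.update!(optim, model, grads[1])
    Optimisers.adjust!(optim, ParameterSchedulers.next!(sched_lr))             # changes Adam's eta
    Optimisers.adjust!(optim; delta = ParameterSchedulers.next!(sched_delta))  # changes ClipGrad's delta
end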