Learning rate scheduler with the new interface of Flux

Hi,

I haven’t been actively using Flux.jl for a while, and I found that the interface of Flux has changed a bit.

Here are the Flux docs on “Scheduling Optimisers”.
It seems that using ParameterSchedulers.jl is now recommended.
However, I cannot find a way to set a lower bound on the gradient norm in ParameterSchedulers.jl, while the previous version of Flux provided such functionality by default.

What would be the best practice for this?

I’m not quite sure I understand the question. Are you asking for a way to schedule the δ parameter of Optimisers.ClipGrad (which is equivalent to the legacy Flux.Optimise.ClipValue)? If so, use adjust! as shown here.
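For reference, a minimal sketch of what that could look like (the toy model and the clipping values here are placeholders I made up, not something from the question):

using Flux

model = Dense(2 => 2)                              # toy model, for illustration only
rule  = OptimiserChain(ClipGrad(1.0), Adam(1e-3))  # clip each gradient entry to ±1.0, then apply Adam
state = Flux.setup(rule, model)

# ... train for a while ...

# later, tighten the clipping threshold; `delta` is the field name of ClipGrad
Flux.adjust!(state; delta = 0.5)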

@ToucheSir Sorry for the confusion.

I have switched from the implicit to the explicit interface (as described in Optimisation Rules · Flux).

So, the optimiser is set up using setup.
For example,

opt_state = Flux.setup(optimiser, network)

Here, optimiser is Adam(lr) for the given learning rate lr.

Now, I’d like to use a learning rate scheduler for the optimiser.
For example, I might be able to write code as described here:

optimiser = Flux.Optimiser(ExpDecay(lr, 0.90, 1000, 1e-5), Adam())

But setup does not seem to be compatible with composing optimisers in this way. The error message is:

ERROR: Flux.setup does not know how to translate this old-style implicit rule to a new-style Optimisers.jl explicit rule

I’d like to replace optimiser = Flux.Optimiser(ExpDecay(lr, 0.90, 1000, 1e-5), Adam()) with something compatible with the explicit interface.

As mentioned at the top of that Optimisation Rules page, you should be looking at the Optimisers.jl docs rather than that page if you’re working with explicit params. The adjust! docs I linked above (https://fluxml.ai/Flux.jl/stable/training/reference/#Optimisers.adjust!) should be enough to show how to do this.
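To make that concrete, a minimal sketch (the placeholder model and the learning-rate values are mine, not from the thread):

using Flux

model     = Dense(2 => 2)                  # placeholder model
opt_state = Flux.setup(Adam(0.01), model)  # explicit-style optimiser state

# change the learning rate of the existing state in place
Flux.adjust!(opt_state, 0.001)             # positional form sets `eta`
Flux.adjust!(opt_state; eta = 0.001)       # keyword form: adjust any hyperparameter by field name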

Since this topic has come up again in a different thread, I took the liberty of writing out what this would look like. The following is a combination of the Flux quick start and the example from ParameterSchedulers.jl.

using Flux, Optimisers, ParameterSchedulers
noisy = rand(Float32, 2, 1000)                                    # 2×1000 Matrix{Float32}
truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)]   # 1000-element Vector{Bool}

model = Chain(
    Dense(2 => 3, tanh),   # activation function inside layer
    BatchNorm(3),
    Dense(3 => 2),
    softmax) |> gpu        # move model to GPU, if available (matches the `|> gpu` on the data below)

target = Flux.onehotbatch(truth, [true, false])                   # 2×1000 OneHotMatrix
loader = Flux.DataLoader((noisy, target) |> gpu, batchsize=64, shuffle=true);

const lr = 0.01
optim = Flux.setup(Flux.Adam(lr), model)  # set up the optimiser state as usual
sched = Stateful(Step(lr, 0.9, 100))      # schedule of your choice; Stateful keeps its own step counter

for epoch in 1:1_000
    for (x, y) in loader
        loss, grads = Flux.withgradient(model) do m
            y_hat = m(x)
            Flux.crossentropy(y_hat, y)
        end
        Flux.update!(optim, model, grads[1])

        # NEW
        nextlr = ParameterSchedulers.next!(sched) # advance schedule
        Optimisers.adjust!(optim, nextlr) # update optimizer state, by default this changes the learning rate `eta`
    end
end
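If you also want the lower bound that the last argument of ExpDecay(lr, 0.90, 1000, 1e-5) used to provide (a minimum learning rate), one simple option, just my suggestion rather than something ParameterSchedulers.jl does for you, is to clamp the scheduled value before passing it to adjust!:

nextlr = max(ParameterSchedulers.next!(sched), 1e-5)  # don't let the learning rate drop below 1e-5
Optimisers.adjust!(optim, nextlr)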