Manually updating the parameters of a Neural Network in Flux

I am currently working on a Reinforcement Learning problem, which requires me to form and store matrices of gradients and then update the parameters of the Neural Network manually.
Basically, what I have done is take the gradient of a loss function with respect to the parameters of a Neural Network and then reshape this gradient into a column vector (which I need for my purposes). I repeat this a couple of times and end up with a vector w that I want to use in an update, i.e. my update rule is

θ = θ + α*w

where θ denotes the parameters of the network. Due to the way Flux works, the parameters are stored in some kind of special structure, so I do not necessarily have access to the parameters as a single vector.
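For reference: recent versions of Flux provide `Flux.destructure`, which gives exactly this flat-vector view of the parameters, together with a function that rebuilds the model from such a vector. A minimal sketch, with a made-up toy model:

```julia
using Flux

# A made-up toy model for illustration
model = Chain(Dense(4, 3, relu), Dense(3, 2))

# θ is every parameter flattened into one vector; `re` rebuilds a model
# of the same architecture from any vector of that length
θ, re = Flux.destructure(model)

α = 0.1
w = randn(length(θ))        # stand-in for the accumulated gradient vector
model = re(θ .+ α .* w)     # manual update θ = θ + α*w
```

This sidesteps the packing/unpacking problem entirely when you are free to rebuild the model after each update.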

The way I obtained the gradients as a vector is:

function gradcollector(gs)
    gradients = []
    # gs.grads is an IdDict mapping each parameter array to its gradient;
    # iterate its (param, grad) pairs directly instead of collecting it
    # once per iteration
    for (_, g) in gs.grads
        push!(gradients, g)
    end
    return collect(Iterators.flatten(gradients))
end
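The ordering problem can be seen without Flux at all: `gs.grads` is an `IdDict`, and iteration order over an `IdDict` is arbitrary (it depends on object identity, not on the order in which the parameters were registered). A dependency-free sketch, with plain arrays standing in for the Flux parameters:

```julia
# Plain arrays standing in for a weight matrix and a bias vector
W, b = rand(2, 3), rand(2)

# Zygote stores gradients in an IdDict keyed by the parameter arrays;
# iteration order over an IdDict is arbitrary
grads = IdDict{Any,Any}(W => ones(2, 3), b => zeros(2))

# Flatten every gradient into one vector, in whatever order the IdDict yields
flat = reduce(vcat, [vec(g) for (_, g) in grads])
length(flat)   # 8 entries, but their order depends on the IdDict
```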

where gs is obtained by doing something like

ps = Flux.params(model)

gs = Flux.gradient(() -> loss(input), ps)

I tried to manually reshape w into the form the parameters have, by iterating through the shapes of the gradients and reshaping each respective slice into a matrix, but I cannot even do that, since the order in which the parameter matrices are stored differs from the order in which I obtain the gradients.
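One way around the ordering mismatch is to never iterate `gs.grads` directly: a `Zygote.Grads` object supports indexing by parameter (`gs[p]`), so iterating over `ps` itself yields the gradients in the order of the parameters. A dependency-free sketch of both directions, with plain arrays and an `IdDict` standing in for `Flux.params(model)` and the `Grads` object:

```julia
# Plain arrays standing in for Flux parameters; in real code `ps` would be
# Flux.params(model) and grads[p] would be gs[p]
W, b = rand(2, 3), rand(2)
ps = [W, b]                          # parameter order we control
grads = IdDict{Any,Any}(W => ones(2, 3), b => fill(2.0, 2))

# Flatten in parameter order, not IdDict order
w = reduce(vcat, [vec(grads[p]) for p in ps])

# Apply the manual update θ = θ + α*w slice by slice, in the same order
function apply_update!(ps, w, α)
    s = 1
    for p in ps
        l = length(p)
        p .+= α .* reshape(view(w, s:s+l-1), size(p))
        s += l
    end
    return ps
end

W0, b0 = copy(W), copy(b)
apply_update!(ps, w, 0.5)
```

Since both the flattening and the update walk `ps` in the same order, each slice of w lands back on the parameter it came from.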

Any help on how to fix this problem would be greatly appreciated.

I am also interested in this, and I believe many others would be as well. I list some packages that target this particular problem below for reference.

I might attempt an implementation of packing/unpacking the Zygote.Grads type, which would perhaps be well received in the Flux repository.

If you made any strides forward on this problem I would be interested in learning from your experience. I will post here once I have a working solution.

Here’s my attempt at packing/unpacking the Zygote.Grads type:

using Zygote, Test

gradlength(grads) = sum(length(g[2]) for g in grads.grads)  # g[2] is the gradient array

function flatten!(gradvec, grads::Zygote.Grads)
    @assert length(gradvec) == gradlength(grads)
    s = 1
    # iteration order over the underlying IdDict is arbitrary, but it is
    # consistent between flatten! and unflatten! as long as the Grads
    # object is not modified in between
    for g in grads.grads
        l = length(g[2])
        gradvec[s:s+l-1] .= vec(g[2])
        s += l
    end
    gradvec
end

flatten(grads::Zygote.Grads) = flatten!(zeros(gradlength(grads)), grads)

function unflatten!(gradvec, grads::Zygote.Grads)
    s = 1
    for g in grads.grads
        l = length(g[2])
        g[2] .= reshape(gradvec[s:s+l-1], size(g[2]))
        s += l
    end
    grads
end

@test unflatten!(flatten(grads), grads) == grads
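The same round trip can be exercised without Zygote by packing/unpacking a plain `Dict` of arrays; the function names here are hypothetical, mirroring `flatten!`/`unflatten!` above:

```julia
# A Dict of arrays standing in for Zygote.Grads (hypothetical names)
packlength(d) = sum(length(v) for v in values(d))

function pack!(out, d)
    s = 1
    for v in values(d)          # an unmodified Dict iterates in a stable order
        l = length(v)
        out[s:s+l-1] .= vec(v)
        s += l
    end
    return out
end
pack(d) = pack!(zeros(packlength(d)), d)

function unpack!(d, flat)
    s = 1
    for v in values(d)
        l = length(v)
        v .= reshape(flat[s:s+l-1], size(v))
        s += l
    end
    return d
end

d = Dict(:W => rand(2, 2), :b => rand(2))
flat = pack(d)
unpack!(d, 2 .* flat)           # write doubled values back in place
```

The round trip relies on the same property as the `Grads` version: both passes iterate the container in the same order as long as it is not mutated in between.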

This seems like the kind of problem Flatten.jl is designed to solve… However, it won’t work for arrays, as it’s all compile-time generation and we don’t know the length of most arrays in `@generated` functions.

It should be able to generate very fast code for static arrays if you add the appropriate constructor_of() method.

There are some updates on master, but I’m not sure if they have made it to Pkg yet; I’ve been preoccupied.

(Nested is not really a package anymore; it got refactored down so much that I just made it part of Flatten.)