How to obtain gradients from training a model

model and grads[1] are trees with the same nesting structure and the same field names. The difference is that model is built from custom structs like Dense, while grads[1] is built from anonymous ones, NamedTuples.

Using a smaller example, here is how you can explore the two:

julia> using Flux

julia> model = Chain(Dense(2=>1), SkipConnection(Dense(1=>1), +))
Chain(
  Dense(2 => 1),                        # 3 parameters
  SkipConnection(
    Dense(1 => 1),                      # 2 parameters
    +,
  ),
)                   # Total: 4 arrays, 5 parameters, 292 bytes.

julia> grads = gradient(m -> sum(abs2, m([1,-1])), model)  # a scalar loss: the sum of squared outputs
((layers = ((weight = Float32[-2.6218274 2.6218274], bias = Float32[-2.6218274], σ = nothing), (layers = (weight = Float32[0.8607899;;], bias = Float32[-1.6526356], σ = nothing), connection = nothing)),),)

julia> model.layers[1]
Dense(2 => 1)       # 3 parameters

julia> model.layers[1].weight  # pressing tab will show you field names as you type
1×2 Matrix{Float32}:
 -0.675822  -0.154963

julia> model.layers[1].bias  # initialised to zero
1-element Vector{Float32}:
 0.0

julia> grads[1].layers[1]  # corresponding to Dense
(weight = Float32[-2.6218274 2.6218274], bias = Float32[-2.6218274], σ = nothing)

julia> grads[1].layers[1].weight
1×2 Matrix{Float32}:
 -2.62183  2.62183

julia> grads[1].layers[1].bias
1-element Vector{Float32}:
 -2.6218274
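
Because the two trees line up like this, you can walk them in parallel. Below is a minimal sketch of one hand-written gradient-descent step, assuming the multi-tree form of fmap from Functors.jl (the package Flux uses for such traversals); in real training you would let Flux.setup and Flux.update! handle this instead:

julia> using Functors

julia> model1 = fmap(model, grads[1]) do p, g
           g === nothing ? p : p .- 0.1f0 .* g  # leave non-differentiable leaves, like σ, alone
       end;

julia> model1.layers[1].bias[1] > 0  # the zero bias moved opposite to its gradient
true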

One catch is that model[2] also works, as shorthand for model.layers[2], but the shorthand does not carry over to the gradient: grads[1][2] is an error, and you must write grads[1].layers[2] instead. (Indexing a Chain forwards to the tuple of layers inside, but a plain NamedTuple has no such forwarding.)
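
To see this, re-using the gradient printed above (error output abbreviated):

julia> model[2] === model.layers[2]  # Chain forwards indexing to .layers
true

julia> grads[1][2]  # this NamedTuple has only one field, :layers
ERROR: BoundsError: attempt to access NamedTuple{(:layers,), …} at index [2]

julia> grads[1].layers[2].layers.bias  # spell out the full path instead
1-element Vector{Float32}:
 -1.6526356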

(They aren’t always strictly trees, since the same object can appear at two places when a layer is shared, but usually they are.)
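
For instance, re-using one layer puts the same parameter arrays at two places in the tree (shared is a name made up for this sketch):

julia> shared = Dense(2=>2);

julia> tied = Chain(shared, shared);  # one layer object, used twice

julia> tied[1].weight === tied[2].weight  # literally the same array, at two positions
true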