How to obtain gradients from training a model

model and grads[1] are trees with the same nesting structure and the same field names. The difference is that model is built from custom structs like Dense, while grads[1] is built from anonymous ones, NamedTuples.

Using a smaller example, here is how you can explore the two:

julia> using Flux

julia> model = Chain(Dense(2=>1), SkipConnection(Dense(1=>1), +))
Chain(
  Dense(2 => 1),                        # 3 parameters
  SkipConnection(
    Dense(1 => 1),                      # 2 parameters
    +,
  ),
)                   # Total: 4 arrays, 5 parameters, 292 bytes.

julia> grads = gradient(m -> sum(abs2, m([1,-1])), model)  # a scalar loss: the sum of squared outputs
((layers = ((weight = Float32[-2.6218274 2.6218274], bias = Float32[-2.6218274], σ = nothing), (layers = (weight = Float32[0.8607899;;], bias = Float32[-1.6526356], σ = nothing), connection = nothing)),),)

julia> model.layers[1]
Dense(2 => 1)       # 3 parameters

julia> model.layers[1].weight  # pressing tab will show you field names as you type
1×2 Matrix{Float32}:
 -0.675822  -0.154963

julia> model.layers[1].bias  # initialised to zero
1-element Vector{Float32}:
 0.0

julia> grads[1].layers[1]  # corresponding to Dense
(weight = Float32[-2.6218274 2.6218274], bias = Float32[-2.6218274], σ = nothing)

julia> grads[1].layers[1].weight
1×2 Matrix{Float32}:
 -2.62183  2.62183

julia> grads[1].layers[1].bias
1-element Vector{Float32}:
 -2.6218274
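
Because the two trees line up like this, you can walk them in parallel. Below is a minimal sketch of one hand-written gradient-descent step, assuming the multi-tree form of fmap from Functors.jl (the package Flux uses for such traversals); in real training you would let Flux.setup and Flux.update! handle this instead:

julia> using Functors

julia> model1 = fmap(model, grads[1]) do p, g
           g === nothing ? p : p .- 0.1f0 .* g  # leave non-differentiable leaves, like σ, alone
       end;

julia> model1.layers[1].bias[1] > 0  # the zero bias moved opposite to its gradient
true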

One catch is that model[2] also works, as shorthand for model.layers[2], but the shorthand does not carry over to the gradient: grads[1][2] is an error, and you must write grads[1].layers[2] instead. (Indexing a Chain forwards to the tuple of layers inside, but a plain NamedTuple has no such forwarding.)
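
To see this, re-using the gradient printed above (error output abbreviated):

julia> model[2] === model.layers[2]  # Chain forwards indexing to .layers
true

julia> grads[1][2]  # this NamedTuple has only one field, :layers
ERROR: BoundsError: attempt to access NamedTuple{(:layers,), …} at index [2]

julia> grads[1].layers[2].layers.bias  # spell out the full path instead
1-element Vector{Float32}:
 -1.6526356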

(They aren’t always strictly trees, since the same object can appear at two places when a layer is shared, but usually they are.)
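
For instance, re-using one layer puts the same parameter arrays at two places in the tree (shared is a name made up for this sketch):

julia> shared = Dense(2=>2);

julia> tied = Chain(shared, shared);  # one layer object, used twice

julia> tied[1].weight === tied[2].weight  # literally the same array, at two positions
true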