The following two lines of code are commonly used when training a model in Julia:
∇model = gradient(m -> loss(m, x, y), model)[1]
opt_state, model = Optimisers.update!(opt_state, model, ∇model)
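(For context, a minimal runnable version of this step might look as follows; the Chain, the data, and the loss here are toy assumptions just to make the snippet self-contained, not my actual model:)

using Flux, Optimisers

# Toy setup, assumed only for illustration:
model = Chain(Dense(4 => 8, relu), Dense(8 => 1))
x, y = randn(Float32, 4, 16), randn(Float32, 1, 16)
loss(m, x, y) = Flux.mse(m(x), y)
opt_state = Optimisers.setup(Optimisers.Adam(1f-3), model)

# One training step, as above:
∇model = gradient(m -> loss(m, x, y), model)[1]
opt_state, model = Optimisers.update!(opt_state, model, ∇model)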
For debugging purposes, we can compute statistics of the model parameters via the fmap and fmapstructure functions from Functors.jl.
For example, I can see the average value of the parameters (for each weight matrix and each bias vector in all Dense layers) using the following very generic line of code:
using Statistics, Functors, CUDA  # for mean, fmapstructure/isleaf, CuArray
stats = fmapstructure(mean, model; exclude = x -> x isa Union{Array, CUDA.CuArray} && Functors.isleaf(x))
Output:
(model = (layers = ((weight = -0.00033604607f0, bias = 1.9988879f0, σ = ()),
                    (weight = -0.0045190137f0, bias = -0.0030693f0, σ = ()),
                    (weight = -0.002681108f0, bias = -0.0048592985f0, σ = ()),
                    (weight = -0.015205776f0, bias = -0.012156701f0, σ = ()),
                    (paths = NamedTuple{(:weight, :bias, :σ), Tuple{Float32, Float32, Tuple{}}}[
                        (weight = 0.021802468, bias = -0.001723722, σ = ()),
                        (weight = 0.055412512, bias = -0.0015859936, σ = ()),
                        (weight = -0.05021051, bias = 0.0025667339, σ = ()),
                        (weight = -0.04975954, bias = 0.0021612642, σ = ()),
                        (weight = 0.034353513, bias = -0.002381153, σ = ()),
                        (weight = -0.021517549, bias = 0.0015386648, σ = ())],)),),)
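(The result mirrors the model's structure, so individual summaries can be read off directly, e.g. stats.model.layers[1].weight for the mean weight of the first Dense layer.)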
Question:
I would like to compute the elementwise ratio model ./ ∇model,
because I want to see how the gradients relate to the original parameters. Once we have the ratios, we can take the mean, min, and max ratio in each layer. The problem is: how can we generically traverse model and ∇model at the same time?
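To make the goal concrete, here is a sketch of what I am after. It assumes that fmapstructure can walk two trees in parallel the way fmap(f, x, ys...) does, which is exactly the part I have not been able to verify:

# Hypothetical sketch: walk model and ∇model together, take elementwise
# parameter-to-gradient ratios, then summarise each array.
ratio_stats = fmapstructure(model, ∇model;
        exclude = x -> x isa Union{Array, CUDA.CuArray} && Functors.isleaf(x)) do p, g
    r = p ./ g   # elementwise ratio; zero gradients would give Inf here
    (mean = mean(r), min = minimum(r), max = maximum(r))
end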
I know Optimisers.jl does something similar internally, so maybe @mcabbott can help.