I’m seeing incorrectly computed gradients when I use a vector of matrices as a model in Flux. The implicit gradients come back as nothing (effectively zero), whereas the structural gradients seem to be computed correctly.
MWE:
using Flux

struct VecofMat{T}
    W::T
end
Flux.@functor VecofMat

function (model::VecofMat)(x)
    mapreduce(m -> m * x, +, model.W)
end

function implicit_grads(model)
    x = rand(2)
    gs = gradient(() -> sum(model(x)), params(model))
    return gs.grads
end

function structural_grads(model)
    x = rand(2)
    gs = gradient((model) -> sum(model(x)), model)
    return gs[1]
end

model = VecofMat([rand(2, 2) for i = 1:3])
@show implicit_grads(model)
@show structural_grads(model)
Outputs:
julia> implicit_grads(model)
IdDict{Any,Any} with 3 entries:
[0.717542 0.507439; 0.662728 0.493812] => nothing
[0.888345 0.512297; 0.347085 0.122115] => nothing
[0.801164 0.874216; 0.394027 0.199299] => nothing
julia> structural_grads(model)
(W = [[0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033], [0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033], [0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033]],)
These issues disappear when I use a multi-dimensional array instead of a vector of matrices. But I just wanted to know if I’m making a mistake somewhere when I’m computing the implicit gradients. Thanks!
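For comparison, the multi-dimensional array variant mentioned above might look roughly like the sketch below. The name ArrModel and the 2×2×3 layout are assumptions made for illustration and are not taken from the MWE; it also assumes a Flux version that still supports implicit parameters through params, as the MWE does.

```julia
using Flux

# Hypothetical counterpart to VecofMat: the three 2×2 matrices are stored
# as slices of one 2×2×3 array, so params(model) tracks a single array.
struct ArrModel{T}
    W::T
end
Flux.@functor ArrModel

# By linearity, summing W[:, :, k] * x over k equals (sum of the slices) * x.
function (model::ArrModel)(x)
    Wsum = dropdims(sum(model.W, dims = 3), dims = 3)
    return Wsum * x
end

model = ArrModel(rand(2, 2, 3))
x = rand(2)

# Implicit gradient, computed the same way as in implicit_grads above.
gs = gradient(() -> sum(model(x)), params(model))
g = gs[model.W]
@show g
```

Here the gradient is looked up with gs[model.W], using the parameter array itself as the key.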