Incorrect (implicit) gradients for a custom model in Flux

I’m seeing incorrect gradients when I use a vector of matrices as a model in Flux. The implicit gradients come back as `nothing` (i.e. no gradient at all), while the structural (explicit) gradients appear to be computed correctly.
MWE:

using Flux

struct VecofMat{T}
    W::T
end

Flux.@functor VecofMat

function (model::VecofMat)(x)
    mapreduce(m->m*x, +, model.W)
end

function implicit_grads(model)
    x = rand(2)
    gs = gradient(()->sum(model(x)), params(model))
    return gs.grads
end

function structural_grads(model)
    x = rand(2)
    gs = gradient((model)->sum(model(x)), model)
    return gs[1]
end

model = VecofMat([rand(2,2) for i = 1:3])
@show implicit_grads(model)
@show structural_grads(model)

Outputs:

julia> implicit_grads(model)
IdDict{Any,Any} with 3 entries:
  [0.717542 0.507439; 0.662728 0.493812] => nothing
  [0.888345 0.512297; 0.347085 0.122115] => nothing
  [0.801164 0.874216; 0.394027 0.199299] => nothing

julia> structural_grads(model)
(W = [[0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033], [0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033], [0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033]],)
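For reference, the structural result can be checked by hand. Writing the model as $y = \sum_k W_k x$ and the loss as $L = \sum_i y_i = \mathbf{1}^\top \big(\sum_k W_k\big) x$, every matrix receives the same outer product:

$$
\frac{\partial L}{\partial W_k} = \mathbf{1}\,x^{\top}, \qquad k = 1, 2, 3.
$$

So each $W_k$ should get a matrix whose rows all equal $x^\top$, which is exactly the pattern in the structural output above (every row is `[0.8304… 0.9042…]`, i.e. the random `x` drawn inside `structural_grads`). The implicit result should have the same form instead of `nothing`.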

These issues disappear when I use a multidimensional array instead of a vector of matrices, but I wanted to check whether I’m making a mistake in how I compute the implicit gradients. Thanks!
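Since the problem reportedly goes away with a multidimensional array, here is a minimal sketch of that variant. It assumes a Flux release from the same period as this thread, where `params` and `Flux.@functor` are available (newer releases deprecate implicit parameters). The type name `ArrOfMat` is hypothetical. It stores the three 2×2 matrices as slices of one 2×2×3 array and compares the implicit and explicit gradients against `ones(2) * x'` for every slice, which follows from the model being linear in the weights.

```julia
using Flux  # assumes a Flux/Zygote version with implicit `params` support

# Hypothetical workaround type: the K matrices live as slices W[:, :, k]
# of a single 3-D array instead of a Vector of matrices.
struct ArrOfMat{T<:AbstractArray{<:Real,3}}
    W::T
end

Flux.@functor ArrOfMat

function (model::ArrOfMat)(x)
    # Σₖ W[:, :, k] * x equals (Σₖ W[:, :, k]) * x by linearity,
    # so sum over the third dimension first, then multiply once.
    Wsum = dropdims(sum(model.W; dims = 3); dims = 3)
    return Wsum * x
end

model = ArrOfMat(rand(2, 2, 3))
x = rand(2)

# Implicit-parameter gradient: the 3-D array itself is the tracked parameter.
gs = gradient(() -> sum(model(x)), params(model))
g_implicit = gs[model.W]

# Explicit (structural) gradient, for comparison.
g_explicit = gradient(m -> sum(m(x)), model)[1].W

# Every slice should receive ones(2) * x' (rows equal to x').
expected = repeat(ones(2) * x', 1, 1, 3)
@show g_implicit ≈ expected
@show g_explicit ≈ expected
```

Summing over the third dimension before the single matrix-vector product is just a convenience; a `mapreduce` over the `W[:, :, k]` slices would compute the same output.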

This looks like a nasty bug; you should file an issue with Flux. The example can be reduced a little further:

using Flux

struct VecofMat{T}
    W::T
end

Flux.@functor VecofMat

function implicit_grads(model)
    gs = gradient(() -> sum(model.W[1]), params(model))
    return gs.grads
end

function structural_grads(model)
    gs = gradient( model -> sum(model.W[1]), model)
    return gs[1]
end

model = VecofMat([rand(2,2) for i = 1:3])
@show implicit_grads(model)
@show structural_grads(model)

Just did. Thanks for taking a look! :slight_smile:

Cross referencing the bug report for those who are interested: Implicit params: no gradient for `Array` element of `Vector{AbstractArray}` when parent `Vector` used in AD · Issue #1017 · FluxML/Zygote.jl · GitHub
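Until the issue is fixed, one option for the vector-of-matrices model is to rely on the structural gradients, which the question shows are correct. The sketch below reuses the `VecofMat` definition from the question and applies a single plain gradient-descent step by hand; the step size `η = 0.1` and the `W_before` copy are illustrative choices, not something from the thread.

```julia
using Flux

struct VecofMat{T}
    W::T
end

Flux.@functor VecofMat

(model::VecofMat)(x) = mapreduce(m -> m * x, +, model.W)

model = VecofMat([rand(2, 2) for _ in 1:3])
x = rand(2)
W_before = deepcopy(model.W)  # kept only to inspect the update afterwards

# Explicit (structural) gradient: a NamedTuple whose `W` field is a
# Vector with one gradient matrix per element of `model.W`.
ΔW = gradient(m -> sum(m(x)), model)[1].W

η = 0.1  # illustrative step size
for (Wk, gk) in zip(model.W, ΔW)
    Wk .-= η .* gk  # update each matrix in place
end
```

Because the matrices are updated in place, `model` itself holds the new weights after the loop.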
