Incorrect (implicit) gradients for a custom model in Flux

arlk · March 10, 2021, 6:39pm

I’m seeing incorrectly computed gradients when I use a vector of matrices as a model in Flux. The implicit gradients come back as nothing (effectively zero) for every matrix, while the structural gradients seem to be computed correctly.
MWE:

using Flux

struct VecofMat{T}
    W::T
end

Flux.@functor VecofMat

function (model::VecofMat)(x)
    mapreduce(m->m*x, +, model.W)
end

function implicit_grads(model)
    x = rand(2)
    gs = gradient(()->sum(model(x)), params(model))
    return gs.grads
end

function structural_grads(model)
    x = rand(2)
    gs = gradient((model)->sum(model(x)), model)
    return gs[1]
end

model = VecofMat([rand(2,2) for i = 1:3])
@show implicit_grads(model)
@show structural_grads(model)

Outputs:

julia> implicit_grads(model)
IdDict{Any,Any} with 3 entries:
  [0.717542 0.507439; 0.662728 0.493812] => nothing
  [0.888345 0.512297; 0.347085 0.122115] => nothing
  [0.801164 0.874216; 0.394027 0.199299] => nothing

julia> structural_grads(model)
(W = [[0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033], [0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033], [0.8304204082278925 0.9041784786828033; 0.8304204082278925 0.9041784786828033]],)

These issues disappear when I use a multi-dimensional array instead of a vector of matrices. But I just wanted to know if I’m making a mistake somewhere when I’m computing the implicit gradients. Thanks!
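
For comparison, here is a minimal sketch of the multi-dimensional array variant mentioned above. It is only an illustration under assumptions: the name StackedMat and the 2×2×3 layout are hypothetical and not taken from the original code, and it relies on a Flux version that still provides the implicit-parameter API (`params`) used in this thread.

```julia
using Flux

# Hypothetical variant of VecofMat: the matrices are stacked along the
# third dimension of one array instead of being stored in a Vector.
struct StackedMat{T}
    W::T   # assumed size: (rows, cols, n)
end

Flux.@functor StackedMat

function (model::StackedMat)(x)
    # sum_i (W_i * x) == (sum_i W_i) * x, so sum the slices first.
    Wsum = dropdims(sum(model.W, dims=3), dims=3)
    return Wsum * x
end

model = StackedMat(rand(2, 2, 3))
x = rand(2)

# Implicit gradient, looked up by the parameter array itself.
ps = Flux.params(model)
gs = gradient(() -> sum(model(x)), ps)
@show gs[model.W]
```

Because the whole parameter is a single array here, the implicit gradient is looked up with that one array as the key, rather than with each matrix inside a vector.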

CarloLucibello · March 11, 2021, 6:51am

This seems like a nasty bug; you should file an issue in Flux. The example can be reduced a little further:

using Flux

struct VecofMat{T}
    W::T
end

Flux.@functor VecofMat

function implicit_grads(model)
    gs = gradient(() -> sum(model.W[1]), params(model))
    return gs.grads
end

function structural_grads(model)
    gs = gradient(model -> sum(model.W[1]), model)
    return gs[1]
end

model = VecofMat([rand(2,2) for i = 1:3])
@show implicit_grads(model)
@show structural_grads(model)

arlk · March 12, 2021, 7:33pm

Just did. Thanks for taking a look!

Pbellive · March 12, 2021, 10:28pm

Cross referencing the bug report for those who are interested: Implicit params: no gradient for `Array` element of `Vector{AbstractArray}` when parent `Vector` used in AD · Issue #1017 · FluxML/Zygote.jl · GitHub
