Error using Enzyme to autodiff wrt Flux neural net parameters

re is rebuilding a nested structure from a Vector v by doing ynew = reshape(v[o .+ (1:length(y))], axes(y)) repeatedly, with known offsets o. Here it seems y is const, while v and hence ynew are active. That part seems like it ought to be fine?
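
For concreteness, a minimal sketch of that single reconstruction step with made-up values (the real y, v, and offset o come from the actual rebuild, which isn't shown here):

julia> y = [0.1 0.2; 0.3 0.4];  # const "template" leaf, only its axes matter here

julia> v = collect(1.0:10.0); o = 2;  # hypothetical flat vector and offset

julia> ynew = reshape(v[o .+ (1:length(y))], axes(y))
2×2 Matrix{Float64}:
 3.0  5.0
 4.0  6.0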

The “repeatedly” is a recursive walk by Functors.jl, which keeps an IdDict of what it has seen; I believe that here this cache will be indexed by y, and some ynew may be taken from it. That sounds like it could be awful?
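
As a side note, that IdDict cache is ordinary Functors behaviour; a minimal sketch of it (not the failing case itself) is that leaves are looked up by identity, so a shared leaf is rebuilt only once and the result is re-used:

julia> using Functors

julia> shared = [1.0, 2.0];

julia> out = fmap(x -> x .+ 1, (p = shared, q = shared));

julia> out.p === out.q  # the cache maps the shared leaf to one new array
true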

I’m surprised you say it this way around. In low-tech terms, you can’t store a Dual(1.0,1) in [1.0, 2.0], but you can store 4.0 in [Dual(5.0, 1), Dual(6.0, 0)]. So I’m surprised that this fails:

julia> Enzyme.gradient(Reverse, (x,y) -> sum(vcat(x,y)), [1.0, 2.0], Const([3.0]))
ERROR: Constant memory is stored (or returned) to a differentiable variable.
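
As an aside, that low-tech Dual asymmetry can be checked directly; here is a minimal sketch using ForwardDiff's Dual (ForwardDiff isn't otherwise involved in this issue):

julia> using ForwardDiff: Dual

julia> xs = [Dual(5.0, 1.0), Dual(6.0, 0.0)];

julia> xs[1] = 4.0;  # fine, 4.0 is converted to Dual(4.0, 0.0) on the way in

julia> ys = [1.0, 2.0];

julia> ys[1] = Dual(1.0, 1.0)  # ERROR: a Dual cannot be converted to Float64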

And as a result, I'm not surprised that the following fmap example fails; it's not testing the IdDict or anything, just vcat…

julia> using Enzyme, Functors

julia> alpha = (a=[1,2.], b=[3.]); beta = (a=[4,5.], b=[6.]);

julia> fmap(vcat, alpha, beta)  # this is what fmap does
(a = [1.0, 2.0, 4.0, 5.0], b = [3.0, 6.0])

julia> Enzyme.gradient(Reverse, (α,β) -> sum(fmap(vcat,α,β).a), alpha, beta)
((a = [1.0, 1.0], b = [0.0]), (a = [1.0, 1.0], b = [0.0]))

julia> Enzyme.gradient(Reverse, (α,β) -> sum(fmap(vcat,α,β).a), alpha, Const(beta))
ERROR: Constant memory is stored (or returned) to a differentiable variable.

When you say “non-differentiable matrix into a larger data structure”, for what kinds of structure is this a concern? I can happily make some objects which seem to be partly const:

julia> Enzyme.gradient(Reverse, (a,b) -> begin nt = (; a, b, c=sum(a.*b)); sum(nt.a) / nt.c end, [1.0, 20], Const([3.0, 4.0]))
([0.0029031789809841786, -0.0001451589490492084], nothing)

That’s what the function _trainmap does… it takes all the children of something (a NamedTuple), a subset of them (represented as a similar NamedTuple with some nothing entries, the rest like y above), and some aux info (the offsets above), and builds another NamedTuple containing either ynew or the original child. And this seems to be no problem:

julia> function _trainmap(f, ch, tr, aux)
         map(ch, tr, aux) do c, t, a  # isnothing(t) indicates non-trainable field
           isnothing(t) ? c : f(t, a)
         end
       end

julia> _trainmap(getindex, ([1.0], [2.0, 3.0]), (nothing, [2.0, 3.0]), (99:101, 2:2))
([1.0], [3.0])

julia> Enzyme.gradient(Reverse, (ch, tr, aux) -> sum(sum, _trainmap(getindex, ch, tr, aux)), Const(([1.0], [2.0, 3.0])), (nothing, [2.0, 3.0]), Const((99:101, 2:2)))
(nothing, (nothing, [0.0, 1.0]), nothing)