Error using Enzyme to autodiff wrt Flux neural net parameters

re is rebuilding a nested structure from a Vector v by doing ynew = reshape(v[o .+ (1:length(y))], axes(y)) repeatedly, with known offsets o. Here it seems y is const, while v and hence ynew are active. That part seems like it ought to be fine?
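
For concreteness, a minimal sketch of that single reconstruction step with made-up values (the real y, v, and offset o come from the actual rebuild, which isn't shown here):

julia> y = [0.1 0.2; 0.3 0.4];  # const "template" leaf, only its axes matter here

julia> v = collect(1.0:10.0); o = 2;  # hypothetical flat vector and offset

julia> ynew = reshape(v[o .+ (1:length(y))], axes(y))
2×2 Matrix{Float64}:
 3.0  5.0
 4.0  6.0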

The “repeatedly” is a recursive walk by Functors.jl, which keeps an IdDict of what it has seen; I believe that here this cache will be indexed by y, and some ynew may be taken from it. That sounds like it could be awful?
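
As a side note, that IdDict cache is ordinary Functors behaviour; a minimal sketch of it (not the failing case itself) is that leaves are looked up by identity, so a shared leaf is rebuilt only once and the result is re-used:

julia> using Functors

julia> shared = [1.0, 2.0];

julia> out = fmap(x -> x .+ 1, (p = shared, q = shared));

julia> out.p === out.q  # the cache maps the shared leaf to one new array
true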

I’m surprised you say it this way around. In low-tech terms, you can’t store a Dual(1.0,1) in [1.0, 2.0], but you can store 4.0 in [Dual(5.0, 1), Dual(6.0, 0)]. So I’m surprised that this fails:

julia> Enzyme.gradient(Reverse, (x,y) -> sum(vcat(x,y)), [1.0, 2.0], Const([3.0]))
ERROR: Constant memory is stored (or returned) to a differentiable variable.
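
As an aside, that low-tech Dual asymmetry can be checked directly; here is a minimal sketch using ForwardDiff's Dual (ForwardDiff isn't otherwise involved in this issue):

julia> using ForwardDiff: Dual

julia> xs = [Dual(5.0, 1.0), Dual(6.0, 0.0)];

julia> xs[1] = 4.0;  # fine, 4.0 is converted to Dual(4.0, 0.0) on the way in

julia> ys = [1.0, 2.0];

julia> ys[1] = Dual(1.0, 1.0)  # ERROR: a Dual cannot be converted to Float64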

And as a result, I'm not surprised that the following fmap example fails; it's not testing the IdDict or anything, just vcat…

julia> using Enzyme, Functors

julia> alpha = (a=[1,2.], b=[3.]); beta = (a=[4,5.], b=[6.]);

julia> fmap(vcat, alpha, beta)  # this is what fmap does
(a = [1.0, 2.0, 4.0, 5.0], b = [3.0, 6.0])

julia> Enzyme.gradient(Reverse, (α,β) -> sum(fmap(vcat,α,β).a), alpha, beta)
((a = [1.0, 1.0], b = [0.0]), (a = [1.0, 1.0], b = [0.0]))

julia> Enzyme.gradient(Reverse, (α,β) -> sum(fmap(vcat,α,β).a), alpha, Const(beta))
ERROR: Constant memory is stored (or returned) to a differentiable variable.

When you say “non-differentiable matrix into a larger data structure”, for what kinds of structure is this a concern? I can happily make some objects which seem to be partly const:

julia> Enzyme.gradient(Reverse, (a,b) -> begin nt = (; a, b, c=sum(a.*b)); sum(nt.a) / nt.c end, [1.0, 20], Const([3.0, 4.0]))
([0.0029031789809841786, -0.0001451589490492084], nothing)

That’s what the function _trainmap does… it takes all the children of something (a NamedTuple), a subset of them (represented as a similar NamedTuple with some nothing entries, the rest like y above), and some aux info (the offsets above), and builds another NamedTuple containing either ynew or the original child. And this seems to be no problem:

julia> function _trainmap(f, ch, tr, aux)
         map(ch, tr, aux) do c, t, a  # isnothing(t) indicates non-trainable field
           isnothing(t) ? c : f(t, a)
         end
       end

julia> _trainmap(getindex, ([1.0], [2.0, 3.0]), (nothing, [2.0, 3.0]), (99:101, 2:2))
([1.0], [3.0])

julia> Enzyme.gradient(Reverse, (ch, tr, aux) -> sum(sum, _trainmap(getindex, ch, tr, aux)), Const(([1.0], [2.0, 3.0])), (nothing, [2.0, 3.0]), Const((99:101, 2:2)))
(nothing, (nothing, [0.0, 1.0]), nothing)