Gradient of sum

using Flux

T = randn(10, 10)
ps = Flux.params(T)
opt = ADAM()
for iter = 1:100
    gs = gradient(ps) do
        sum(T)
    end
    Flux.update!(opt, ps, gs)
end

This throws an error:

ArgumentError: Cannot setindex! to 0.0009999999900000003 for an AbstractFill with value 1.0.

Stacktrace:
 [1] setindex! at /home/cossio/.julia/packages/FillArrays/tE9Xq/src/FillArrays.jl:41 [inlined]
 [2] _setindex! at ./abstractarray.jl:1176 [inlined]
 [3] setindex! at ./abstractarray.jl:1153 [inlined]
 [4] macro expansion at ./broadcast.jl:932 [inlined]
 [5] macro expansion at ./simdloop.jl:77 [inlined]
 [6] copyto! at ./broadcast.jl:931 [inlined]
 [7] copyto! at ./broadcast.jl:886 [inlined]
 [8] materialize! at ./broadcast.jl:848 [inlined]
 [9] materialize!(::FillArrays.Fill{Float64,2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(/),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(/),Tuple{Array{Float64,2},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(-),Tuple{Int64,Float64}}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(+),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(sqrt),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(/),Tuple{Array{Float64,2},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(-),Tuple{Int64,Float64}}}}}},Float64}}}},Float64}}) at ./broadcast.jl:845
 [10] apply!(::ADAM, ::Array{Float64,2}, ::FillArrays.Fill{Float64,2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}) at /home/cossio/.julia/packages/Flux/05b38/src/optimise/optimisers.jl:177
 [11] update!(::ADAM, ::Array{Float64,2}, ::FillArrays.Fill{Float64,2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}) at /home/cossio/.julia/packages/Flux/05b38/src/optimise/train.jl:23
 [12] update!(::ADAM, ::Params, ::Zygote.Grads) at /home/cossio/.julia/packages/Flux/05b38/src/optimise/train.jl:29
 [13] top-level scope at In[89]:8
 [14] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091

What is going on here?

https://github.com/FluxML/Flux.jl/issues/1510

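The issue is that Flux.apply! assumes gradients are mutable and writes the update into them in place. The gradient of sum(T) is 1 for every element, so Zygote returned a FillArrays.Fill, which stores that single value instead of a full array and can be a lot more efficient in some cases. A Fill is immutable, hence the setindex! error. The easiest workaround is to pass Array.(gs) to Flux.update! instead of gs.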
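As a rough sketch of that workaround, the loop below materializes each gradient with collect before handing it to the optimizer, one parameter at a time. This is an alternative spelling of the same idea that does not rely on broadcasting over the Zygote.Grads object. It assumes the Flux version used in this thread, where ADAM, Flux.params and the three-argument Flux.update!(opt, x, gradient) method (the one visible in the stack trace) are available; newer Flux releases have changed these names.

```julia
using Flux

T = randn(10, 10)
ps = Flux.params(T)
opt = ADAM()
for iter = 1:100
    gs = gradient(ps) do
        sum(T)
    end
    for p in ps
        # gs[p] may be a lazy, immutable FillArrays.Fill;
        # collect copies it into an ordinary mutable Array
        Flux.update!(opt, p, collect(gs[p]))
    end
end
```

The copy costs one extra allocation per parameter per step, which is the price of keeping the optimizer's in-place update working.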

It would be nice to solve this within Flux rather than in user code. Maybe Flux.apply!(...) should dispatch on the type of gs and exploit the mutability of gs only when it is actually mutable.
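To illustrate the kind of dispatch I have in mind, here is a minimal sketch using only Base Julia. The helper name mutable_grad is hypothetical and not part of Flux or Zygote; an immutable range stands in for a FillArrays.Fill so that the example runs without extra packages.

```julia
# Hypothetical helper, not part of Flux or Zygote.
# A plain Array is already mutable, so it is reused without copying.
mutable_grad(g::Array) = g
# Any other AbstractArray (lazy or immutable) is copied into an Array.
mutable_grad(g::AbstractArray) = collect(g)

dense = [1.0, 2.0, 3.0]
lazy = 1.0:3.0              # immutable stand-in for a Fill gradient

a = mutable_grad(dense)     # same object, no allocation
b = mutable_grad(lazy)      # fresh Vector{Float64}
b .= b .* 0.001             # in-place update now works
```

This only covers the gradient side; it says nothing about how the optimizer stores its own state.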

It’s not quite that straightforward: the current optimizer API relies heavily on IdDicts, so it might give wrong results for stateful optimizers when immutable types are involved. The proper fix is a new optimizer API, combined with using Zygote's explicit differentiation API instead of the implicit one. See also https://github.com/FluxML/Flux.jl/pull/1481, but there might be more to be done for this to work seamlessly.
