Hessian inside a Flux loss function

MWE (or as minimal as I could make it):

using Flux
import ForwardDiff, Zygote

g = Dense(2, 1, softplus)
loss(m, x) = sum(abs2, Zygote.hessian(x->sum(m(x)), x))
gradient(g, rand(2)) do m, x
   loss(m, x)
end

Both Zygote.hessian and ForwardDiff.hessian throw the following error:

ERROR: LoadError: Mutating arrays is not supported
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] (::Zygote.var"#399#400")(::Nothing) at /home/arun/.julia/packages/Zygote/IsBxF/src/lib/array.jl:58
 [3] (::Zygote.var"#2265#back#401"{Zygote.var"#399#400"})(::Nothing) at /home/arun/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
 [4] forward_jacobian at /home/arun/.julia/packages/Zygote/IsBxF/src/lib/forward.jl:25 [inlined]
 [5] (::typeof(∂(forward_jacobian)))(::Tuple{Nothing,Array{Float64,2}}) at /home/arun/.julia/packages/Zygote/IsBxF/src/compiler/interface2.jl:0
 [6] forward_jacobian at /home/arun/.julia/packages/Zygote/IsBxF/src/lib/forward.jl:38 [inlined]
 [7] (::typeof(∂(forward_jacobian)))(::Tuple{Nothing,Array{Float64,2}}) at /home/arun/.julia/packages/Zygote/IsBxF/src/compiler/interface2.jl:0
 [8] hessian_dual at /home/arun/.julia/packages/Zygote/IsBxF/src/lib/grad.jl:74 [inlined]
 [9] (::typeof(∂(hessian_dual)))(::Array{Float64,2}) at /home/arun/.julia/packages/Zygote/IsBxF/src/compiler/interface2.jl:0
 [10] hessian at /home/arun/.julia/packages/Zygote/IsBxF/src/lib/grad.jl:72 [inlined]
 [11] loss at /tmp/test.jl:5 [inlined]
 [12] (::typeof(∂(loss)))(::Float64) at /home/arun/.julia/packages/Zygote/IsBxF/src/compiler/interface2.jl:0
 [13] #67 at /tmp/test.jl:7 [inlined]
 [14] (::Zygote.var"#41#42"{typeof(∂(#67))})(::Float64) at /home/arun/.julia/packages/Zygote/IsBxF/src/compiler/interface.jl:41
 [15] gradient(::Function, ::Dense{typeof(softplus),Array{Float32,2},Array{Float32,1}}, ::Vararg{Any,N} where N) at /home/arun/.julia/packages/Zygote/IsBxF/src/compiler/interface.jl:59
 [16] top-level scope at /tmp/test.jl:6
 [17] include(::String) at ./client.jl:457
 [18] top-level scope at REPL[9]:1
in expression starting at /tmp/test.jl:6

Any ideas how I can go about solving this issue? Thanks!

Zygote.hessian uses Zygote for the inner gradient, then ForwardDiff for the outer derivative (forward over reverse).

Zygote.hessian_reverse uses Zygote twice (reverse over reverse). You can then try taking the third derivative with ForwardDiff on top of that; whether this works beyond toy examples you’ll have to see:

julia> g = x -> exp.(ones(2,2) * x);

julia> loss(m, x) = sum(abs2, Zygote.hessian_reverse(x->sum(m(x)), x));

julia> ForwardDiff.gradient(rand(2)) do x
         loss(g, x)
       end
2-element Vector{Float64}:
 107.37249350506983
 107.37249350506983
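
As an aside (not from the thread above), here is a tiny sanity check that the two methods agree when both apply; since the Hessian of sum(abs2, x) is 2I, either call should return the same matrix:

import Zygote

f(x) = sum(abs2, x)

Zygote.hessian(f, [1.0, 2.0])          # forward over reverse
Zygote.hessian_reverse(f, [1.0, 2.0])  # reverse over reverse
# both should return [2.0 0.0; 0.0 2.0], the Hessian of Σ xᵢ²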

Thanks! This does work!

But how can I go about using ForwardDiff.gradient w.r.t. the parameters of the network in order to train? I.e., I can’t seem to do something like this (which Flux would normally allow):

gs = ForwardDiff.gradient(params(g)) do
   ...
   loss(g, x)
end

Just realized that we can use Flux.destructure to get a flat parameter vector together with a function that rebuilds the network from it.
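
A minimal sketch of that idea, assuming the same toy Hessian penalty as above (the names θ, re, and gs are just illustrative), and whether this stays tractable for larger networks you’d have to test:

using Flux
import ForwardDiff, Zygote

g = Dense(2, 1, softplus)
x = rand(Float32, 2)   # match the Float32 parameters of the Dense layer

# squared Frobenius norm of the Hessian of w -> sum(m(w)), as above
loss(m, z) = sum(abs2, Zygote.hessian_reverse(w -> sum(m(w)), z))

# destructure returns a flat parameter vector and a function that rebuilds the model
θ, re = Flux.destructure(g)

# differentiate the loss w.r.t. the flat parameter vector with ForwardDiff
gs = ForwardDiff.gradient(θ) do p
    loss(re(p), x)
end

gs comes back as a plain vector with the same length and ordering as θ, so it can be fed to whatever update step you apply to the flat parameters.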