I’m trying to take derivative of the jacobian of neural network. But there emerges mutating arrays problem

nn = Chain(Dense(2,10, tanh), Dense(10,1))
println(Flux.params(nn))
jac₁ = x -> ForwardDiff.jacobian(nn, x) # with respect to x
jac₁(rand(2))
grads = Zygote.gradient(()->jac₁(rand(2))[1], params(nn))
grads.grads

I’m not sure this is exactly what you want, but I think things may work better with ForwardDiff over Zygote, the opposite order to what you had. Something like this:

julia> grads = x -> Zygote.gradient(() -> only(nn(x)), params(nn))
#35 (generic function with 1 method)
julia> grads(rand(2))[params(nn)[1]]
10×2 Matrix{Float64}:
0.46342 0.249064
-0.232392 -0.124899
...
julia> ForwardDiff.jacobian(x -> grads(x)[params(nn)[1]], rand(2))
20×2 Matrix{Float64}:
0.715101 -0.00995561
-0.00685393 0.132286
...
# next parameter, you could build up IdDict like Grads() if desired
julia> ForwardDiff.jacobian(x -> grads(x)[params(nn)[2]], rand(2))
10×2 Matrix{Float64}:
-0.00455624 0.00412416
0.0606107 -0.00800904
...

Note that Zygote#master has a jacobian function, which will also work with implicit Params. But ForwardDiff is always going to require explicit input. They way this is done above is, I guess, pretty inefficient, since it re-calculates the whole grads(x) for every parameter.