# Second derivative in Zygote returns nothing

I need to optimize a function that depends on the gradient of another function. When I run the following code:

```julia
using Zygote

data = rand(1, 10)
f(p, x) = [x * p; x * p .+ p]
truef(x) = [x .+ 1; 2 * x]

function loss(p)
    dndts = [truef(x) for x in data]
    dfdt(x) = Zygote.forwarddiff(x -> f(p, x), x)
    ℓ = sum(abs2, reduce(hcat, [dfdt(data[i]) .- dndts[i] for i in 1:10]))
    return ℓ
end

Zygote.gradient(loss, rand(1))  # p as e.g. a length-1 vector; returns (nothing,)
```

I get `(nothing,)`. If I take the inner derivative out of the function, like this:

```julia
using Zygote

data = rand(1, 10)
f(p, x) = [x * p; x * p .+ p]
truef(x) = [x .+ 1; 2 * x]

function loss(p)
    dndts = [truef(x) for x in data]
    dfdt(x) = f(p, x)  # no inner derivative here
    ℓ = sum(abs2, reduce(hcat, [dfdt(data[i]) .- dndts[i] for i in 1:10]))
    return ℓ
end

Zygote.gradient(loss, rand(1))  # returns a sensible gradient
```

it works correctly. I know second derivatives are complicated, so I was wondering what an appropriate solution would be. In case it helps:

• I’m trying to do this where my function `f` is a neural network built with either `Lux` or `Flux`.
• My function, just like in this MWE, takes a single input and returns several outputs.

Note that `Zygote.forwarddiff` doesn’t take a derivative at all; it just runs `f(x)`. Further, it is explicitly blind to anything closed over by the function, i.e. to your `p`:

```
help?> Zygote.forwarddiff
  forwarddiff(f, x; chunk_threshold = ForwardDiff.DEFAULT_CHUNK_THRESHOLD) -> f(x)

  Runs f(x) as usual, but instructs Zygote to differentiate f using forward mode,
  rather than the usual reverse mode.
  [...]

  Note that the function f will drop gradients for any closed-over values.

  julia> gradient(2, 3) do a, b
           forwarddiff(a) do a
             a*b
           end
         end
  (3, nothing)
```
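That is exactly what happens in your MWE: `p` only enters `forwarddiff` through the closure, so its gradient is dropped and the outer `gradient` call returns `(nothing,)`. A quick check (the scalar `x0` and length-1 `p0` are shapes I'm assuming for illustration):

```julia
using Zygote

f(p, x) = [x * p; x * p .+ p]
p0, x0 = rand(1), 0.5  # assumed shapes, for illustration

# p0 is only reachable through the closure, so forwarddiff drops its gradient:
Zygote.gradient(p -> Zygote.forwarddiff(x -> sum(f(p, x)), x0), p0)
# -> (nothing,)
```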

If you want a second derivative, using ForwardDiff on the outside and Zygote on the inside would probably work here.
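For your MWE, that combination could look something like the sketch below. The choice of `Zygote.jacobian` for the inner derivative is mine (since `f` returns a vector and `x` is a scalar, it gives the vector ∂f/∂x), and the length-1 parameter shape is assumed, so treat this as untested against your real model:

```julia
using ForwardDiff, Zygote

data = rand(1, 10)
f(p, x) = [x * p; x * p .+ p]
truef(x) = [x .+ 1; 2 * x]

function loss(p)
    dndts = [truef(x) for x in data]
    # Inner derivative w.r.t. the scalar x, taken with Zygote (reverse mode).
    # jacobian returns a tuple of derivatives, one per argument; [1] is ∂f/∂x.
    dfdt(x) = Zygote.jacobian(x -> f(p, x), x)[1]
    ℓ = sum(abs2, reduce(hcat, [dfdt(data[i]) .- dndts[i] for i in 1:10]))
    return ℓ
end

# Outer gradient w.r.t. p, taken with ForwardDiff (forward over reverse):
ForwardDiff.gradient(loss, rand(1))
```

The key difference from `forwarddiff` is that here the closed-over `p` carries the outer call's `Dual` numbers, which flow through Zygote's reverse pass, so the sensitivity to `p` is not lost. For a real Lux/Flux network, the outer `ForwardDiff.gradient` would need a flat parameter vector (e.g. via `Flux.destructure`), since ForwardDiff differentiates with respect to arrays.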