Autograd through a loss function with derivatives in it

It looks like using ReverseDiff over Zygote works; see this discussion post for details. Here is my loss function, which uses Zygote to compute the derivatives that appear inside it:

using Zygote, Statistics

layer(W, b, x) = tanh.(W * x .+ b);

function loss(W, b, x)
    # First and second components of the layer output, as functions of the input
    u(x) = layer(W, b, x)[1, :]
    v(x) = layer(W, b, x)[2, :]
    # Derivatives of u and v with respect to the first input coordinate, via Zygote
    ∂u∂x = Zygote.gradient(a -> sum(u(a)), x)[1][1, :]
    ∂v∂x = Zygote.gradient(a -> sum(v(a)), x)[1][1, :]
    mean(∂u∂x .^ 2) + mean(∂v∂x .^ 2) # just a simple loss function, not physical
end

Here is the gradient of the loss function with respect to the model weights and biases, computed through a compiled ReverseDiff tape:

using ReverseDiff

const W, b = rand(2, 2), rand(2);
const input = (W, b);
# Record and compile a gradient tape for the loss as a function of (W, b);
# x0_t0 is captured as a constant inside the closure.
const diff = ReverseDiff.compile(ReverseDiff.GradientTape((a, b) -> loss(a, b, x0_t0), input));
a = ReverseDiff.gradient!(map(copy, input), diff, input);
a[1]
2×2 Array{Float64,2}:
 0.118311  0.0
 0.126966  0.0
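
Because the tape is compiled once, ReverseDiff.gradient! can be called repeatedly at low cost. A rough sketch of a gradient-descent loop reusing the compiled tape (the learning rate and iteration count below are arbitrary placeholders):

# Sketch: reuse the compiled tape for repeated gradient evaluations.
η = 0.01                     # arbitrary learning rate
grads = map(copy, input)     # preallocated gradient storage
for i in 1:1000
    ReverseDiff.gradient!(grads, diff, (W, b))
    W .-= η .* grads[1]      # in-place update; the const binding still allows mutation
    b .-= η .* grads[2]
end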