Looks like using ReverseDiff over Zygote works (ReverseDiff for the outer gradient, Zygote for the inner one); see this discussion post for details. Here is my loss function, which uses Zygote internally to get the gradients with respect to the inputs:
using Zygote, ReverseDiff, Statistics

layer(W, b, x) = tanh.(W * x .+ b);

function loss(W, b, x)
    u(x) = layer(W, b, x)[1, :]   # first output component
    v(x) = layer(W, b, x)[2, :]   # second output component
    # inner gradients with respect to the input x, computed with Zygote
    ∂u∂x = Zygote.gradient(a -> sum(u(a)), x)[1][1, :]
    ∂v∂x = Zygote.gradient(a -> sum(v(a)), x)[1][1, :]
    mean(∂u∂x.^2) + mean(∂v∂x.^2) # just a simple loss function, not physical
end
Here is the gradient of the loss function with respect to the model weights and biases, computed with a compiled ReverseDiff tape:
const W, b = rand(2, 2), rand(2);
const input = (W, b);
x0_t0 = rand(2, 10);   # sample input batch (2 features × 10 points); substitute your own data
# compile a ReverseDiff tape for the gradient of the loss with respect to (W, b)
const diff = ReverseDiff.compile(ReverseDiff.GradientTape((a, b) -> loss(a, b, x0_t0), input));
a = ReverseDiff.gradient!(map(copy, input), diff, input);
a[1]
2×2 Array{Float64,2}:
 0.118311  0.0
 0.126966  0.0
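If you want to actually train the layer with these gradients, here is a minimal gradient-descent sketch that reuses the compiled tape; the learning rate lr, the number of steps, and the preallocated grads tuple are my own choices for illustration, not part of the code above:

# minimal gradient-descent sketch reusing the compiled tape `diff`
# (assumes loss, W, b, input, x0_t0, and diff are defined as above)
grads = map(copy, input)    # preallocate storage for (∇W, ∇b)
lr = 0.01                   # arbitrary learning rate for this sketch
for step in 1:100
    ReverseDiff.gradient!(grads, diff, input)   # fills grads in place
    W .-= lr .* grads[1]    # in-place update of the weights
    b .-= lr .* grads[2]    # in-place update of the biases
end
loss(W, b, x0_t0)           # loss after the updates

Because input is the tuple (W, b) and the updates mutate those arrays in place, the compiled tape can be re-evaluated on the updated parameters without recompiling.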