I am new to Julia. I am trying to implement the piece of code below, and I was wondering if there is a way to improve the performance of taking the gradient of a gradient.
Note: for simplicity, both x and y are the same array here; in my real case they are of course different.
thanks!
using Zygote
Sigmoid(x) = 1 / (1 + exp(-x))
WeightedLoss(x, y) = -1.5 * y * log(Sigmoid(x)) - (1 - y) * log(1 - Sigmoid(x))
# Zygote reverse-mode automatic differentiation
f(x, y) = Zygote.gradient(WeightedLoss, x, y)[1]   # first derivative w.r.t. x
g(x, y) = Zygote.gradient(f, x, y)[1]              # second derivative w.r.t. x (reverse over reverse)
nb = 1000
arr = map(i->convert(Float64,i),1:nb)
x = arr
@time Grad = f.(x, x)
@time Hess = g.(x, x)
Output:
0.088668 seconds (248.25 k allocations: 12.296 MiB)
1.245433 seconds (4.04 M allocations: 169.133 MiB, 5.11% gc time)
Hello and welcome to the community!
You can try the built-in hessian function, https://fluxml.ai/Zygote.jl/latest/utils/#Zygote.hessian.
It uses forward-mode differentiation for the outer derivative, which is often much more performant than reverse-over-reverse AD.
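As a minimal sketch (untested here), you could wrap your WeightedLoss so that hessian only differentiates with respect to x, then index the 1x1 result back to a scalar:

h(x, y) = Zygote.hessian(v -> WeightedLoss(v[1], y), [x])[1, 1]   # forward over reverse
@time Hess = h.(x, x)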
Also, try BenchmarkTools.jl for accurate timings; the first call to @time also measures compilation.
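For example (using h from the sketch above; the $ interpolation avoids timing access to global variables):

using BenchmarkTools
@btime f.($x, $x)
@btime h.($x, $x)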