I am new to Julia. I am trying to implement the piece of code below, and I was wondering if there is a way to improve the performance of taking the gradient of a gradient.
Note: for simplicity, both x and y are the same array here; in my real case they are of course different.
thanks!
using Zygote
Sigmoid(x) = 1 / (1 + exp(-x))
WeightedLoss(x, y) = -1.5 * y * log(Sigmoid(x)) - (1 - y) * log(1 - Sigmoid(x))
# Zygote reverse-mode automatic differentiation
f(x, y) = Zygote.gradient(WeightedLoss, x, y)[1]   # first derivative w.r.t. x
g(x, y) = Zygote.gradient(f, x, y)[1]              # second derivative w.r.t. x (reverse over reverse)
nb = 1000
arr = map(i->convert(Float64,i),1:nb)
x = arr
@time Grad = f.(x, x)
@time Hess = g.(x, x)
Output:
0.088668 seconds (248.25 k allocations: 12.296 MiB)
1.245433 seconds (4.04 M allocations: 169.133 MiB, 5.11% gc time)
Hello and welcome to the community!
You can try the built-in hessian function, https://fluxml.ai/Zygote.jl/latest/utils/#Zygote.hessian.
It uses forward-mode differentiation for the outer derivative, which is often much more performant than reverse-over-reverse AD.
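As a minimal sketch (untested here), you could wrap your WeightedLoss so that hessian only differentiates with respect to x, then index the 1x1 result back to a scalar:

h(x, y) = Zygote.hessian(v -> WeightedLoss(v[1], y), [x])[1, 1]   # forward over reverse
@time Hess = h.(x, x)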
Also, try BenchmarkTools.jl for accurate timings; the first call to @time also measures compilation.
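For example (using h from the sketch above; the $ interpolation avoids timing access to global variables):

using BenchmarkTools
@btime f.($x, $x)
@btime h.($x, $x)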