Automatic differentiation performance & computing derivatives of only a subset of the arguments

gianmariomanca · October 1, 2021, 9:05pm

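Here is a minimal working example:

using Zygote, ForwardDiff

const x = -10:10

# Gaussian-shaped model evaluated on the grid x, centered at m
f(m) = exp.(-1.5 .* (x .- m).^2)

# Residual of d after subtracting its projection onto f(xs)
function rs(d,xs)
	d0 = f(xs)
	d .- sum(d .* d0) / sum(d0 .* d0) .* d0
end

# Forward finite-difference derivative with respect to xs
drs(d,xs) = (rs(d,xs + 1e-6) .- rs(d,xs)) / 1e-6

# Zygote Jacobian with respect to both arguments, keeping only the second
drs_auto(d,x) = Zygote.jacobian(rs, d,x)[2]

# Zygote Jacobian with d captured in a closure
drs_auto_v2(d,x) = Zygote.jacobian(z -> rs(d,z),x)[1]

# Forward-mode derivative with ForwardDiff
drs_auto_v3(d,x) = ForwardDiff.derivative(z -> rs(d,z),x)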

and I find that Zygote is more than 50 times slower than the numerical derivative, while ForwardDiff is 2 times faster.

I know that Zygote is reverse mode, but is that the reason for the performance difference? Can the Zygote call be improved?

Notice that I only need the derivatives of rs with respect to x, I do not need to compute the derivatives with respect to d. Is there any way to tell that to Zygote? And would that matter? Using the anonymous function was just my quick failed attempt.

longemen3000 · October 1, 2021, 9:27pm

On the function performance, can you replace sum(d .* d0) with dot(d, d0)? (You need using LinearAlgebra first.)
On the performance of AD, as a general rule of thumb, forward AD is faster than reverse AD at small sizes.
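
A minimal sketch of the suggested swap, assuming a made-up data vector d (the thread does not give one) and the hypothetical names rs_sum and rs_dot. sum(d .* d0) materializes a temporary array before summing, whereas dot(d, d0) does not; both give the same value up to rounding:

```julia
using LinearAlgebra

const x = -10:10
f(m) = exp.(-1.5 .* (x .- m).^2)

d  = f(0.3) .+ 0.1   # hypothetical data, not from the thread
d0 = f(0.5)          # model evaluated at a hypothetical point

# Projection residual written with sum(a .* b), as in the MWE
rs_sum(d, d0) = d .- sum(d .* d0) / sum(d0 .* d0) .* d0

# Same residual written with dot, which avoids the temporary arrays
rs_dot(d, d0) = d .- dot(d, d0) / dot(d0, d0) .* d0
```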

gianmariomanca · October 1, 2021, 9:41pm

I was using dot(a, b) before; it benchmarks identically to sum(a .* b) in this simple case.

I just left sum() to avoid another dependency for the MWE.

I’m just really curious about the expected behavior of Zygote in this case and the best way to ignore some inputs, to make sure I’m using the packages at their full potential.
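
For reference, a small sketch of the two call shapes in question, assuming a made-up data vector d and evaluation point m0 (neither is given in the thread). Passing both arguments returns one Jacobian per argument, including a 21×21 block for d; closing over d returns only the part with respect to the scalar. Whether this changes the run time is a separate question:

```julia
using Zygote

const x = -10:10
f(m) = exp.(-1.5 .* (x .- m).^2)
function rs(d, xs)
    d0 = f(xs)
    d .- sum(d .* d0) / sum(d0 .* d0) .* d0
end

d  = f(0.3) .+ 0.1   # hypothetical data
m0 = 0.5             # hypothetical evaluation point

# Jacobians with respect to both arguments: a matrix for d, a vector for m0
Jd, Jm = Zygote.jacobian(rs, d, m0)

# d captured in a closure: only the derivative with respect to m0 is returned
(Jm_only,) = Zygote.jacobian(z -> rs(d, z), m0)
```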

ToucheSir · October 1, 2021, 11:48pm

I wonder if the culprit isn’t what inputs are used, but that Zygote un-fuses broadcasts (hence why it isn’t ideal for this kind of highly scalarized code).

mcabbott · October 2, 2021, 1:31am

There’s an algorithmic thing here too. The best case for reverse mode (like Zygote) is many parameters and scalar output. Then it does 1 reverse pass of (ideally) comparable difficulty to the original function. The best case for forward mode is what you have here (if I read this correctly), one scalar leading to a vector output. Again it does the original work plus tracking this one perturbation forwards.

For many outputs, Zygote needs a whole reverse pass per element. So the completely ideal expectation would be that drs_auto and drs_auto_v2 are about 20 times slower than drs_auto_v3, since the output here has 21 elements. In addition, reverse mode is just more complicated, which could well be the remaining factor of 5.

drs_auto_v2 won’t save much: Zygote will still work backwards most of the way. From the docstring:

help?> Zygote.jacobian
  jacobian(f, args...) -> Tuple

...
  This reverse-mode Jacobian needs to evaluate the pullback once for each element of y. Doing so
  is usually only efficient when length(y) is small compared to length(a), otherwise forward mode
  is likely to be better.
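
The "one pullback per output element" point can be sketched by hand with Zygote.pullback. This is an illustration of the idea under assumed inputs (d and m0 are made up), not how Zygote.jacobian is implemented internally: the reverse-mode version calls the pullback 21 times, once per basis vector, while ForwardDiff gets the same 21 derivatives in a single call.

```julia
using Zygote, ForwardDiff

const x = -10:10
f(m) = exp.(-1.5 .* (x .- m).^2)
function rs(d, xs)
    d0 = f(xs)
    d .- sum(d .* d0) / sum(d0 .* d0) .* d0
end

d  = f(0.3) .+ 0.1   # hypothetical data
m0 = 0.5             # hypothetical evaluation point

# Reverse mode: one forward pass, then one pullback call per output element
y, back = Zygote.pullback(z -> rs(d, z), m0)
J_rev = map(eachindex(y)) do i
    seed = zeros(length(y))
    seed[i] = 1.0            # i-th basis vector in output space
    back(seed)[1]            # gives ∂y[i]/∂m0
end

# Forward mode: a single pass propagates the one input perturbation
J_fwd = ForwardDiff.derivative(z -> rs(d, z), m0)
```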

ToucheSir · October 2, 2021, 5:02pm

Ah right, I ignored that this was a jacobian calculation and not a simple scalar output.

gianmariomanca · October 2, 2021, 8:46pm

As an experiment we can make it a scalar output by defining

function rs_scalar(d,xs)
	t = rs(d,xs)
	sum(t .* t)
end

and define

d_rs_scalar(d,x) = (rs_scalar(d,x + 1e-6) - rs_scalar(d,x)) / 1e-6
drs_scalar_auto_v2(d,x) = Zygote.gradient(z -> rs_scalar(d,z),x)[1]
drs_scalar_auto_v3(d,x) = ForwardDiff.derivative(z -> rs_scalar(d,z),x)

With these, Zygote is still 8 times slower than ForwardDiff.
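
A self-contained sketch of this scalar experiment, assuming made-up values for d and m0. It only checks that the two modes agree on the derivative; timings are not reproduced here and would need to be measured separately, for instance with BenchmarkTools.

```julia
using Zygote, ForwardDiff

const x = -10:10
f(m) = exp.(-1.5 .* (x .- m).^2)
function rs(d, xs)
    d0 = f(xs)
    d .- sum(d .* d0) / sum(d0 .* d0) .* d0
end

# Scalar objective: squared norm of the residual
function rs_scalar(d, xs)
    t = rs(d, xs)
    sum(t .* t)
end

d  = f(0.3) .+ 0.1   # hypothetical data
m0 = 0.5             # hypothetical evaluation point

# One reverse pass is enough for a scalar output
g_rev = Zygote.gradient(z -> rs_scalar(d, z), m0)[1]

# One forward pass is enough for a scalar input
g_fwd = ForwardDiff.derivative(z -> rs_scalar(d, z), m0)
```

With one scalar input and one scalar output, neither mode has an algorithmic advantage, so any remaining gap reflects per-call overhead rather than the number of passes.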
