 # Automatic differentiation performance & computing derivatives of only a subset of the arguments

Here is a minimal working example

``````const x = -10:10

f(m) = exp.(-1.5 .* (x .- m).^2)

function rs(d,xs)
d0 = f(xs)
d .- sum(d .* d0) / sum(d0 .* d0) .* d0
end

drs(d,xs) = (rs(d,xs + 1e-6) .- rs(d,xs)) / 1e-6

drs_auto(d,x) = Zygote.jacobian(rs, d,x)

drs_auto_v2(d,x) = Zygote.jacobian(z -> rs(d,z),x)

drs_auto_v3(d,x) = ForwardDiff.derivative(z -> rs(d,z),x)
``````

and I get

and

Zygote is >50 times slower than the numerical derivative, ForwardDiff is 2 times faster.

I know that Zygote is reverse mode, but is that the reason of the performance difference? Can the Zygote call be improved?

Notice that I only need the derivatives of rs with respect to x, I do not need to compute the derivatives with respect to d. Is there any way to tell that to Zygote? And would that matter? Using the anonymous function was just my quick failed attempt.

on the function performance, can you replace `sum(d .* d0)` for `dot(d,d0)` ? (you have to use `using LinearAlgebra` first).
on the performance of AD, as a general rule of thumb, forward AD is faster than reverse AD at small sizes

2 Likes

I was using dot(a,b) before, identical benchmarks as sum( a .* b) in this simple case.

I just left sum() to avoid another dependency for the MWE.

I’m just really curious about the expected behavior of Zygote in this case and the best way to ignore some inputs. To make sure I’m using the packages at their max potential.

I wonder if the culprit isn’t what inputs are used, but that Zygote un-fuses broadcasts (hence why it isn’t ideal for this kind of highly scalarized code).

There’s an algorithmic thing here too. The best case for reverse mode (like Zygote) is many parameters and scalar output. Then it does 1 reverse pass of (ideally) comparable difficulty to the original function. The best case for forward mode is what you have here (if I read this correctly), one scalar leading to a vector output. Again it does the original work plus tracking this one perturbation forwards.

For many outputs, Zygote needs a whole reverse pass per element. So the completely ideal expectation would be that `drs_auto` and `drs_auto_v2` are 20 times slower than `drs_auto_v3`. In addition reverse mode is just more complicated, which could well be the remaining factor of 5.

`drs_auto_v2` won’t save much – Zygote will still work backwards most of the way,

``````help?> Zygote.jacobian
jacobian(f, args...) -> Tuple

...
This reverse-mode Jacobian needs to evaluate the pullback once for each element of y. Doing so
is usually only efficient when length(y) is small compared to length(a), otherwise forward mode
is likely to be better.
``````
2 Likes

Ah right, I ignored that this was a jacobian calculation and not a simple scalar output.

As an experiment we can make it a scalar output by defining

``````function rs_scalar(d,xs)
t = rs(d,xs)
sum(t .* t)
end
``````

and define

``````d_rs_scalar(d,x) =  (rs_scalar(d,x + 1e-6) - rs_scalar(d,x)) / 1e-6