# Can I expect ForwardDiff to give the same performance in this case?

Hello!

I just learned about automatic differentiation and of course needed to give it a go using Julia.

I have a function defined as:

```
Wendland(q) = aD*(1-q/2)^4 * (2*q+1)
```

where `aD` in my case is 696.

And its derivative w.r.t. q:

```
WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)
```

Using the ForwardDiff and BenchmarkTools packages, I confirm that I get the same answers but vastly different timings:

```
using BenchmarkTools
using ForwardDiff

#Define derivative function using ForwardDiff
df = x -> ForwardDiff.derivative(Wendland, x)

# Benchmarks

@btime WendlandDerivative(u) setup=(u=1.1)
24.397 ns (2 allocations: 32 bytes)
-348.82649999999984

@btime df(u) setup=(u=1.1)
121.287 ns (6 allocations: 160 bytes)
-348.8264999999999

```

Am I doing something wrong, or is this perhaps not the intended use case of AD? Perhaps it works better at scale (i.e. on arrays) than for just producing the derivative function and evaluating it a few times?

Kind regards

The Performance Tips page (Performance Tips · The Julia Language) in the Julia docs is a great resource. For example, here you are using non-const globals (`aD` and `df`), which is actually the first thing the guide suggests avoiding.

However, AD is not always as fast as manually written derivatives. In this case in particular, I think the literal `^3` in the analytical derivative will be unrolled, while this will not happen in the AD case.
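For concreteness, a minimal sketch of the const-global fix (assuming `aD = 696` as in the original post): marking the global and the derivative closure `const` lets the compiler infer concrete types instead of treating them as untyped globals.

```julia
using ForwardDiff

const aD = 696.0   # const global: its type is now known to the compiler

Wendland(q) = aD*(1 - q/2)^4 * (2q + 1)
WendlandDerivative(q) = aD * (5/8) * q * (q - 2)^3

# const binding for the closure, so calls to df are type-stable
const df = x -> ForwardDiff.derivative(Wendland, x)

# Manual and AD derivatives should agree at q = 1.1
println(WendlandDerivative(1.1))
println(df(1.1))
```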


I don't think that explains the main part of the difference in performance, but thanks for the heads up.

I see. I will try to see if I can write the function (Wendland) in a way that improves ForwardDiff's chances of getting as close as possible to a well-written derivative.

EDIT: I think I found the solution, and using `const` plays a huge role for a small number of evaluations. Will update the post below soon.

Kind regards

I was able to get similar performance by using `const` and a package called FastPow. This is pretty awesome to me, which is why I put it in bold.

EDIT: Results of 0.001 ns cannot be trusted, as far as I have been told, but they are included for good measure. It is exciting to see how the FastPow macro takes one calculation from 6 ns down to that.

Benchmarks:

```
# Manual Function Derivative with FastPow
@btime WendlandDerivative(u) setup=(u=1.1);
0.001 ns (0 allocations: 0 bytes)
# Automatic Function Derivative without FastPow
@btime dfNOF(u) setup=(u=1.1);
6.799 ns (0 allocations: 0 bytes)
# Automatic Function Derivative with FastPow
@btime df(u) setup=(u=1.1);
0.001 ns (0 allocations: 0 bytes)

# Testing on Arrays
const pts = rand(1000);
# Manual
@btime WendlandDerivative.(u) setup=(u=pts);
667.785 ns (1 allocation: 7.94 KiB)
# Automatic
@btime df.(u) setup=(u=pts);
623.333 ns (1 allocation: 7.94 KiB)
```

Code below:

```
using BenchmarkTools
using ForwardDiff
using FastPow

const aD = 696.0

# Without FastPow
WendlandNOF(q) = aD*(1-q/2)^4 * (2*q+1)
# With FastPow
@fastpow Wendland(q) = aD*(1-q/2)^4 * (2*q+1)
# Manual derivative, also with FastPow
@fastpow WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)

const dfNOF = x -> ForwardDiff.derivative(WendlandNOF, x)
const df    = x -> ForwardDiff.derivative(Wendland, x)

@btime WendlandDerivative(u) setup=(u=1.1);

@btime dfNOF(u) setup=(u=1.1);
@btime df(u) setup=(u=1.1);

const pts = rand(1000);

@btime WendlandDerivative.(u) setup=(u=pts);

@btime df.(u) setup=(u=pts);
```

Just as a heads up, when you see timings like this it means that Julia managed to "see through" your benchmark code and figured out that you are not doing anything with the result, so it doesn't do any computation. You can sometimes solve this with a bit of "wrapping":

```
julia> @btime WendlandDerivative(1.1) # not ok: constant-folded away
0.044 ns (0 allocations: 0 bytes)
-348.82649999999984

julia> @btime WendlandDerivative($(Ref(1.1))[]) # ok
1.809 ns (0 allocations: 0 bytes)
-348.82649999999984
```

Oh wow, would never have thought of that!

Thanks again, and also for the mention of using `const`. Unsure if I should mark my own answer as the answer; I will wait till tomorrow in case someone finds a way to do it even better. With that wrapping, the results are:

```
@btime WendlandDerivative($(Ref(1.1))[])
1.499 ns (0 allocations: 0 bytes)
-348.82649999999984

@btime df($(Ref(1.1))[])
1.999 ns (0 allocations: 0 bytes)
-348.8264999999999
```

Which indicates that for a single evaluation the manual derivative is superior, but AD catches up when adding more elements, as seen in the previous comment.

I also checked what happened if I used `const` with the Wendland and WendlandDerivative functions for array evaluation; it did not seem to play a huge role.
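For what it's worth, the single allocation left in the array benchmarks appears to be just the broadcast output vector. A small sketch (reusing the same `Wendland` and `aD = 696` as above, an assumption on my part) of preallocating it so repeated evaluations allocate nothing:

```julia
using ForwardDiff

const aD = 696.0
Wendland(q) = aD*(1 - q/2)^4 * (2q + 1)
const df = x -> ForwardDiff.derivative(Wendland, x)

pts = rand(1000)
out = similar(pts)   # allocate the output buffer once

out .= df.(pts)      # in-place broadcast: reuses `out`, no new array per call
```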

Kind regards