Can I expect ForwardDiff to give the same performance in this case?

Hello!

I just learned about Automatic Differentiation and of course need to give it a go using Julia :slight_smile:

I have a function defined as:

Wendland(q) = aD*(1-q/2)^4 * (2*q+1)

Where aD in my case is 696.

And its derivative w.r.t. q:

WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)

By using ForwardDiff and the BenchmarkTools package I confirm that I get the same answers but vastly different timings:

using BenchmarkTools
using ForwardDiff

aD                    = 696
Wendland(q)           = aD*(1-q/2)^4 * (2*q+1)
WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)

#Define derivative function using ForwardDiff
df = x -> ForwardDiff.derivative(Wendland, x)

# Benchmarks

@btime WendlandDerivative(u) setup=(u=1.1)
  24.397 ns (2 allocations: 32 bytes)
-348.82649999999984

@btime df(u) setup=(u=1.1)
  121.287 ns (6 allocations: 160 bytes)
-348.8264999999999

Am I doing something wrong; perhaps this is not the intended use case of AD? Perhaps it works better at scale (i.e. on arrays) than for just producing the derivative function and evaluating it a few times?

Kind regards

The Performance Tips page (Performance Tips · The Julia Language) in the Julia docs is a great resource. For example, here you are using non-const globals (aD and df), which are among the first things the guide suggests avoiding.

However, AD is not always as fast as a manually written derivative. In this case in particular, I think the literal ^3 in the analytical derivative will be unrolled, while this will not happen in the AD case.
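For reference, a minimal sketch of the const fix, just restating your own definitions with const globals (untested here, but it is what the guide recommends):

using ForwardDiff

# const globals have a known type, so the compiler no longer has to
# look up aD and df as untyped globals on every call
const aD = 696
Wendland(q)           = aD*(1 - q/2)^4 * (2*q + 1)
WendlandDerivative(q) = aD*((5/8)*q*(q - 2)^3)

const df = x -> ForwardDiff.derivative(Wendland, x)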


I don’t think that explains the main part of the difference in performance, but thanks for the heads up.

I see. I will try to see whether I can write the function (Wendland) in a way that gives ForwardDiff a better opportunity to get as close as possible to a well-written derivative.

EDIT: I think I found the solution, and using const plays a huge role for a small number of evaluations. Will update in a post below soon.

Kind regards

I was able to get similar performance by using const and a package called FastPow. This is pretty awesome to me, which is why I put it in bold :slight_smile:

EDIT: Results of 0.001 ns cannot be trusted, as far as I have been told, but they are included for good measure. It is exciting to see the FastPow macro take a single calculation from roughly 6 ns down to that.

Benchmarks:

# Manual Function Derivative with FastPow
 @btime WendlandDerivative(u) setup=(u=1.1);
  0.001 ns (0 allocations: 0 bytes)
# Automatic Function Derivative without FastPow
@btime dfNOF(u) setup=(u=1.1);
  6.799 ns (0 allocations: 0 bytes)
# Automatic Function Derivative with FastPow
@btime df(u) setup=(u=1.1);
  0.001 ns (0 allocations: 0 bytes)

# Testing on Arrays
const pts = rand(1000);
# Manual
 @btime WendlandDerivative.(u) setup=(u=pts);
  667.785 ns (1 allocation: 7.94 KiB)
# Automatic
@btime df.(u) setup=(u=pts);
  623.333 ns (1 allocation: 7.94 KiB)

Code below:

using BenchmarkTools
using ForwardDiff
using FastPow

const aD = 696

@fastpow Wendland(q) = aD*(1-q/2)^4 * (2*q+1)
WendlandNOF(q) = aD*(1-q/2)^4 * (2*q+1)
@fastpow WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)

const dfNOF = x -> ForwardDiff.derivative(WendlandNOF, x)
const df    = x -> ForwardDiff.derivative(Wendland, x)

@btime WendlandDerivative(u) setup=(u=1.1);

@btime dfNOF(u) setup=(u=1.1);
@btime df(u) setup=(u=1.1);

const pts = rand(1000);

@btime WendlandDerivative.(u) setup=(u=pts);

@btime df.(u) setup=(u=pts);

Just as a heads up, when you see timings like this it means that Julia managed to “see through” your benchmark code and figured out that you are not doing anything with the result, so it doesn’t do any computation. You can sometimes solve this with a bit of “wrapping”:


julia> @btime WendlandDerivative(1.1) # bad
  0.044 ns (0 allocations: 0 bytes)
-348.82649999999984

julia> @btime WendlandDerivative($(Ref(1.1))[]) # ok
  1.809 ns (0 allocations: 0 bytes)
-348.82649999999984

Oh wow, would never have thought of that!

Thanks again, and also for the mention of using “const”. I am unsure whether I should mark my own answer as the answer; I will wait until tomorrow in case someone finds a way to do it even better. With that wrapping, the results are:

@btime WendlandDerivative($(Ref(1.1))[])
  1.499 ns (0 allocations: 0 bytes)
-348.82649999999984

@btime df($(Ref(1.1))[])
  1.999 ns (0 allocations: 0 bytes)
-348.8264999999999

This indicates that for a single evaluation the manual derivative is superior, but AD catches up when evaluating more elements, as seen in the previous comment.
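If arrays are the main use case, one further idea (just a sketch, not something I have benchmarked here) would be to broadcast into a preallocated buffer, which should remove the 7.94 KiB output allocation in both versions:

# Sketch only: reuse an output buffer so the broadcast allocates nothing
const out = similar(pts)

@btime $out .= WendlandDerivative.($pts);
@btime $out .= df.($pts);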

I also checked what happens if I put “const” on the Wendland and WendlandDerivative functions for array evaluation; it did not seem to play a huge role.

Kind regards