**I was able to get similar performance by using const and a package called FastPow.** I find this pretty awesome, which is why I put it in bold.
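EDIT: I have been told that results of 0.001 ns cannot be trusted (a sub-nanosecond timing usually means the compiler optimized the computation away), but I have included them for good measure. It is still exciting to see how the @fastpow macro takes a single calculation from about 6 ns down to that.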
Benchmarks:
# Manual Function Derivative with FastPow
@btime WendlandDerivative(u) setup=(u=1.1);
0.001 ns (0 allocations: 0 bytes)
# Automatic Function Derivative without FastPow
@btime dfNOF(u) setup=(u=1.1);
6.799 ns (0 allocations: 0 bytes)
# Automatic Function Derivative with FastPow
@btime df(u) setup=(u=1.1);
0.001 ns (0 allocations: 0 bytes)
# Testing on Arrays
const pts = rand(1000);
# Manual
@btime WendlandDerivative.(u) setup=(u=pts);
667.785 ns (1 allocation: 7.94 KiB)
# Automatic
@btime df.(u) setup=(u=pts);
623.333 ns (1 allocation: 7.94 KiB)
Code below:
using BenchmarkTools
using ForwardDiff
using FastPow
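# const global, so the compiler knows its type inside the functions below
const aD = 696
# Wendland function; @fastpow rewrites the literal integer power as multiplications
@fastpow Wendland(q) = aD*(1-q/2)^4 * (2*q+1)
# Same function without @fastpow, for comparison
WendlandNOF(q) = aD*(1-q/2)^4 * (2*q+1)
# Hand-derived derivative of Wendland
@fastpow WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)
# Automatic derivatives via ForwardDiff, without and with @fastpow
const dfNOF = x -> ForwardDiff.derivative(WendlandNOF, x)
const df = x -> ForwardDiff.derivative(Wendland, x)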
@btime WendlandDerivative(u) setup=(u=1.1);
@btime dfNOF(u) setup=(u=1.1);
@btime df(u) setup=(u=1.1);
const pts = rand(1000);
@btime WendlandDerivative.(u) setup=(u=pts);
@btime df.(u) setup=(u=pts);
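
A possible way to get scalar timings that can be trusted, sketched under the assumption that the 0.001 ns figures come from the compiler folding the computation away: the BenchmarkTools manual suggests interpolating a `Ref` and dereferencing it inside the benchmarked expression, so the input cannot be treated as a compile-time constant. The sketch also checks that the manual and automatic derivatives agree before timing them. The name `dfCheck` is made up for this example (a plain function instead of the `const` closure above); no timings are shown since they depend on the machine.

```julia
using BenchmarkTools
using ForwardDiff
using FastPow

const aD = 696

@fastpow Wendland(q) = aD*(1-q/2)^4 * (2*q+1)
@fastpow WendlandDerivative(q) = aD*((5/8)*q*(q-2)^3)

# Hypothetical helper name for this sketch: automatic derivative of Wendland
dfCheck(x) = ForwardDiff.derivative(Wendland, x)

u = 1.1

# Sanity check: the hand-derived and automatic derivatives should agree
@assert dfCheck(u) ≈ WendlandDerivative(u)

# Interpolate a Ref and dereference it inside the benchmarked expression,
# so the compiler cannot treat the input as a known constant
@btime WendlandDerivative($(Ref(u))[]);
@btime dfCheck($(Ref(u))[]);
```

If these timings come out well above 0.001 ns, that would support the suspicion that the earlier scalar results measured an optimized-away computation rather than the actual work.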