ForwardDiff.jl is very well written with a focus on performance, but computing a derivative will always involve somewhat more work than just computing the value of the function. It's hard to say exactly how large the slowdown will be, but if your function is scalar → scalar, then I would guess that ForwardDiff.derivative() should be about 2x slower than just calling the function normally (and when taking gradients of functions with vector inputs, forward-mode autodiff gets linearly slower in the number of inputs to your function).
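The 2x intuition comes from how forward mode works: your function is run on Dual numbers that carry a derivative along with each value, so every arithmetic operation does roughly double duty. A quick illustration using ForwardDiff's Dual type directly (just to show the mechanics, nothing you would normally write by hand):
julia> using ForwardDiff: Dual, value, partials
julia> d = Dual(2.0, 1.0);     # value 2.0, seeded with derivative 1.0
julia> y = d * d + 3 * d;      # each operation propagates value and partial together
julia> value(y), partials(y)[1]
(10.0, 7.0)
which matches x^2 + 3x = 10 and its derivative 2x + 3 = 7 at x = 2.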
I would encourage you to just try it out and use the BenchmarkTools.jl package to see the performance impact.
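For a scalar function, a quick check with the lightweight @btime macro might look something like this (f here is just a stand-in function I made up):
using BenchmarkTools, ForwardDiff

f(x) = sin(x) * x^2   # any scalar -> scalar function will do

@btime f(0.5)                            # time the plain call
@btime ForwardDiff.derivative(f, 0.5)    # time the derivative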
As for the example I worked on, all I did was try a very simple function that included conv() (the convolution function that now lives in DSP.jl; it was still in Base at the time):
julia> f(α) = sum(conv(α * [5., 6, 7, 8], [1., 2, 3, 4]))
f (generic function with 1 method)
julia> ForwardDiff.derivative(f, 1.0)
ERROR: MethodError: no method matching conv(::Array{ForwardDiff.Dual{1,Float64},1}, ::Array{Float64,1})
which showed that conv() wasn't implemented for the special Dual type that ForwardDiff uses. On the other hand, a naive implementation of conv() should be pretty easy to write, and should "just work" with Dual inputs, as sketched below.
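For instance, here is a minimal sketch of a direct convolution (naive_conv is my name for it, not anything from DSP.jl). Because it is written generically over the element type, ForwardDiff's Dual numbers flow straight through:
# Direct "full" convolution, generic over element types so that
# ForwardDiff.Dual inputs work without any special-casing.
function naive_conv(a::AbstractVector, b::AbstractVector)
    T = promote_type(eltype(a), eltype(b))
    out = zeros(T, length(a) + length(b) - 1)
    for i in eachindex(a), j in eachindex(b)
        out[i + j - 1] += a[i] * b[j]
    end
    return out
end
With that, the example above goes through (the sum of a full convolution is sum(a) * sum(b), so here the derivative is 26 * 10 = 260):
julia> f(α) = sum(naive_conv(α * [5., 6, 7, 8], [1., 2, 3, 4]))
f (generic function with 1 method)
julia> ForwardDiff.derivative(f, 1.0)
260.0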
For example, let’s write a function (not a convolution, but something simpler and representative):
function foo(x, y)
    x' * y
end
And maybe the function you’re interested in takes data vectors x and y and a parameter a:
function bar(a, x, y)
    foo(a .* x, y)
end
Then, for some given x and y, we can easily take the derivative of bar w.r.t. a, evaluated at a = 1.0:
julia> using ForwardDiff
julia> x = rand(3);
julia> y = rand(3);
julia> ForwardDiff.derivative(a -> bar(a, x, y), 1.0)
1.018944053791661
(Note the use of an anonymous function here: it creates a new function of the single argument a with the current values of x and y "baked in".)
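Since bar(a, x, y) = a * (x' * y) is linear in a, that derivative should equal foo(x, y), which makes for an easy sanity check:
julia> ForwardDiff.derivative(a -> bar(a, x, y), 1.0) ≈ foo(x, y)
true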
We can check performance easily (defining a = 1.0 first, and interpolating the arguments with $ so that @benchmark times just the call):
julia> using BenchmarkTools
julia> a = 1.0;
julia> @benchmark bar($a, $x, $y)
BenchmarkTools.Trial:
  memory estimate:  112 bytes
  allocs estimate:  1
  --------------
  minimum time:     50.699 ns (0.00% GC)
  median time:      52.859 ns (0.00% GC)
  mean time:        58.809 ns (7.41% GC)
  maximum time:     1.238 μs (86.53% GC)
  --------------
  samples:          10000
  evals/sample:     987
julia> @benchmark ForwardDiff.derivative(a -> bar(a, $x, $y), $a)
BenchmarkTools.Trial:
  memory estimate:  272 bytes
  allocs estimate:  6
  --------------
  minimum time:     136.237 ns (0.00% GC)
  median time:      141.569 ns (0.00% GC)
  mean time:        167.207 ns (12.90% GC)
  maximum time:     3.976 μs (89.75% GC)
  --------------
  samples:          10000
  evals/sample:     873
So computing the derivative is roughly 2.5-3x slower than calling the function directly, which is pretty close to my wild guess from earlier. Fortunately, that slowdown factor should not depend much on how much computation your function does, only on the number of inputs it takes.
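And if you eventually need derivatives with respect to the vector inputs themselves, the same pattern works with ForwardDiff.gradient; that is exactly the linear-in-the-number-of-inputs cost mentioned at the top. Using bar from above, and the fact that the gradient of x' * y with respect to x is just y:
julia> ForwardDiff.gradient(x -> bar(1.0, x, y), x) ≈ y
true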