Am I using DiffResults.jl correctly?

I’m trying to compute the Jacobian and the function value simultaneously using ForwardDiff.jl. A minimal working example (MWE) is

using StaticArrays
using ForwardDiff
using DiffResults
using BenchmarkTools

f(x) = SVector((@. x[1]*x[2]), x[1]+x[2])
df(x) = ForwardDiff.jacobian(f,x)
df_diffres(x) = ForwardDiff.jacobian!(DiffResults.JacobianResult(x),f,x)

If I benchmark each routine, evaluating f and evaluating df cost about the same. However, evaluating df_diffres takes significantly longer than evaluating the function and the Jacobian separately.

julia> x = SVector(1,2);
julia> @btime f($x)
  0.039 ns (0 allocations: 0 bytes)
julia> @btime df($x)
  0.039 ns (0 allocations: 0 bytes)
julia> @btime df_diffres($x)
  1.214 μs (14 allocations: 608 bytes)

Am I using DiffResults.jl incorrectly, or is there a way to reduce runtimes and allocations?

Even with DiffResults you should be caching your config.

I tried that previously, but removed it because it was slower.

x = SVector(1,2)
cfg = ForwardDiff.JacobianConfig(f,x)
df_diffres(x) = ForwardDiff.jacobian!(DiffResults.JacobianResult(x),f,x,cfg)

gave a timing of 1.401 μs (16 allocations: 736 bytes) for df_diffres(x).

What if you use floating point numbers? Derivatives of integers are somewhat weird…

But I see what your issue is. If you’re using DiffResults with static vectors, of course it needs to heap-allocate the result when it stores it. Temporary storage really only makes sense for things that would actually use such storage (i.e. arrays).
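For contrast, here is a minimal sketch of the array case, where preallocated storage is meant to be reused. The function g below is hypothetical (it is not from the original post) and works on ordinary Vectors. The result buffer and the config are built once and then passed to every call.

```julia
using ForwardDiff
using DiffResults

# Hypothetical example function (not from the original post): 3 inputs, 3 outputs.
g(x) = [x[1] * x[2], x[1] + x[2], sin(x[3])]

x = [1.0, 2.0, 3.0]

# Allocate the storage once. JacobianResult(x) assumes the output has the
# same length as the input; use JacobianResult(y, x) when it does not.
res = DiffResults.JacobianResult(x)
cfg = ForwardDiff.JacobianConfig(g, x)

# Reuse `res` and `cfg` for every evaluation point.
res = ForwardDiff.jacobian!(res, g, x, cfg)

y = DiffResults.value(res)     # g(x)
J = DiffResults.jacobian(res)  # Jacobian of g at x
```

Note that g itself still allocates its output vector here; the sketch only shows how the result and config objects are reused.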

Thanks for the tip. If I use x = SVector(1.0,2.0), the timings change slightly but the ratios are about the same.

  24.150 ns (1 allocation: 32 bytes) # timing for f
  23.441 ns (1 allocation: 48 bytes) # timing for Jacobian
  161.444 ns (6 allocations: 176 bytes) # timing with DiffResults

On the allocation - that makes sense. Guess I got lucky - this setup is pretty representative of my use case, where a function (with a small number of inputs/outputs) is evaluated at many states using AD.

If everything is small enough for static vectors, then ignore DiffResults and configs. Those are for larger systems.
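A minimal sketch of that advice, assuming the same two-input function as in the original post (written without the broadcast macro). The helper name value_and_jacobian is hypothetical; it simply makes two separate calls and skips DiffResults and configs entirely.

```julia
using StaticArrays
using ForwardDiff

f(x) = SVector(x[1] * x[2], x[1] + x[2])

# Hypothetical helper: evaluate the function and its Jacobian with two
# plain calls, with no DiffResults object and no JacobianConfig.
value_and_jacobian(f, x) = (f(x), ForwardDiff.jacobian(f, x))

x = SVector(1.0, 2.0)
y, J = value_and_jacobian(f, x)
```

Whether this is faster than the DiffResults route on a given machine and package version is something to confirm with @btime.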

Ah, thanks! So if I’m understanding correctly - I should expect computing Jacobian and function values simultaneously using ForwardDiff.jl/DiffResults.jl to be faster than computing them separately, but only for a large enough number of inputs/outputs?

It always computes them simultaneously: forward-mode AD cannot avoid doing so. The only question is whether the result is stored in an intermediate object for the DiffResults interface. With static vectors, putting the result in a mutable type currently requires heap allocation, which is not great. However, that limitation should be lifted in Julia v1.5 IIRC.
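To illustrate why the value and the derivative always come out together, here is a small sketch using ForwardDiff's dual numbers directly. It is only an illustration of the mechanism, not how ForwardDiff.jacobian should be called in practice; the seeds below are chosen by hand for the first component of f, x[1]*x[2].

```julia
using ForwardDiff
using ForwardDiff: Dual, value, partials

# Seed x1 = 1.0 and x2 = 2.0 with unit perturbations in separate slots.
x1 = Dual(1.0, 1.0, 0.0)
x2 = Dual(2.0, 0.0, 1.0)

# A single multiplication propagates the value and both partials at once.
d = x1 * x2

val = value(d)        # function value x1 * x2
dx1 = partials(d, 1)  # derivative with respect to x1
dx2 = partials(d, 2)  # derivative with respect to x2
```

The function value is carried along in the same dual number as the partial derivatives, so the only extra work in getting both is extracting and storing them.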