I am struggling with allocations and performance when calculating Hessians with ForwardDiff.jl. Here's an MWE, run in a clean REPL on Julia 1.10.3:
julia> using DiffResults, ForwardDiff, StaticArrays, BenchmarkTools
julia> g = r -> (r[1]^2 - 3) * (r[2]^2 - 2);
julia> x = SA_F32[0.5, 2.7];
julia> hres = DiffResults.HessianResult(x);
julia> @btime ForwardDiff.hessian!($hres, $g, $x)
68.948 ns (1 allocation: 80 bytes)
ImmutableDiffResult(-14.547502, (Float32[5.2900004, -14.85], Float32[10.580001 5.4; 5.4 -5.5]))
The allocation is unexpected, and it seems to happen inside `hess = extract_jacobian(T, partials(T, fd2), x)`, where `hess` is returned as a `Matrix` (instead of an `SMatrix`) and then subsequently converted to an `SMatrix`. It seems that `partials(T, fd2)` is not statically sized, so the `similar` call inside `extract_jacobian` is forced to allocate a `Matrix`.
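For what it's worth, a possible workaround (my own sketch, not something from the thread above) is to compute the Hessian as the Jacobian of the gradient. Both `ForwardDiff.gradient` and `ForwardDiff.jacobian` have StaticArrays fast paths, so the nested call returns an `SMatrix` directly without an intermediate `Matrix`:

```julia
using ForwardDiff, StaticArrays

g = r -> (r[1]^2 - 3) * (r[2]^2 - 2)
x = SA_F32[0.5, 2.7]

# Hessian as the Jacobian of the gradient; both calls stay on the
# StaticArrays code path, so the result is an SMatrix with no
# intermediate Matrix allocation.
hess_nested(f, y) = ForwardDiff.jacobian(z -> ForwardDiff.gradient(f, z), y)

hess_nested(g, x)  # 2×2 SMatrix, same values as ForwardDiff.hessian(g, x)
```

This is just a sketch of the idea; I have not checked whether it is as fast as a properly non-allocating `hessian!` would be.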
Now, I also tried the regular non-`!` `hessian`, and curiously, at first I got zero allocations and a 5 ns runtime, but for reasons I cannot explain it now takes almost 10^3 times longer and allocates significantly:
julia> @btime ForwardDiff.hessian($g, $x)
2.878 μs (9 allocations: 208 bytes)
2×2 SMatrix{2, 2, Float32, 4} with indices SOneTo(2)×SOneTo(2):
10.58 5.4
5.4 -5.5
Compare this with
julia> @btime ForwardDiff.gradient($g, $x)
3.100 ns (0 allocations: 0 bytes)
This is on Windows with Julia 1.10.3, but it also happened on 1.10.2. I am particularly confused about why `ForwardDiff.hessian` suddenly jumped from 5 ns to 3 μs, but my actual use case needs `ForwardDiff.hessian!`.
Possibly relevant: Allocation on ForwardDiff + DiffResults + StaticArrays
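In case it helps anyone reproduce or work around this, here is a sketch (my own assumption, not a supported API) that assembles the same `ImmutableDiffResult` from the three separately computed, statically sized pieces, bypassing `hessian!` entirely. Note that `ImmutableDiffResult`'s constructor layout is an implementation detail of DiffResults:

```julia
using DiffResults, ForwardDiff, StaticArrays

g = r -> (r[1]^2 - 3) * (r[2]^2 - 2)
x = SA_F32[0.5, 2.7]

# Compute value, gradient, and Hessian with calls that each stay on the
# StaticArrays fast path, then pack them into an ImmutableDiffResult.
val  = g(x)
grad = ForwardDiff.gradient(g, x)
hess = ForwardDiff.jacobian(y -> ForwardDiff.gradient(g, y), x)
hres = DiffResults.ImmutableDiffResult(val, (grad, hess))

DiffResults.value(hres)     # g(x)
DiffResults.gradient(hres)  # ∇g(x)
DiffResults.hessian(hres)   # H(x)
```

Again, this relies on the internal two-field layout of `ImmutableDiffResult`, so treat it as a stopgap rather than a fix.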