ForwardDiff.hessian! with StaticArrays, unexpected allocations and performance

DNF · May 10, 2024, 7:51am

I am struggling with allocations and performance when calculating Hessians with ForwardDiff.jl. Here’s an MWE using a clean REPL with Julia 1.10.3:

julia> using DiffResults, ForwardDiff, StaticArrays, BenchmarkTools

julia> g = r -> (r[1]^2 - 3) * (r[2]^2 - 2);

julia> x = SA_F32[0.5, 2.7];

julia> hres = DiffResults.HessianResult(x);

julia> @btime ForwardDiff.hessian!($hres, $g, $x)
  68.948 ns (1 allocation: 80 bytes)
ImmutableDiffResult(-14.547502, (Float32[5.2900004, -14.85], Float32[10.580001 5.4; 5.4 -5.5]))

The allocation is unexpected. It seems to happen inside hess = extract_jacobian(T, partials(T, fd2), x), where hess is returned as a Matrix (instead of an SMatrix) and only subsequently converted to an SMatrix. It seems that partials(T, fd2) is not statically sized, so the similar call inside extract_jacobian is forced to allocate a Matrix.
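A possible way to sidestep the ImmutableDiffResult path is to compute the value, gradient and Hessian with separate calls on the static vector. This is only a sketch: the helper name value_gradient_hessian is made up for illustration, the function is evaluated several times instead of once, and whether it is actually allocation-free on a given setup would have to be measured.

```julia
using ForwardDiff, StaticArrays

g = r -> (r[1]^2 - 3) * (r[2]^2 - 2)
x = SA_F32[0.5, 2.7]

# Hypothetical helper (not part of ForwardDiff or DiffResults).
# The `where {F}` forces specialization on the function argument.
function value_gradient_hessian(f::F, x::StaticVector) where {F}
    val  = f(x)
    grad = ForwardDiff.gradient(f, x)
    # Hessian computed as the Jacobian of the gradient
    hess = ForwardDiff.jacobian(y -> ForwardDiff.gradient(f, y), x)
    return val, grad, hess
end

val, grad, hess = value_gradient_hessian(g, x)
```

The three return values correspond to what DiffResults.value, DiffResults.gradient and DiffResults.hessian would give for hres.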

I also tried the regular non-! hessian. Curiously, at first I got zero allocations and a 5 ns runtime, but for reasons I cannot explain it now takes almost 10^3 times longer and allocates significantly:

julia> @btime ForwardDiff.hessian($g, $x)
  2.878 μs (9 allocations: 208 bytes)
2×2 SMatrix{2, 2, Float32, 4} with indices SOneTo(2)×SOneTo(2):
 10.58   5.4
  5.4   -5.5

Compare this with

julia> @btime ForwardDiff.gradient($g, $x)
  3.100 ns (0 allocations: 0 bytes)

This is on Windows with Julia 1.10.3, but it also happened on 1.10.2. I am particularly confused about why ForwardDiff.hessian suddenly jumped from 5 ns to 3 μs, but my actual use case is ForwardDiff.hessian!.
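A jump of this size often points to a type instability or dynamic dispatch, though that is only a guess here. A minimal sketch for checking it: look at the inferred return type of the call, and count allocations from inside a function. The helper name count_allocs is made up for illustration.

```julia
using ForwardDiff, StaticArrays

g = r -> (r[1]^2 - 3) * (r[2]^2 - 2)
x = SA_F32[0.5, 2.7]

# Inferred return type(s) of the call. A concrete SMatrix type suggests the call
# is type-stable; Any or an abstract type points at dynamic dispatch.
rts = Base.return_types(ForwardDiff.hessian, (typeof(g), typeof(x)))
println(rts)

# Hypothetical helper: measure allocations inside a function, specialized on f,
# so that global-variable access does not contaminate the number.
count_allocs(f::F, x) where {F} = @allocated ForwardDiff.hessian(f, x)

count_allocs(g, x)            # first call includes compilation
println(count_allocs(g, x))   # second call reflects the steady state
```

If the two REPLs print different inferred types, the difference is in compilation or inference and not in the benchmark itself.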

Possibly relevant: Allocation on ForwardDiff + DiffResults + StaticArrays

DNF · May 10, 2024, 8:16am

Update: Now I am running two different REPLs with different environments, and I get this:
REPL1:

julia> @btime ForwardDiff.hessian($g, $x)
  7.900 ns (0 allocations: 0 bytes)
2×2 SMatrix{2, 2, Float32, 4} with indices SOneTo(2)×SOneTo(2):
 10.58   5.4
  5.4   -5.5

REPL2:

julia> @btime ForwardDiff.hessian($g, $x)
  2.167 μs (9 allocations: 208 bytes)
2×2 SMatrix{2, 2, Float32, 4} with indices SOneTo(2)×SOneTo(2):
 10.58   5.4
  5.4   -5.5

REPL1:

(jl_L5sXDq) pkg> st
Status `C:\Users\DNF\AppData\Local\Temp\jl_L5sXDq\Project.toml`
  [f68482b8] Cthulhu v2.12.5
  [163ba53b] DiffResults v1.1.0
  [f6369f11] ForwardDiff v0.10.36
  [90137ffa] StaticArrays v1.9.3

REPL2:

(MyProject) pkg> st
Project MyProject v1.0.0-DEV
Status `C:\Users\DNF\.julia\dev\myproject.jl\Project.toml`
  [26cce99e] BasicInterpolators v0.7.1
  [13f3f980] CairoMakie v0.12.0
  [ae650224] ChunkSplitters v2.4.2
  [f68482b8] Cthulhu v2.12.5
  [717857b8] DSP v0.7.9
  [163ba53b] DiffResults v1.1.0   # <= same as REPL1
  [31c24e10] Distributions v0.25.108
  [7a1cc6ca] FFTW v1.8.0
  [442a2c76] FastGaussQuadrature v1.0.2
  [1a297f60] FillArrays v1.11.0
  [f6369f11] ForwardDiff v0.10.36   # <= same as REPL1
  [e9467ef8] GLMakie v0.10.0
  [98e50ef6] JuliaFormatter v1.0.56
  [bdcacae8] LoopVectorization v0.12.170
  [23992714] MAT v0.10.6
  [ee78f7c6] Makie v0.21.0
  [429524aa] Optim v1.9.4
  [9b87118b] PackageCompiler v2.1.17
  [85a6dd25] PositiveFactorizations v0.2.4
  [92933f4c] ProgressMeter v1.10.0
  [295af30f] Revise v3.5.14
  [fdea26ae] SIMD v3.5.0
  [90137ffa] StaticArrays v1.9.3    # <= same as REPL1
  [09ab397b] StructArrays v0.6.18
  [20346346] TriangularIndices v0.1.0
  [37e2e46d] LinearAlgebra

Right now I guess this is both an autodiff question and a Pkg question, unfortunately. But the main question is still about ForwardDiff.hessian! and its small number of allocations.
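Since st only lists direct dependencies, the two environments could still differ in indirect ones. A sketch of how to compare them more thoroughly, run in each REPL. That the discrepancy comes from an indirect dependency or from something loaded at startup is an assumption, not something established here.

```julia
using Pkg

# Full resolved dependency graph (direct and indirect) of the active environment;
# equivalent to `st -m` in the Pkg REPL.
Pkg.status(; mode = Pkg.PKGMODE_MANIFEST)

# Names of all modules currently loaded in this session,
# including anything pulled in by startup.jl.
loaded = sort!([id.name for id in keys(Base.loaded_modules)])
println(loaded)
```

Diffing the two outputs between REPL1 and REPL2 would show whether any shared dependency resolves to a different version, or whether extra packages are loaded in one session.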

DNF · May 13, 2024, 9:20am

I’ll just bump this once. Perhaps I should rather open an issue at ForwardDiff.jl.

gdalle · May 13, 2024, 9:44am

That is very surprising. Are they both clean REPLs with only ForwardDiff and StaticArrays loaded?

DNF · May 13, 2024, 9:59am

ForwardDiff, DiffResults, StaticArrays and BenchmarkTools. Both clean.

Today, however, both REPLs are slow, with a runtime of 2 μs for hessian (no allocations) and 50 ns for hessian! (1 allocation, 80 bytes).
