I have a rather complicated function that I’ve written in a non-allocating manner, and I'm now trying to use ForwardDiff.jacobian! to compute its Jacobian without allocations. The function has many substeps, so a pre-allocated cache is required; I’ve used the dualcache function from PreallocationTools.jl to create the required caches. While this works (it produces the correct answer when tested against a hand-written, allocating Jacobian routine), it still creates a large number of allocations. I’ve already read quite a few discussions on this topic; some of the posted solutions have helped, but allocations still remain.
The problem seems to be that while get_tmp(cache, vec) allocates nothing when vec is a plain Float64 vector, it does allocate when vec is a vector of duals. Here’s a fairly minimal example of my basic code structure:
using ForwardDiff, PreallocationTools
using LinearAlgebra
struct Solver
    A::Matrix{Float64}
    cache::PreallocationTools.DiffCache{Vector{Float64}, Vector{Float64}}
end
const ChunkSize = 12;
function test(testfunc!, N)
    print("\nN = ", N, "\n")
    # ChunkSize = ForwardDiff.pickchunksize(N)
    A = rand(N, N);
    B = rand(N);
    solver = Solver(A, dualcache(B, ChunkSize));
    C = similar(B);
    J = similar(A);
    func!(y, x) = testfunc!(y, x, solver);
    print("...Testing function\n")
    @time func!(C, B);
    cfg = ForwardDiff.JacobianConfig(func!, C, B, ForwardDiff.Chunk{ChunkSize}());
    print("...Testing jacobian\n")
    @time ForwardDiff.jacobian!(J, func!, C, B, cfg);
end
function example1!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    mul!(y, solver.A, x)
    return y
end
function example2!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    tmp = get_tmp(solver.cache, x)
    mul!(tmp, solver.A, x)
    @. y = tmp*tmp
    return 0
end
function example3!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    a = @allocated tmp = get_tmp(solver.cache, x); println(a)
    mul!(tmp, solver.A, x)
    @. y = tmp*tmp
    return 0
end
function testmany(funcs)
    for func in funcs
        print("\nTesting ", typeof(func), "\n")
        test(func, 100)
        test(func, 200)
    end
end
# Run twice so the second set of timings is free of compilation overhead.
testmany([example1!, example2!, example3!])
testmany([example1!, example2!, example3!])
example1! is trivial, but it behaves as hoped: neither the function evaluation nor the Jacobian allocates:
Testing typeof(example1!)
N = 100
...Testing function
  0.000005 seconds
...Testing jacobian
  0.000211 seconds
N = 200
...Testing function
  0.000009 seconds
...Testing jacobian
  0.001240 seconds
example2! is closer to my real case, and now allocations appear, but only in the Jacobian:
Testing typeof(example2!)
N = 100
...Testing function
  0.000004 seconds
...Testing jacobian
  0.000221 seconds (9 allocations: 92.812 KiB)
N = 200
...Testing function
  0.000010 seconds
...Testing jacobian
  0.001268 seconds (34 allocations: 346.109 KiB)
The allocations clearly scale with the problem size. example3! simply wraps an @allocated block around the get_tmp call so we can see where they come from:
Testing typeof(example3!)
N = 100
...Testing function
  0.000005 seconds
...Testing jacobian
10560
10560
10560
10560
10560
10560
10560
10560
10560
  0.000985 seconds (82 allocations: 95.219 KiB)
N = 200
...Testing function
  0.000008 seconds
...Testing jacobian
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
  0.003006 seconds (172 allocations: 350.688 KiB)
So the call to get_tmp is producing allocations, but only when it is called on a dual vector. The numbers add up: with ChunkSize = 12, ForwardDiff evaluates 9 chunks for N = 100 and 17 chunks for N = 200, and 9 × 10560 bytes ≈ 92.8 KiB and 17 × 20848 bytes ≈ 346.1 KiB match the totals reported for example2!, so essentially all of the allocation is coming from get_tmp. Just to double-check what’s going on in the simplest possible setting, I also tried:
let
A = zeros(10);
DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, 10);
cache = dualcache(A, ChunkSize);
a = @allocated get_tmp(cache, A);  println(a)
a = @allocated get_tmp(cache, A);  println(a)
a = @allocated get_tmp(cache, DA); println(a)
a = @allocated get_tmp(cache, DA); println(a)
end
which returns:
0
0
1168
1168
Perhaps I’m constructing the dualcache incorrectly?
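For what it’s worth, my rough mental model of get_tmp, paraphrased from skimming the PreallocationTools source (so the dispatch and the field names du / dual_du below are just my reading of it, not the actual implementation), is something like:
# Paraphrase of how I understand get_tmp to behave -- NOT the actual
# PreallocationTools source; field names are my guess.
my_get_tmp(dc, u::AbstractArray{<:AbstractFloat}) = dc.du   # float path: hand back the plain buffer
my_get_tmp(dc, u::AbstractArray{T}) where {T <: ForwardDiff.Dual} =
    reinterpret(T, dc.dual_du)                               # dual path: reinterpret the flat Float64 scratch
If that’s roughly right, then the 1168 bytes presumably come from the reinterpret itself (or from how the dual scratch buffer is sized), rather than from anything in my own code.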
As a side note, if I uncomment the line:
# ChunkSize = ForwardDiff.pickchunksize(N)
in the definition of test, then allocations appear even for example1!, though they don’t scale with the problem size:
Testing typeof(example1!)
N = 100
...Testing function
  0.000006 seconds
...Testing jacobian
  0.000175 seconds (2 allocations: 1.250 KiB)
N = 200
...Testing function
  0.000009 seconds
...Testing jacobian
  0.001134 seconds (2 allocations: 1.250 KiB)
@code_warntype test(example1!, 100) now reports a type instability, whereas the original definition didn’t.
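I assume this is because ChunkSize then becomes a runtime value inside test, so ForwardDiff.Chunk{ChunkSize}() (and hence the JacobianConfig) is no longer inferable. If it matters, the variant I would try is a function barrier along the following lines (an untested sketch; test_barrier and test_kernel are just names I made up):
# Untested sketch: lift the runtime chunk size into the type domain with Val,
# then do all the work behind a function barrier compiled for that chunk.
function test_barrier(testfunc!, N)
    chunk = ForwardDiff.pickchunksize(N)      # runtime integer
    return test_kernel(testfunc!, N, Val(chunk))
end

function test_kernel(testfunc!, N, ::Val{C}) where {C}
    A = rand(N, N)
    B = rand(N)
    solver = Solver(A, dualcache(B, C))
    y = similar(B)
    J = similar(A)
    func!(out, x) = testfunc!(out, x, solver)
    cfg = ForwardDiff.JacobianConfig(func!, y, B, ForwardDiff.Chunk{C}())
    @time ForwardDiff.jacobian!(J, func!, y, B, cfg)
    return J
end
I haven’t checked whether this actually removes those 2 allocations, though; the main issue for me is the get_tmp allocation above.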
I’m running Julia v1.7.3 on a Mac, with ForwardDiff v0.10.30.
Thanks!