I have a rather complicated function that I’ve written in a non-allocating manner, and am now trying to use ForwardDiff.jacobian! to generate a non-allocating jacobian. The function has many substeps, so use of a pre-allocated cache is required; and I’ve used the dualcache function from PreallocationTools.jl to create the required caches. While this works (in that it produces the correct answer when tested against a hand-written allocating jacobian routine), it still creates a large number of allocations. I’ve already seen and read quite a few discussions on this topic. Some of the posted solutions have helped, but allocations still remain.

The problem seems to be that while no allocations are being created when I call get_tmp(cache, vec) with vec a float vector, allocations are created when get_tmp is called on a dual vector. Here’s a somewhat minimalist example of my basic code structure:

```
using ForwardDiff, PreallocationTools
using LinearAlgebra
struct Solver
A::Matrix{Float64}
cache::PreallocationTools.DiffCache{Vector{Float64}, Vector{Float64}}
end
const ChunkSize = 12;
function test(testfunc!, N)
print("\nN = ", N, "\n")
# ChunkSize = ForwardDiff.pickchunksize(N)
A = rand(N, N);
B = rand(N);
solver = Solver(A, dualcache(B, ChunkSize));
C = similar(B);
J = similar(A);
func!(y, x) = testfunc!(y, x, solver);
print("...Testing function\n")
@time func!(C, B);
cfg = ForwardDiff.JacobianConfig(func!, C, B, ForwardDiff.Chunk{ChunkSize}());
print("...Testing jacobian\n")
@time ForwardDiff.jacobian!(J, func!, C, B, cfg);
end
function example1!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
mul!(y, solver.A, x)
return y
end
function example2!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
tmp = get_tmp(solver.cache, x)
mul!(tmp, solver.A, x)
@. y = tmp*tmp
return 0
end
function example3!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
a = @allocated tmp = get_tmp(solver.cache, x); println(a)
mul!(tmp, solver.A, x)
@. y = tmp*tmp
return 0
end
function testmany(funcs)
for func in funcs
print("\nTesting ", typeof(func), "\n")
test(func, 100)
test(func, 200)
end
end
testmany([example1!, example2!, example3!])
testmany([example1!, example2!, example3!])
```

example1! is trivial, but runs correctly, in that neither the function nor the jacobian allocate:

```
Testing typeof(example1!)
N = 100
...Testing function
0.000005 seconds
...Testing jacobian
0.000211 seconds
N = 200
...Testing function
0.000009 seconds
...Testing jacobian
0.001240 seconds
```

example2! is closer to my case, and we now see allocations appear in the jacobian only:

```
Testing typeof(example2!)
N = 100
...Testing function
0.000004 seconds
...Testing jacobian
0.000221 seconds (9 allocations: 92.812 KiB)
N = 200
...Testing function
0.000010 seconds
...Testing jacobian
0.001268 seconds (34 allocations: 346.109 KiB)
```

The allocations clearly scale with the problem size. example3! just throws a block around the get_tmp function to track allocations:

```
Testing typeof(example3!)
N = 100
...Testing function
0.000005 seconds
...Testing jacobian
10560
10560
10560
10560
10560
10560
10560
10560
10560
0.000985 seconds (82 allocations: 95.219 KiB)
N = 200
...Testing function
0.000008 seconds
...Testing jacobian
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
0.003006 seconds (172 allocations: 350.688 KiB)
```

And we see that the call to get_tmp is producing allocations, but only when called on a dual vector. Just to double check what’s going on in the simplest possible setting, I also tried:

```
let
A = zeros(10);
DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, 10);
cache = dualcache(A, ChunkSize);
a = @allocated get_tmp(cache, A); println(a)
a = @allocated get_tmp(cache, A); println(a)
a = @allocated get_tmp(cache, DA); println(a)
a = @allocated get_tmp(cache, DA); println(a)
end
```

which returns:

```
0
0
1168
1168
```

Perhaps how I’m constructing the dualcache is wrong?

As a sidenote, if I uncomment the line:

```
# ChunkSize = ForwardDiff.pickchunksize(N)
```

from the definition of the function test, then allocations appear even for example1!, though they don’t scale with the problem size:

```
Testing typeof(example1!)
N = 100
...Testing function
0.000006 seconds
...Testing jacobian
0.000175 seconds (2 allocations: 1.250 KiB)
N = 200
...Testing function
0.000009 seconds
...Testing jacobian
0.001134 seconds (2 allocations: 1.250 KiB)
```

@code_warntype test(example1!, 100) now returns a type instability, while the original definition didn’t.

I’m running Julia v1.7.3 on a Mac, with ForwardDiff v0.10.30

Thanks!