# Allocations using dualcache

I have a rather complicated function that I’ve written in a non-allocating manner, and I’m now trying to use `ForwardDiff.jacobian!` to generate a non-allocating jacobian. The function has many substeps, so a pre-allocated cache is required, and I’ve used the `dualcache` function from PreallocationTools.jl to create the required caches. While this works (it produces the correct answer when tested against a hand-written allocating jacobian routine), it still creates a large number of allocations. I’ve already seen and read quite a few discussions on this topic; some of the posted solutions have helped, but allocations still remain.

The problem seems to be that while no allocations occur when I call `get_tmp(cache, vec)` with `vec` a float vector, allocations do occur when `get_tmp` is called on a dual vector. Here’s a minimal example of my basic code structure:

```julia
using ForwardDiff, PreallocationTools
using LinearAlgebra

struct Solver
    A::Matrix{Float64}
    cache::PreallocationTools.DiffCache{Vector{Float64}, Vector{Float64}}
end

const ChunkSize = 12;

function test(testfunc!, N)
    print("\nN = ", N, "\n")

    # ChunkSize = ForwardDiff.pickchunksize(N)
    A = rand(N, N);
    B = rand(N);
    solver = Solver(A, dualcache(B, ChunkSize));

    C = similar(B);
    J = similar(A);
    func!(y, x) = testfunc!(y, x, solver);

    print("...Testing function\n")
    @time func!(C, B);

    cfg = ForwardDiff.JacobianConfig(func!, C, B, ForwardDiff.Chunk{ChunkSize}());
    print("...Testing jacobian\n")
    @time ForwardDiff.jacobian!(J, func!, C, B, cfg);
end

function example1!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    mul!(y, solver.A, x)
    return y
end

function example2!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    tmp = get_tmp(solver.cache, x)
    mul!(tmp, solver.A, x)
    @. y = tmp*tmp
    return 0
end

function example3!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    a = @allocated tmp = get_tmp(solver.cache, x); println(a)
    mul!(tmp, solver.A, x)
    @. y = tmp*tmp
    return 0
end

function testmany(funcs)
    for func in funcs
        print("\nTesting ", typeof(func), "\n")
        test(func, 100)
        test(func, 200)
    end
end

testmany([example1!, example2!, example3!])
testmany([example1!, example2!, example3!])  # second run, so timings exclude compilation
```

`example1!` is trivial, but runs correctly, in that neither the function nor the jacobian allocates:

```text
Testing typeof(example1!)

N = 100
...Testing function
0.000005 seconds
...Testing jacobian
0.000211 seconds

N = 200
...Testing function
0.000009 seconds
...Testing jacobian
0.001240 seconds
```

`example2!` is closer to my case, and now allocations appear, but only in the jacobian:

```text
Testing typeof(example2!)

N = 100
...Testing function
0.000004 seconds
...Testing jacobian
0.000221 seconds (9 allocations: 92.812 KiB)

N = 200
...Testing function
0.000010 seconds
...Testing jacobian
0.001268 seconds (34 allocations: 346.109 KiB)
```

The allocations clearly scale with the problem size. `example3!` simply wraps the `get_tmp` call in an `@allocated` block to track allocations:

```text
Testing typeof(example3!)

N = 100
...Testing function
0.000005 seconds
...Testing jacobian
10560
10560
10560
10560
10560
10560
10560
10560
10560
0.000985 seconds (82 allocations: 95.219 KiB)

N = 200
...Testing function
0.000008 seconds
...Testing jacobian
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
0.003006 seconds (172 allocations: 350.688 KiB)
```

And we see that the call to `get_tmp` produces allocations, but only when called on a dual vector. Just to double-check what’s going on in the simplest possible setting, I also tried:

```julia
let
    A = zeros(10);
    DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, 10);
    cache = dualcache(A, ChunkSize);
    a = @allocated get_tmp(cache, A);  println(a)
    a = @allocated get_tmp(cache, A);  println(a)
    a = @allocated get_tmp(cache, DA); println(a)
    a = @allocated get_tmp(cache, DA); println(a)
end
```

which returns:

```text
0
0
1168
1168
```

Perhaps I’m constructing the `dualcache` incorrectly?
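For what it’s worth, inspecting the return types shows the difference between the two paths: the float path hands back the plain pre-allocated vector, while the dual path hands back a wrapped array (the exact wrapper types depend on the PreallocationTools version; this is just a diagnostic snippet):

```julia
using ForwardDiff, PreallocationTools

# Compare what get_tmp returns for float vs. dual input,
# using the same chunk size (12) as in the example above.
cs = 12
A  = zeros(10)
DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, cs}, 10)
cache = dualcache(A, cs)

println(typeof(get_tmp(cache, A)))   # a plain Vector{Float64}
println(typeof(get_tmp(cache, DA)))  # a reinterpreted/restructured wrapper type
```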

As a side note, if I uncomment the line:

```julia
# ChunkSize = ForwardDiff.pickchunksize(N)
```

from the definition of `test`, then allocations appear even for `example1!`, though they don’t scale with the problem size:

```text
Testing typeof(example1!)

N = 100
...Testing function
0.000006 seconds
...Testing jacobian
0.000175 seconds (2 allocations: 1.250 KiB)

N = 200
...Testing function
0.000009 seconds
...Testing jacobian
0.001134 seconds (2 allocations: 1.250 KiB)
```

`@code_warntype test(example1!, 100)` now reports a type instability, whereas the original definition didn’t.
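One pattern that should remove this particular instability (a sketch, not tested against my full problem): compute the chunk size in an outer method and pass it through a function barrier, so it becomes a compile-time type parameter in the inner method. `test_stable` below is a hypothetical variant of `test`, reusing the `Solver` struct from the example above:

```julia
using ForwardDiff, PreallocationTools, LinearAlgebra

# Outer method: picks the chunk size at runtime and immediately dispatches
# to the inner method, where it appears as the type parameter C.
test_stable(testfunc!, N) =
    test_stable(testfunc!, N, ForwardDiff.Chunk{ForwardDiff.pickchunksize(N)}())

# Inner method: C is a compile-time constant here, so the dualcache and
# JacobianConfig constructions are type-stable.
function test_stable(testfunc!, N, chunk::ForwardDiff.Chunk{C}) where {C}
    A = rand(N, N)
    B = rand(N)
    solver = Solver(A, dualcache(B, C))   # assumes the Solver struct above
    y = similar(B)
    J = similar(A)
    func!(y, x) = testfunc!(y, x, solver)
    cfg = ForwardDiff.JacobianConfig(func!, y, B, chunk)
    ForwardDiff.jacobian!(J, func!, y, B, cfg)
    return J
end
```

The outer call is still dynamic at the barrier, but that costs one dispatch per call rather than letting the instability propagate through the jacobian code.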

I’m running Julia v1.7.3 on a Mac, with ForwardDiff v0.10.30.

Thanks!

Tracing a little further back in the source, it appears that some combination of the `view` / `restructure` in:

```julia
function get_tmp(dc::DiffCache, u::AbstractArray{T}) where {T <: ForwardDiff.Dual}
    nelem = div(sizeof(T), sizeof(eltype(dc.dual_du))) * length(dc.du)
    if nelem > length(dc.dual_du)
        enlargedualcache!(dc, nelem)
    end
    ArrayInterfaceCore.restructure(dc.du, reinterpret(T, view(dc.dual_du, 1:nelem)))
end
```

(lines 52–58 of `PreallocationTools.jl` on the master branch of SciML/PreallocationTools.jl on GitHub) is the culprit. Indeed, just defining `my_get_tmp` as below:

```julia
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
    return dc.du
end
```

and replacing `get_tmp` with it yields jacobians with no allocations. Of course, this is a less flexible solution.
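To double-check the workaround in the same minimal setting as before, one can repeat the `@allocated` experiment against `my_get_tmp` (the dual-path method is redefined here so the snippet runs standalone; if the workaround holds, this should print 0):

```julia
using ForwardDiff, PreallocationTools

# Dual-path workaround, repeated here so this snippet is self-contained.
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end

let
    cs = 12   # same ChunkSize as above
    A  = zeros(10)
    DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, cs}, 10)
    cache = dualcache(A, cs)
    my_get_tmp(cache, DA)                  # call once first to compile
    a = @allocated my_get_tmp(cache, DA)
    println(a)
end
```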