Allocations using dualcache

I have a rather complicated function that I’ve written in a non-allocating manner, and am now trying to use ForwardDiff.jacobian! to generate a non-allocating jacobian. The function has many substeps, so a pre-allocated cache is required, and I’ve used the dualcache function from PreallocationTools.jl to create the required caches. While this works (it produces the correct answer when tested against a hand-written allocating jacobian routine), it still creates a large number of allocations. I’ve already read quite a few discussions on this topic; some of the posted solutions have helped, but allocations still remain.

The problem seems to be that while get_tmp(cache, vec) creates no allocations when vec is a float vector, it does allocate when called on a dual vector. Here’s a somewhat minimal example of my basic code structure:

using ForwardDiff, PreallocationTools
using LinearAlgebra

struct Solver
    A::Matrix{Float64}
    cache::PreallocationTools.DiffCache{Vector{Float64}, Vector{Float64}}
end

const ChunkSize = 12;

function test(testfunc!, N)
    print("\nN = ", N, "\n")

    # ChunkSize = ForwardDiff.pickchunksize(N)
    A = rand(N, N);
    B = rand(N);
    solver = Solver(A, dualcache(B, ChunkSize));

    C = similar(B);
    J = similar(A);
    func!(y, x) = testfunc!(y, x, solver);

    print("...Testing function\n")
    @time func!(C, B);

    cfg = ForwardDiff.JacobianConfig(func!, C, B, ForwardDiff.Chunk{ChunkSize}());
    print("...Testing jacobian\n")
    @time ForwardDiff.jacobian!(J, func!, C, B, cfg);
end

function example1!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    mul!(y, solver.A, x)
    return y
end
function example2!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    tmp = get_tmp(solver.cache, x)
    mul!(tmp, solver.A, x)
    @. y = tmp*tmp
    return 0
end
function example3!(y::Vector{<:Real}, x::Vector{<:Real}, solver::Solver)
    a = @allocated tmp = get_tmp(solver.cache, x); println(a)
    mul!(tmp, solver.A, x)
    @. y = tmp*tmp
    return 0
end

function testmany(funcs)
    for func in funcs
        print("\nTesting ", typeof(func), "\n")
        test(func, 100)
        test(func, 200)
    end
end

testmany([example1!, example2!, example3!])
testmany([example1!, example2!, example3!])

example1! is trivial, but runs correctly, in that neither the function nor the jacobian allocate:

Testing typeof(example1!)

N = 100
...Testing function
  0.000005 seconds
...Testing jacobian
  0.000211 seconds

N = 200
...Testing function
  0.000009 seconds
...Testing jacobian
  0.001240 seconds

example2! is closer to my case, and we now see allocations appear in the jacobian only:

Testing typeof(example2!)

N = 100
...Testing function
  0.000004 seconds
...Testing jacobian
  0.000221 seconds (9 allocations: 92.812 KiB)

N = 200
...Testing function
  0.000010 seconds
...Testing jacobian
  0.001268 seconds (34 allocations: 346.109 KiB)

The allocations clearly scale with the problem size. example3! just wraps an @allocated block around the get_tmp call to track its allocations:

Testing typeof(example3!)

N = 100
...Testing function
  0.000005 seconds
...Testing jacobian
10560
10560
10560
10560
10560
10560
10560
10560
10560
  0.000985 seconds (82 allocations: 95.219 KiB)

N = 200
...Testing function
  0.000008 seconds
...Testing jacobian
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
20848
  0.003006 seconds (172 allocations: 350.688 KiB)

And we see that the call to get_tmp is producing allocations, but only when called on a dual vector. Just to double-check what’s going on in the simplest possible setting, I also tried:

let
A = zeros(10);
DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, 10);
cache = dualcache(A, ChunkSize);
a = @allocated get_tmp(cache, A);  println(a)
a = @allocated get_tmp(cache, A);  println(a)
a = @allocated get_tmp(cache, DA); println(a)
a = @allocated get_tmp(cache, DA); println(a)
end

which returns:

0
0
1168
1168

Perhaps how I’m constructing the dualcache is wrong?

As a side note, if I uncomment the line:

# ChunkSize = ForwardDiff.pickchunksize(N)

from the definition of the function test, then allocations appear even for example1!, though they don’t scale with the problem size:

Testing typeof(example1!)

N = 100
...Testing function
  0.000006 seconds
...Testing jacobian
  0.000175 seconds (2 allocations: 1.250 KiB)

N = 200
...Testing function
  0.000009 seconds
...Testing jacobian
  0.001134 seconds (2 allocations: 1.250 KiB)

@code_warntype test(example1!, 100) now reports a type instability, while the original definition didn’t.
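I suspect this is because assigning ChunkSize inside test makes it a non-constant local, so the type Chunk{ChunkSize} is only known at runtime. Here is a Base-only sketch of what I believe is the same effect (the names f, unstable, and stable are made up for illustration; Val stands in for ForwardDiff.Chunk):

```julia
# Hypothetical illustration: a function parameterized on a Val type,
# analogous to code parameterized on ForwardDiff.Chunk{ChunkSize}.
f(::Val{N}) where {N} = N + 1

# The Val's type parameter depends on a runtime value, so the call to f
# cannot be statically dispatched; @code_warntype flags its result as Any.
unstable(n) = f(Val(n))

# The parameter is a compile-time constant, so the call is fully inferred.
stable() = f(Val(12))
```

If the analogy holds, the small fixed-size (non-scaling) allocations for example1! would just be the boxing from this dynamic dispatch, which is why moving ChunkSize out to a global const makes them disappear.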

I’m running Julia v1.7.3 on a Mac, with ForwardDiff v0.10.30.

Thanks!

Tracing a little further back in the source, it appears that some combination of the view / restructure in:

function get_tmp(dc::DiffCache, u::AbstractArray{T}) where {T <: ForwardDiff.Dual}
    nelem = div(sizeof(T), sizeof(eltype(dc.dual_du))) * length(dc.du)
    if nelem > length(dc.dual_du)
        enlargedualcache!(dc, nelem)
    end
    ArrayInterfaceCore.restructure(dc.du, reinterpret(T, view(dc.dual_du, 1:nelem)))
end

(lines 52-58 of PreallocationTools.jl at master in SciML/PreallocationTools.jl on GitHub) is the culprit. Indeed, just defining my_get_tmp as below:

function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
    return dc.du
end

and using it in place of get_tmp yields jacobians with no allocations. Of course, this is a less flexible solution.
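For what it’s worth, the reshape-over-reinterpret pattern in my_get_tmp reuses the backing memory directly, with no copy. A Base-only sketch of the mechanism, using ComplexF64 as a stand-in for a dual number (both are plain bits types wider than Float64); the names du and dual_du mirror the DiffCache fields:

```julia
du      = zeros(4)   # plays the role of dc.du (the Float64 cache)
dual_du = zeros(8)   # plays the role of dc.dual_du (the widened backing buffer)

# Reinterpret the whole buffer (no view), then reshape to the primary
# cache's shape; both operations wrap the existing memory without copying.
tmp = reshape(reinterpret(ComplexF64, dual_du), size(du))

# Writing through tmp lands in the original buffer:
tmp[1] = 1.0 + 2.0im
# now dual_du[1] == 1.0 and dual_du[2] == 2.0
```

This is presumably why skipping the view (at the cost of requiring the backing buffer to match the requested chunk size exactly) avoids the allocations that the view / restructure path incurs.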