# Cannot use buffer with AD?

Can I not use a buffer with AD as below?

``````using ForwardDiff

function buf(x)
similar(x)
end

function test(x; sto=buf(x))
sto .= x.^2
sum(sto.^2)
end

end

x = rand(2, 2)
``````

This throws an error:

``````julia> grad(x; sto=buf(x))
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#316#317"{Matrix{Float64}}, Float64}, Float64, 4})
Closest candidates are:
(::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
(::Type{T})(::T) where T<:Number at boot.jl:760
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
...
``````

This works:

``````julia> ForwardDiff.gradient(x -> test(x; sto=buf(x)), x)
2×2 Matrix{Float64}:
2.66201   0.00867353
0.205537  2.72375
``````
2 Likes

I’m having trouble about where to put the `get_tmp` and where to define the `DiffCache`:

``````using ComponentArrays
using UnPack
using PreallocationTools
using ForwardDiff
using FiniteDiff

function ces(x, shares, r)
n = length(x)
(n == length(shares)) || throw(DimensionMismatch())
z = zero(eltype(x))
for i in 1:n
z += shares[i] * x[i]^r
end
return z^inv(r)
end

function temp_L(L1)
gek = similar(L1, 2, 2, 5)
ek = similar(gek, 2, 5)

return ComponentArray{eltype(L1)}(;
gek,
ek
)
end

function f(A1, L1; θ=θ, tempL=temp_L(L1))
@unpack gek, ek = tempL
@unpack ν, ϖ, ξ, ρ = θ

for e in 1:2, k in 1:5
gek[1, e, k] = ces(L1[1, :, e, k], ν, ρ)
gek[2, e, k] = ces(L1[2, :, e, k], ϖ, ρ)
end

for e in 1:2, k in 1:5
ek[e, k] = ces(gek[:, e, k], (ξ, 1 - ξ), ρ)
end

return A1 * sum(ek)
end

L1 = rand(2, 5, 2, 5)
A1 = rand()

θ = ComponentArray{Float64}(ν=rand(5), ϖ=rand(5), ξ=rand(), ρ=rand())
``````

The gradient I am trying to obtain is

``````function ∇ₗf(A1, L1; θ=θ, tempL=temp_L(L1), autodiff=:forward)
if autodiff == :forward
return ForwardDiff.gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
elseif autodiff == :finite
return FiniteDiff.finite_difference_gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
end
end

∇ₗf(A1, L1)
``````

If I define the functions:

``````function temp_L(L1)
gek = similar(L1, 2, 2, 5)
ek = similar(gek, 2, 5)

return dualcache(ComponentArray{eltype(L1)}(;
gek,
ek
))
end

function f(A1, L1; θ=θ, tempL=temp_L(L1))
tempL = get_tmp(tempL, L1)
@unpack gek, ek = tempL
@unpack ν, ϖ, ξ, ρ = θ
...
end
``````

it errors with

``````julia> ∇ₗf(A1, L1; autodiff=:forward)
ERROR: BoundsError: attempt to access 330-element Vector{Float64} at index [1:390]
``````

Try and make an MWE for ComponentArray dualcaches that can then be solved.

Unfortunately, this is one of the instances, where the default chunk size picked by dualcache doesn’t work. Generating the dualcache based on gek and ek gives chunk size = 10; but when you call the gradient with L1, then a chunk size of 12 is used. Therefore, there isn’t enough space in the cache vector to run the code and you get the error you are posting.

The following code fixes it:

``````using ComponentArrays
using UnPack
using PreallocationTools
using ForwardDiff
using FiniteDiff

function ces(x, shares, r)
n = length(x)
(n == length(shares)) || throw(DimensionMismatch())
z = zero(eltype(x))
for i in 1:n
z += shares[i] * x[i]^r
end
return z^inv(r)
end

function temp_L(L1)
gek = similar(L1, 2, 2, 5)
ek = similar(gek, 2, 5)

tmp = ComponentArray{eltype(L1)}(;
gek,
ek
)
return dualcache(tmp, 12)
end

function f(A1, L1; θ=θ, tempL=temp_L(L1))
@unpack gek, ek = get_tmp(tempL, L1)
@unpack ν, ϖ, ξ, ρ = θ

for e in 1:2, k in 1:5
gek[1, e, k] = ces(L1[1, :, e, k], ν, ρ)
gek[2, e, k] = ces(L1[2, :, e, k], ϖ, ρ)
end

for e in 1:2, k in 1:5
ek[e, k] = ces(gek[:, e, k], (ξ, 1 - ξ), ρ)
end

return A1 * sum(ek)
end

L1 = rand(2, 5, 2, 5)
A1 = rand()

θ = ComponentArray{Float64}(ν=rand(5), ϖ=rand(5), ξ=rand(), ρ=rand())

function ∇ₗf(A1, L1; θ=θ, tempL=temp_L(L1), autodiff=:forward)
if autodiff == :forward
return ForwardDiff.gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
elseif autodiff == :finite
return FiniteDiff.finite_difference_gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
end
end

grad_FD = ∇ₗf(A1, L1, autodiff = :finite)
``````

As you can see, the result is the same by FD and AD now (within numerical precision).

1 Like

That 12 in `dualcache(tmp, 12)` comes from where?

PreallocationTools calls a heuristic of ForwardDiff to decide the default chunk size. You can find the heuristic here (function pickchunksize):
In the case of supplying a (2 x 5 x 2 x 5) array, i.e., your L1, this heuristic falls back to the threshold (`ForwardDiff.DEFAULT_CHUNK_THRESHOLD = 12` in the same file). In the case of your temp_L function, the tmp array has a lower number of elements (100 in this case) and the chunk size picked when generating the DiffCache using dualcache() is 10.

In general, you need to generate a dualcache of at least the same chunk size that you require later in the calculations; which is not the case here (10 < 12).

In your case, you probably deal with L1’s of varying size at some point; some big, some small, so I would be tempted to just set the chunk size to 12 (which is the upper threshold ForwardDiff uses; I did that with the line `dualcache(tmp, 12)`). That means that you carry around a cache that is too big for your smaller L1’s, but it should always work out.

On the other hand, if you are memory constrained in your application, you should change your temp_L function to use a chunk size that is in agreement with the L1 that you are supplying to your gradient calculation later on, rather than relying on ForwardDiff’s heuristic.

Obviously, that requires more insight than just setting it to twelve…

1 Like

I see, so as long as I don’t change the default chunk size for `ForwardDiff`, 12 should be fine. I presume `ForwardDiff` never goes above 12 unless specified by the user, is that so?

That’s correct. ForwardDiff will never go beyond 12 unless the threshold is changed intentionally (but in this case, you can also just use the variable directly, i.e., replace 12 by `ForwardDiff.DEFAULT_CHUNK_THRESHOLD`)

Now, why the threshold is set at 12 by default, I don’t precisely know. I would guess that it has to do something with cache sizes, memory bandwidth and that a performance analysis over handful examples has led to the consensus that going beyond 12 is not performant.

1 Like

It doesn’t get faster beyond 12, in fact like StaticArrays it can get slower in many cases. At 12, <10% of the time is in the primal calculations, so it’s more about just the amount of SIMD that can be done vs the huge compile time and unrolling.

1 Like

I suppose that 12 is Julia-specific?

Kind of.