Cannot use buffer with AD?

Can I not use a buffer with AD as below?

using ForwardDiff

function buf(x)
    similar(x)
end

function test(x; sto=buf(x))
    sto .= x.^2
    sum(sto.^2)
end

function grad(x; sto=buf(x))
    ForwardDiff.gradient(x -> test(x; sto=sto), x)
end

x = rand(2, 2)

This throws an error:

julia> grad(x; sto=buf(x))
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#316#317"{Matrix{Float64}}, Float64}, Float64, 4})
Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  (::Type{T})(::T) where T<:Number at boot.jl:760
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  ...

This works:

julia> ForwardDiff.gradient(x -> test(x; sto=buf(x)), x)
2×2 Matrix{Float64}:
 2.66201   0.00867353
 0.205537  2.72375
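The reason the first call fails: inside ForwardDiff.gradient, x is an array of Dual numbers, but the preallocated sto is a plain Float64 array, so `sto .= x.^2` has to convert Duals to Float64, and no such conversion exists. A minimal standalone sketch of that failure (assuming only that ForwardDiff is loaded):

```julia
using ForwardDiff

# A Float64 buffer cannot hold the Dual numbers ForwardDiff propagates:
sto = zeros(2)
d = ForwardDiff.Dual(1.0, 1.0)

err = try
    sto[1] = d   # setindex! calls convert(Float64, d), which has no method
    nothing
catch e
    e
end

err isa MethodError  # true — the same MethodError as in the gradient call
```

This is also why the second call works: buf(x) inside the differentiated closure allocates via similar(x), so the buffer's element type follows the (Dual-valued) input.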

I’m having trouble figuring out where to put the get_tmp call and where to define the DiffCache:

using ComponentArrays
using UnPack
using PreallocationTools
using ForwardDiff
using FiniteDiff

function ces(x, shares, r)
    n = length(x)
    (n == length(shares)) || throw(DimensionMismatch())
    z = zero(eltype(x))
    for i in 1:n
        z += shares[i] * x[i]^r
    end
    return z^inv(r)
end

function temp_L(L1)
    gek = similar(L1, 2, 2, 5)
    ek = similar(gek, 2, 5)
    
    return ComponentArray{eltype(L1)}(;
        gek,
        ek
        )
end

function f(A1, L1; θ=θ, tempL=temp_L(L1))
    @unpack gek, ek = tempL
    @unpack ν, ϖ, ξ, ρ = θ

    for e in 1:2, k in 1:5
        gek[1, e, k] = ces(L1[1, :, e, k], ν, ρ)
        gek[2, e, k] = ces(L1[2, :, e, k], ϖ, ρ)
    end

    for e in 1:2, k in 1:5
        ek[e, k] = ces(gek[:, e, k], (ξ, 1 - ξ), ρ)
    end

    return A1 * sum(ek)
end

L1 = rand(2, 5, 2, 5)
A1 = rand()

θ = ComponentArray{Float64}(ν=rand(5), ϖ=rand(5), ξ=rand(), ρ=rand())
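As a quick standalone sanity check of the ces aggregator above (definition repeated so the snippet runs on its own): with r = 1 it reduces to a plain weighted sum, which is easy to verify by hand.

```julia
function ces(x, shares, r)
    n = length(x)
    (n == length(shares)) || throw(DimensionMismatch())
    z = zero(eltype(x))
    for i in 1:n
        z += shares[i] * x[i]^r
    end
    return z^inv(r)
end

# With r = 1 the CES aggregator is just a weighted sum:
ces([1.0, 3.0], [0.5, 0.5], 1.0)  # 0.5*1 + 0.5*3 = 2.0

# Mismatched lengths are rejected:
# ces([1.0], [0.5, 0.5], 1.0)  # throws DimensionMismatch
```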

The gradient I am trying to obtain is

function ∇ₗf(A1, L1; θ=θ, tempL=temp_L(L1), autodiff=:forward)
    if autodiff == :forward
        return ForwardDiff.gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
    elseif autodiff == :finite
        return FiniteDiff.finite_difference_gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
    end
end

∇ₗf(A1, L1)

If I define the functions:

function temp_L(L1)
    gek = similar(L1, 2, 2, 5)
    ek = similar(gek, 2, 5)

    return dualcache(ComponentArray{eltype(L1)}(;
        gek,
        ek
        ))
end

function f(A1, L1; θ=θ, tempL=temp_L(L1))
    tempL = get_tmp(tempL, L1)
    @unpack gek, ek = tempL
    @unpack ν, ϖ, ξ, ρ = θ
    ...
end

it errors with

julia> ∇ₗf(A1, L1; autodiff=:forward)
ERROR: BoundsError: attempt to access 330-element Vector{Float64} at index [1:390]

Try to make an MWE for ComponentArray dualcaches that can then be debugged.

Unfortunately, this is one of the instances where the default chunk size picked by dualcache doesn’t work. Generating the dualcache based on gek and ek gives a chunk size of 10, but when you call the gradient with L1, a chunk size of 12 is used. Therefore there isn’t enough space in the cache vector to run the code, and you get the error you are posting.
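Both chunk sizes can be reproduced with a hand-written sketch of ForwardDiff’s heuristic (the real function is ForwardDiff.pickchunksize, with a default threshold of 12):

```julia
# Sketch of ForwardDiff's chunk-size heuristic: inputs up to the
# threshold get one chunk; larger inputs are split into equal chunks.
function pickchunk(len, threshold=12)
    len <= threshold && return len
    nchunks = cld(len, threshold)  # number of chunks needed
    return cld(len, nchunks)       # resulting chunk size
end

pickchunk(30)   # 10 — the tmp cache has 2*2*5 + 2*5 = 30 elements
pickchunk(100)  # 12 — L1 has 2*5*2*5 = 100 elements
```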

The following code fixes it:

using ComponentArrays
using UnPack
using PreallocationTools
using ForwardDiff
using FiniteDiff

function ces(x, shares, r)
    n = length(x)
    (n == length(shares)) || throw(DimensionMismatch())
    z = zero(eltype(x))
    for i in 1:n
        z += shares[i] * x[i]^r
    end
    return z^inv(r)
end

function temp_L(L1)
    gek = similar(L1, 2, 2, 5)
    ek = similar(gek, 2, 5)
    
    tmp = ComponentArray{eltype(L1)}(;
        gek,
        ek
        )
    return dualcache(tmp, 12)
end

function f(A1, L1; θ=θ, tempL=temp_L(L1))
    @unpack gek, ek = get_tmp(tempL, L1)
    @unpack ν, ϖ, ξ, ρ = θ

    for e in 1:2, k in 1:5
        gek[1, e, k] = ces(L1[1, :, e, k], ν, ρ)
        gek[2, e, k] = ces(L1[2, :, e, k], ϖ, ρ)
    end

    for e in 1:2, k in 1:5
        ek[e, k] = ces(gek[:, e, k], (ξ, 1 - ξ), ρ)
    end

    return A1 * sum(ek)
end

L1 = rand(2, 5, 2, 5)
A1 = rand()

θ = ComponentArray{Float64}(ν=rand(5), ϖ=rand(5), ξ=rand(), ρ=rand())

function ∇ₗf(A1, L1; θ=θ, tempL=temp_L(L1), autodiff=:forward)
    if autodiff == :forward
        return ForwardDiff.gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
    elseif autodiff == :finite
        return FiniteDiff.finite_difference_gradient(x -> f(A1, x; θ=θ, tempL=tempL), L1)
    end
end

grad_AD = ∇ₗf(A1, L1)
grad_FD = ∇ₗf(A1, L1, autodiff = :finite)
difference = grad_AD .- grad_FD

As you can see, the result is the same by FD and AD now (within numerical precision).


Where does the 12 in dualcache(tmp, 12) come from?

PreallocationTools calls a heuristic of ForwardDiff to decide the default chunk size. You can find the heuristic here (function pickchunksize):
ForwardDiff.jl/prelude.jl at 241829d80fa50810e859e884184da00eadeb14d8 · JuliaDiff/ForwardDiff.jl (github.com)
In the case of supplying a (2 × 5 × 2 × 5) array, i.e., your L1 with 100 elements, this heuristic falls back to the threshold (ForwardDiff.DEFAULT_CHUNK_THRESHOLD = 12 in the same file). In the case of your temp_L function, the tmp array has fewer elements (30 in this case: 20 for gek plus 10 for ek), and the chunk size picked when generating the DiffCache using dualcache() is 10.

In general, you need to generate a dualcache with at least the same chunk size that you require later in the calculations, which is not the case here (10 < 12).
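The numbers in the BoundsError line up with this: judging from the error message, a dualcache stores, per element, the value plus one partial per chunk, so the dual buffer holds length(tmp) * (chunksize + 1) numbers (the exact layout is an internal detail of PreallocationTools, so treat this as a back-of-the-envelope check):

```julia
# Where the 330 and 390 in the BoundsError come from:
n_tmp = 2*2*5 + 2*5           # 30 elements in the gek + ek cache
allocated = n_tmp * (10 + 1)  # 330 — cache built with chunk size 10
needed    = n_tmp * (12 + 1)  # 390 — space required at chunk size 12
(allocated, needed)           # not enough room, hence the BoundsError
```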

In your case, you probably deal with L1’s of varying size at some point; some big, some small, so I would be tempted to just set the chunk size to 12 (which is the upper threshold ForwardDiff uses; I did that with the line dualcache(tmp, 12)). That means that you carry around a cache that is too big for your smaller L1’s, but it should always work out.

On the other hand, if you are memory constrained in your application, you should change your temp_L function to use a chunk size that is in agreement with the L1 that you are supplying to your gradient calculation later on, rather than relying on ForwardDiff’s heuristic.

Obviously, that requires more insight than just setting it to twelve…
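One way to do that, sketched under the assumption that the dualcache/get_tmp API is used as above (temp_L_sized is a hypothetical variant of temp_L): derive the chunk size from the input you will actually differentiate, via ForwardDiff.pickchunksize.

```julia
using ComponentArrays
using ForwardDiff
using PreallocationTools

# Sketch: size the cache for the input that will be differentiated (L1),
# not for the temporary array itself.
function temp_L_sized(L1)
    gek = similar(L1, 2, 2, 5)
    ek = similar(gek, 2, 5)
    tmp = ComponentArray{eltype(L1)}(; gek, ek)
    chunk = ForwardDiff.pickchunksize(length(L1))  # chunk used for L1
    return dualcache(tmp, chunk)
end

L1 = rand(2, 5, 2, 5)
cache = temp_L_sized(L1)  # sized for chunk 12, matching the gradient call
```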


I see, so as long as I don’t change the default chunk size for ForwardDiff, 12 should be fine. I presume ForwardDiff never goes above 12 unless specified by the user, is that so?

That’s correct. ForwardDiff will never go beyond 12 unless the threshold is changed intentionally (but in that case, you can also just use the variable directly, i.e., replace 12 by ForwardDiff.DEFAULT_CHUNK_THRESHOLD).

Now, why the threshold is set at 12 by default, I don’t precisely know. I would guess that it has something to do with cache sizes and memory bandwidth, and that a performance analysis over a handful of examples has led to the consensus that going beyond 12 is not performant.


It doesn’t get faster beyond 12; in fact, like StaticArrays, it can get slower in many cases. At 12, less than 10% of the time is spent in the primal calculations, so it’s mostly about how much SIMD can be done versus the huge compile time and unrolling.


I suppose that 12 is Julia-specific?

Kind of.