How to make this function compatible with ForwardDiff?

How can I make this function compatible with ForwardDiff with respect to τ?

using ForwardDiff

randmat = rand(10, 2)
sto = similar(randmat)

function claytonsample!(sto, τ; randmat=randmat)
    sto .= randmat
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

right now I get this error:

julia> ForwardDiff.derivative(τ -> claytonsample!(sto, τ), 0.3)
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#81#82", Float64}, Float64, 1})
Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  (::Type{T})(::T) where T<:Number at boot.jl:760
  (::Type{T})(::VectorizationBase.Double{T}) where T<:Union{Float16, Float32, Float64, VectorizationBase.Vec{var"#s32", var"#s31"} where {var"#s32", var"#s31"<:Union{Float16, Float32, Float64}}, VectorizationBase.VecUnroll{var"#s30", var"#s29", var"#s28", V} where {var"#s30", var"#s29", var"#s28"<:Union{Float16, Float32, Float64}, V<:Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, SIMDTypes.Bit, VectorizationBase.AbstractSIMD{var"#s29", var"#s28"}}}} at /Users/amrods/.julia/packages/VectorizationBase/xmoDw/src/special/double.jl:84
  ...

Strictly speaking you can’t: this is a function with side effects so it’s not completely clear what it means to differentiate it. Here the problem is that sto has to be of eltype Dual, not Float64. You can eg do claytonsample(tau; randmat) = claytonsample!(similar(randmat, typeof(tau)), tau; randmat) [untested] and differentiating this function should work (note the similar command constructs an array just like randmat, but of eltype typeof(tau))

1 Like

I see. In this case, is there then a tradeoff between the speed of AD vs the speed gained by avoiding allocations (which is what I tried to do by writing claytonsample! that way)?

You don’t have to choose. You can preallocate in functions you AD. See

which has some helper functions and examples.

3 Likes

If I understand correctly then this should work:

using ForwarDiff
using PreallocationTools

randmat = rand(10, 2)
sto = similar(randmat)

function claytonsample!(sto, τ; randmat=randmat)
    sto = get_tmp(sto, τ)
    sto .= randmat
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

yet

julia> ForwardDiff.derivative(τ -> claytonsample!(sto, τ), 0.3)
ERROR: MethodError: no method matching get_tmp(::Matrix{Float64}, ::ForwardDiff.Dual{ForwardDiff.Tag{var"#138#139", Float64}, Float64, 1})
Closest candidates are:
  get_tmp(::PreallocationTools.DiffCache, ::T) where T<:ForwardDiff.Dual at /Users/amrods/.julia/packages/PreallocationTools/l1Nsk/src/PreallocationTools.jl:26
  get_tmp(::PreallocationTools.DiffCache, ::Number) at /Users/amrods/.julia/packages/PreallocationTools/l1Nsk/src/PreallocationTools.jl:39

You never made a DiffCache.

1 Like

is this error because you have not defined broadcasting for DiffCache?

using ForwardDiff
using PreallocationTools

randmat = rand(10, 2)
sto = similar(randmat)
stod = dualcache(sto)

function claytonsample!(sto, τ; randmat=randmat)
    sto = get_tmp(sto, τ)
    sto .= randmat
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

ForwardDiff.derivative(τ -> claytonsample!(stod, τ), 0.3)

ERROR: LoadError: DimensionMismatch("array could not be broadcast to match destination")
Stacktrace:
  [1] check_broadcast_shape
    @ ./broadcast.jl:520 [inlined]
  [2] check_broadcast_axes
    @ ./broadcast.jl:523 [inlined]
  [3] instantiate
    @ ./broadcast.jl:269 [inlined]
  [4] materialize!
    @ ./broadcast.jl:894 [inlined]
  [5] materialize!
    @ ./broadcast.jl:891 [inlined]
  [6] claytonsample!(sto::PreallocationTools.DiffCache{Matrix{Float64}, Matrix{ForwardDiff.Dual{nothing, Float64, 12}}}, τ::ForwardDiff.Dual{ForwardDiff.Tag{var"#74#75", Float64}, Float64, 1}; randmat::Matrix{Float64})
    @ Main ~/test.jl:1307
  [7] claytonsample!(sto::PreallocationTools.DiffCache{Matrix{Float64}, Matrix{ForwardDiff.Dual{nothing, Float64, 12}}}, τ::ForwardDiff.Dual{ForwardDiff.Tag{var"#74#75", Float64}, Float64, 1})
    @ Main ~/test.jl:1306
  [8] (::var"#74#75")(τ::ForwardDiff.Dual{ForwardDiff.Tag{var"#74#75", Float64}, Float64, 1})
    @ Main ~/test.jl:1319
  [9] derivative(f::var"#74#75", x::Float64)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/5gUap/src/derivative.jl:14
 [10] top-level scope
    @ ~/test.jl:1319
in expression starting at /Users/amrods/test.jl:1319

explicit indexing works, but yields NaN:

using ForwardDiff
using PreallocationTools

randmat = rand(10, 2)
sto = similar(randmat)
stod = dualcache(sto)

function claytonsample!(sto, τ; randmat=randmat)
    sto = get_tmp(sto, τ)
    for i in eachindex(randmat)
        sto[i] = randmat[i]
    end
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

julia> ForwardDiff.derivative(τ -> claytonsample!(stod, τ), 0.3)
55×2 Matrix{Float64}:
 0.0  NaN
 0.0  NaN
 0.0  NaN
 0.0  NaN
 0.0  NaN
 ⋮    
 0.0  NaN
 0.0  NaN
 0.0  NaN
 0.0  NaN
 0.0  NaN

Well, yeah, you didn’t match the chunk sizes. If you’re differentiation has a single chunk you need a cache of a single chunk as well. Its default is to assume Jacobians w.r.t. a sized variable matching the cache, but if you’re differentiating w.r.t. a scalar you need to have a single chunk. So the following works:

using ForwardDiff
using PreallocationTools

randmat = rand(10, 2)
sto = similar(randmat)
stod = dualcache(sto, Val{1})

function claytonsample!(sto, τ; randmat=randmat)
    sto = get_tmp(sto, τ)
    @show size(sto), size(randmat)
    sto .= randmat
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

ForwardDiff.derivative(τ -> claytonsample!(stod, τ), 0.3)
1 Like

Full explanation is now in the docs: https://github.com/SciML/PreallocationTools.jl/blob/master/README.md#handling-chunk-sizes

1 Like

Notice that stod = dualcache(sto, Val{1}) here, while in the docs you have stod = dualcache(sto), which results in the broadcast error:

ERROR: LoadError: DimensionMismatch("array could not be broadcast to match destination")
1 Like

Well yes, because you have to make sure you differentiate with the right chunk size.

2 Likes

Is there any reason for not defining indexing of DiffCache? Look at this example, where I’m getting an error because getindex is not defined

using Optim
using PreallocationTools

randmat = rand(10, 2)
sto = similar(randmat)
stod = dualcache(sto, Val{2})

function claytonsample!(sto, τ; randmat=randmat)
    sto .= randmat
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

function claytonsample!(stod::T, τ; randmat=randmat) where {T <: PreallocationTools.DiffCache}
    stod = get_tmp(stod, τ)
    claytonsample!(stod, τ; randmat=randmat)
end

function corr(X)
    x = X[:, 1]
    y = X[:, 2]
    sum(x .* y)/length(x) - sum(x) * sum(y)/length(x)^2
end

function obj(sto, τ; randmat=randmat)
    τ′ = abs(τ) - 1
    claytonsample!(sto, τ′; randmat=randmat)
    corr(sto)
end

opt_ad = optimize(τ -> obj(stod, τ[1]), [5.0, 5.0], BFGS(); autodiff=:forward)
ERROR: LoadError: MethodError: no method matching getindex(::PreallocationTools.DiffCache{Matrix{Float64}, Matrix{ForwardDiff.Dual{nothing, Float64, 2}}}, ::Colon, ::Int64)

No. The documentation shows that you should never act on the DiffCache, always on the tmp that comes out of it. If you’re indexing the DiffCache you’re doing something wrong. No example in the documentation ever shows doing anything remotely like that.

I think you just confused yourself by writing some confusing code. Remember sto is a DiffCache, so you need to get its tmp:

function obj(sto, τ; randmat=randmat)
    τ′ = abs(τ) - 1
    claytonsample!(sto, τ′; randmat=randmat)
    corr(get_tmp(stod, τ))
end

Though I would really clean up that code first because it’s very implicit in confusing ways.

2 Likes

I see. I wanted to have a method for claytonsample! when sto is a DiffCache and another one when sto is a Matrix. I have to do the same for obj.

Your simpler code

using ForwardDiff
using PreallocationTools

randmat = rand(10, 2)
sto = similar(randmat)
stod = dualcache(sto)

function claytonsample!(sto, τ; randmat=randmat)
    sto = get_tmp(sto, τ)
    sto .= randmat
    τ == 0 && return sto

    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 2] = (1 - u^(-τ) + u^(-τ)*v^(-(τ/(1 + τ))))^(-1/τ)
    end
    return sto
end

ForwardDiff.derivative(τ -> claytonsample!(stod, τ), 0.3)

now works on the latest version. The chunk size difference is now handled automatically.

2 Likes

If I can just suggest something, it would be to keep more “transparent” functions available in the examples of the documentation. One of the things I struggled with when I started playing with the package, was understanding what it was doing. Examples with functions using OrdinaryDiffEq represent a hurdle for people who would want to use the PreallocationTools but not necessarily know or work with differential equations.