Gradient evaluation with ForwardDiff and LoopVectorization

Hi everyone,

I am trying to use the ForwardDiff.gradient! function to compute the in-place gradient of a function F with respect to 3 parameters.
F has ncx entries and there are 3 parameters per entry, so I explicitly pass in an input matrix of size (3, ncx) and would expect the returned gradient to also have size (3, ncx). I have tried the following MWE, but it seems that my definition of Poisson1D_grad!(dFdT, Poisson1D_v1!, F, Tmat, data) is incorrect: it always throws a MethodError about non-matching methods. I have tried several modifications in order to match a method, but without success. Would someone have a hint?
Cheers!

using SIMDDualNumbers, ForwardDiff, StaticArrays, LoopVectorization, SparseArrays, LinearAlgebra, Plots

function PoissonSparsity1D_v1!(x::AbstractMatrix{<:Real})
    @views TC, TW, TE = x[1,:], x[2,:], x[3,:]
    TW[2:end]   .= TC[1:end-1]
    TE[1:end-1] .= TC[2:end]
end

function Poisson1D_v1!(F::AbstractVector{<:Real},x::AbstractMatrix{<:Real}, data ) 
    kx, dx, ncx, TW_BC, TE_BC = data
    @turbo for i=1:ncx
        TC, TW, TE = x[1,i], x[2,i], x[3,i]
        TW1     = SIMDDualNumbers.ifelse(i==1,   2TW_BC-TC, TW)
        TE1     = SIMDDualNumbers.ifelse(i==ncx, 2TE_BC-TC, TE)
        qxW     = -kx*(TC - TW1)/dx
        qxE     = -kx*(TE1 - TC)/dx 
        F[i]    = (qxE - qxW)/dx
    end
end

Poisson1D_grad!(g, Poisson1D_v1!, F, x, d) = ForwardDiff.gradient!(g, (F,x) -> Poisson1D_v1!(F, x, d), x )

function main()

    # Initialise
    ncx  = 10
    data = (kx=1.0, dx=0.1, ncx, TW_BC=1.0, TE_BC=2.0)
    T    = fill(0.0, ncx)     # residual, solution field 
    Tmat = fill(0.0,  3, ncx) # Contains sparsity (3 stencil points) 

    # Get neighbour values and generate connectivity
    Tmat[1,:] .= T
    PoissonSparsity1D_v1!(Tmat)

    # Evaluate function
    F    = zero(T)
    Poisson1D_v1!(F, Tmat, data)

    # Compute and store gradient 
    dFdT = zero(Tmat)
    Poisson1D_grad!(dFdT, Poisson1D_v1!, F, Tmat, data)
end

for it=1:2
    @time main()
end

Currently, you can @turbo a ForwardDiff more easily than you can ForwardDiff an @turbo. The latter requires you to reinterpret the arrays of duals and then handle the partials correctly; otherwise, @turbo will just invoke a slower fallback.

However, the problem with the code currently isn’t the @turbo (you can remove it and still get the same error) but the use of ForwardDiff. Specifically, this line:

Poisson1D_grad!(g, Poisson1D_v1!, F, x, d) = ForwardDiff.gradient!(g, (F,x) -> Poisson1D_v1!(F, x, d), x )
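The issue with it: ForwardDiff.gradient! expects a one-argument function returning a scalar, while this closure takes two arguments and returns nothing (Poisson1D_v1! mutates F instead). A minimal sketch of the call shape gradient! does accept:

using ForwardDiff

f(x) = sum(abs2, x)              # one array in, one scalar out

x = rand(3)
g = similar(x)
ForwardDiff.gradient!(g, f, x)   # writes 2 .* x into g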

Try something like

using SIMDDualNumbers, ForwardDiff, StaticArrays, LoopVectorization
@inline function Poisson1D_v1(x::NTuple{3}, data, i ) 
    kx, dx, ncx, TW_BC, TE_BC = data
    TC, TW, TE = x
    TW1     = SIMDDualNumbers.ifelse(i==1,   2TW_BC-TC, TW)
    TE1     = SIMDDualNumbers.ifelse(i==ncx, 2TE_BC-TC, TE)
    qxW     = -kx*(TC - TW1)/dx
    qxE     = -kx*(TE1 - TC)/dx 
    return (qxE - qxW)/dx
end
@inline Poisson1D_v1(x::SVector{3}, data, i) =  Poisson1D_v1(Tuple(x), data, i)
@inline function Poisson1D_v1_grad(x::SVector{3}, data, i)
    ForwardDiff.gradient(y -> Poisson1D_v1(y, data, i), x)
end
@inline function Poisson1D_v1_grad(x::NTuple{3}, data, i)
    Tuple(Poisson1D_v1_grad(SVector(x), data, i))
end
function Poisson1D_grad!(dFdT, Tmat, data)
    @turbo for i in axes(Tmat, 2)
        x = tuple(Tmat[1,i], Tmat[2,i], Tmat[3,i])
        (dFdT[1,i], dFdT[2,i], dFdT[3,i]) = Poisson1D_v1_grad(x, data, i)
    end
end

Now:

julia> Poisson1D_grad!(dFdT, Tmat, data)

julia> dFdT
3×10 Matrix{Float64}:
  300.0   200.0   200.0   200.0   200.0   200.0   200.0   200.0   200.0   300.0
    0.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0
 -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0     0.0
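As a sanity check, working the stencil out by hand: F[i] = kx*(2TC - TW1 - TE1)/dx^2, so for interior cells ∂F/∂TC = 2kx/dx^2 = 200 and ∂F/∂TW = ∂F/∂TE = -kx/dx^2 = -100. At i = 1 the ghost value TW1 = 2TW_BC - TC gives ∂F/∂TC = 3kx/dx^2 = 300 and ∂F/∂TW = 0, mirrored at i = ncx, which matches the first and last columns above.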

Two points to remember:

  1. ForwardDiff.gradient uses dual numbers, so preallocating outputs takes more work than reusing F as in your code (see the first sketch after this list).
  2. ForwardDiff.gradient assumes you have a function mapping N >= 1 inputs to 1 return value. ForwardDiff.jacobian is for N >= 1 inputs and M >= 1 outputs (see the second sketch after this list). Here, I’m assuming that you want elementwise gradients, i.e. gradients of functions of 3 inputs and 1 output, for each of the ncx columns.
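On point 1, a minimal sketch of one way to preallocate, assuming the DiffResults package (which ForwardDiff.gradient! accepts as its output argument):

using ForwardDiff, DiffResults

f(x) = sum(abs2, x)                  # scalar-valued toy function
x    = rand(3)
res  = DiffResults.GradientResult(x) # preallocated buffer for value + gradient
ForwardDiff.gradient!(res, f, x)     # fills both in place
DiffResults.value(res), DiffResults.gradient(res)

On point 2, a minimal sketch of the in-place form jacobian(f!, y, x), where f! writes its M outputs into y:

using ForwardDiff

f!(y, x) = (y[1] = x[1]*x[2]; y[2] = x[2] + x[3]; nothing)  # 3 inputs, 2 outputs

y = zeros(2)
J = ForwardDiff.jacobian(f!, y, rand(3))  # 2×3 Jacobian

For your function, the full Jacobian would be ncx × 3ncx with only three nonzeros per row (the stencil), which is why the elementwise-gradient approach above maps so well onto @turbo.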

Many thanks!!
Your reply fully answers my question, as well as this previous one.
I see that trying to combine these tools is not a trivial exercise… Your detailed example is very useful and enlightening, and thanks for the general hints too!