# Gradient evaluation with ForwardDiff and LoopVectorization

Hi everyone,

I am trying use the `ForwardDiff.gradient!` function to compute the in-place gradient of a function F with regard to 3 parameters.
F has `ncx` entries and the function has 3 parameters per entry. I explicitly parse in an input vector of size of `(3, ncx)`. I would expect that returned gradient should also have a size of `(3, ncx)`. I have tried with the following MWE, but it seems that my definition of `Poisson1D_grad!(dFdT, Poisson1D_v1!, F, Tmat, data)` is incorrect. It always returns an error about non-matching method. I have tried several modifications in order to match the method but without success. Would someone has a hint?
Cheers!

``````using SIMDDualNumbers, ForwardDiff, StaticArrays, LoopVectorization, SparseArrays, LinearAlgebra, Plots

function PoissonSparsity1D_v1!(x::AbstractMatrix{<:Real})
@views TC, TW, TE = x[1,:], x[2,:], x[3,:]
TW[2:end-0] .= TC[1:end-1];
TE[1:end-1] .= TC[2:end-0];
end

function Poisson1D_v1!(F::AbstractVector{<:Real},x::AbstractMatrix{<:Real}, data )
kx, dx, ncx, TW_BC, TE_BC = data
@turbo for i=1:ncx
TC, TW, TE = x[1,i], x[2,i], x[3,i]
TW1     = SIMDDualNumbers.ifelse(i==1,   2TW_BC-TC, TW)
TE1     = SIMDDualNumbers.ifelse(i==ncx, 2TE_BC-TC, TE)
qxW     = -kx*(TC - TW1)/dx
qxE     = -kx*(TE1 - TC)/dx
F[i]    = (qxE - qxW)/dx
end
end

Poisson1D_grad!(g, Poisson1D_v1!, F, x, d) = ForwardDiff.gradient!(g, (F,x) -> Poisson1D_v1!(F, x, d), x )

function main()

# Initialise
ncx  = 10
data = (kx=1.0, dx=0.1, ncx, TW_BC=1.0, TE_BC=2.0)
T    = fill(0.0, ncx)     # residual, solution field
Tmat = fill(0.0,  3, ncx) # Contains sparsity (3 stencil points)

# Get neigbour values and generate connectivity
Tmat[1,:] .= T
PoissonSparsity1D!(Tmat)

# Evaluate function
F    = zero(T)
Poisson1D_v1!(F, Tmat, data)

dFdT = zero(Tmat)
end

for it=1:2
@time main()
end
``````

Currently, you can `@turbo` a `ForwardDiff` more easily than you can `ForwardDiff` an `@turbo`. The latter will require you to reinterpret the arrays of duals and then handle the partials correctly.
Otherwise, `@turbo` will just invoke a slower fallback.

However, the problem with the code currently isn’t the `@turbo` – you can remove it and still get the same problems – but the use of `ForwardDiff`. Specifically, this line:

``````Poisson1D_grad!(g, Poisson1D_v1!, F, x, d) = ForwardDiff.gradient!(g, (F,x) -> Poisson1D_v1!(F, x, d), x )
``````

Try something like

``````using ForwardDiff, StaticArrays, LoopVectorization
@inline function Poisson1D_v1(x::NTuple{3}, data, i )
kx, dx, ncx, TW_BC, TE_BC = data
TC, TW, TE = x
TW1     = SIMDDualNumbers.ifelse(i==1,   2TW_BC-TC, TW)
TE1     = SIMDDualNumbers.ifelse(i==ncx, 2TE_BC-TC, TE)
qxW     = -kx*(TC - TW1)/dx
qxE     = -kx*(TE1 - TC)/dx
return (qxE - qxW)/dx
end
@inline Poisson1D_v1(x::SVector{3}, data, i) =  Poisson1D_v1(Tuple(x), data, i)
ForwardDiff.gradient(y -> Poisson1D_v1(y, data, i), x)
end
end
@turbo for i in axes(Tmat, 2)
x = tuple(Tmat[1,i], Tmat[2,i], Tmat[3,i])
(dFdT[1,i], dFdT[2,i], dFdT[3,i]) = Poisson1D_v1_grad(x, data, i)
end
end
``````

Now:

``````julia> Poisson1D_grad!(dFdT, Tmat, data)

julia> dFdT
3×10 Matrix{Float64}:
300.0   200.0   200.0   200.0   200.0   200.0   200.0   200.0   200.0   300.0
0.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0
-100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0  -100.0     0.0
``````

Two points to remember:

1. ForwardDiff.gradient uses dual numbers, so it takes more work to preallocate outputs than using `F` like in your code.
2. `ForwardDiff.gradient` assumes you have a function mapping `N >= 1` inputs to `1` return value. `ForwardDiff.jacobian` is for `N >= 1` inputs and `M >= 1` outputs. Here, I’m assuming that you want elementwise gradients, i.e. you want gradients for functions of 3 inputs and 1 output, for each of `ncx` columns.

Many thanks!!