Sparse matrix coefficient computation with ForwardDiff and LoopVectorization

Hi everyone,
I am trying to write a code that computes both the residual and sparse matrix coefficients for multi-dimensional non-linear Poisson-type problems. The idea is that information about the stencil and boundary conditions are only present in the functions that evaluate the stencil (residual function + connectivity/sparsity function).
Sparse system coefficients can be computed manually/symbolically but this approach implies that even a tiny change the residual function requires a manual modification to the function that evaluates system coefficients. Using automatic differentiation, one could simply evaluate the system coefficients by computing the gradient of the residual function w.r.t stencil point values. As a result, modifications of the residual function do not necessarily require to modify the function that evaluates system coefficients - very nice. Also, since things can be made fast (using LoopVectorization), the time for the assembly of the system matrix can be kept much smaller than the time required for computing Cholesky factors.
Using the advices provided there I could come up with the MWE below for 1D linear Poisson.
While this script works fine, I would like to make it more general. In particular, I would like to be able to predefine the number of stencil points without having to change any other function than Poisson1D and PoissonSparsity1D.
For example, if I would want to use the current approach to make a 2D high order discretisation, I could end up with more that 10 stencil points and thus:

  1. I need to find an appropriate a data structure to store neighbour values/coefficients (TC,TW,TE) and indices (IC,IW,IE). Potential options are Vectors of Arrays, multi-dimensional Arrays, Structs, tuples… If anyone has some hints about that, they would be welcome :smiley: As far as I’ve experienced, the trickiest part is to find which data structure will allow the code to still work with both LoopVectorization and ForwardDiff.

  2. The definition and calls to Poisson_valgrad1D would require an embarrassingly large number of arguments. What would be the best way to organise the code such that a function like Poisson_valgrad1D would not need one argument entry per stencil connection but rather one data structure? This is of course intrinsically linked to point #1.

  3. Also one would need to change the way the triplets vectors (I,J,V) are assembled. But I guess this can be made nice by using a comprehension over the appropriate data structure (so also linked to point #1).

  4. Finally, any other hints or comments are welcome :slight_smile:

Cheers!

using SIMDDualNumbers, ForwardDiff, StaticArrays, LoopVectorization, SparseArrays, LinearAlgebra, Plots

function PoissonSparsity1D!(TC,TW,TE)
    TW[2:end-0] .= TC[1:end-1]; 
    TE[1:end-1] .= TC[2:end-0]; 
end

function Poisson1D(x::AbstractVector{<:Real}, data, i) 
    kx, dx, ncx, TW_BC, TE_BC = data
    TC, TW, TE = x[1], x[2], x[3]
    TW1     = SIMDDualNumbers.ifelse(i==1, 2TW_BC-TC, TW)
    TE1     = SIMDDualNumbers.ifelse(i==ncx, 2TE_BC-TC, TE)
    qxW     = -kx*(TC - TW1)/dx
    qxE     = -kx*(TE1 - TC)/dx 
    F       = (qxE - qxW)/dx
    return F
end

Poisson1D(TC::Real,TW::Real,TE::Real,data,i) = Poisson1D(@SVector([TC,TW,TE]),data,i)  # use multiple dispatch to make it work with the desired argument list
function Poisson_valgrad1D(Poisson1D,v,w,x,d,i)
    sv = @SVector([v,w,x])
    dr = ForwardDiff.gradient!(DiffResults.GradientResult(sv), x -> Poisson1D(x, d, i), sv)
    DiffResults.value(dr), DiffResults.gradient(dr)
end

function main()

    # Initialise
    ncx = 10
    data = (kx=1.0, dx=0.1, ncx, TW_BC=1.0, TE_BC=2.0)
    F, T = fill(0.0, ncx), fill(0.0, ncx)  # residual, solution fields 
    TC,TW,TE = fill(0.0,  ncx), fill(0.0,   ncx), fill(0.0,   ncx) # working arrays double
    IC,IW,IE = fill(  1,  ncx), fill(  1,   ncx), fill(  1,   ncx) # working arrays double
    I, J, V  = fill(  0,3*ncx), fill(  0, 3*ncx), fill(0.0, 3*ncx)

    # Get neigbour values and generate connectivity
    TC .= T
    PoissonSparsity1D!(TC,TW,TE)

    # Generate connectivity
    IC .= collect(1:ncx)
    PoissonSparsity1D!(IC,IW,IE) 
    
    # Compute Finite Difference coefficients
    @turbo for i=1:ncx 
        F[i], (TC[i], TW[i], TE[i]) = Poisson_valgrad1D(Poisson1D, TC[i], TW[i], TE[i], data,i)
    end

    # Fill triplets and assemble
    I .= [IC[:]; IC[:]; IC[:]]
    J .= [IC[:]; IW[:]; IE[:]]
    V .= [TC[:]; TW[:]; TE[:]]
    @time K  = sparse(I,J,V)
    droptol!(K,1e-9)

    @show K

    # Solve
    @time T[:] .-= cholesky(K)\F[:]

    # Set central node and get neigbour values
    TC .= T
    PoissonSparsity1D!(TC,TW,TE)

    # Compute Finite Difference coefficients
    @turbo for i=1:ncx
        F[i] = Poisson1D(TC[i],TW[i],TE[i],data,i)
    end
    println("r = ", norm(F[:]))

    # display(plot(T))
end

for it=1:2
    @time main()
end