Thank you for the answer, I will read through the paper!
Here, I paste a naive MWE, which I guess probably shows how things should not be done. Improvements are welcome !
using Base.Threads, ExtendableSparse, LinearAlgebra
@views function serial_assembly!(M, n)
for i=1:n
if i==1
M[i,i ] = 1.0
M[i,i+1] = -1.0
elseif i==n
M[i,i-1] = -1.0
M[i,i ] = 1.0
else
M[i,i-1] = -1.0
M[i,i ] = 2.0
M[i,i+1] = -1.0
end
end
end
@views function threaded_assembly!(M, M_loc, n)
Threads.@threads for i=1:n
tid = threadid()
if i==1
M_loc[tid-1][i,i ] = 1.0
M_loc[tid-1][i,i+1] = -1.0
elseif i==n
M_loc[tid-1][i,i-1] = -1.0
M_loc[tid-1][i,i ] = 1.0
else
M_loc[tid-1][i,i-1] = -1.0
M_loc[tid-1][i,i ] = 2.0
M_loc[tid-1][i,i+1] = -1.0
end
end
end
let
n = 10_000_000
# Serial
M1 = ExtendableSparseMatrix(n,n)
@time serial_assembly!(M1, n)
# Threaded
M2 = ExtendableSparseMatrix(n,n)
M2_loc = [ExtendableSparseMatrix(n,n) for i=1:nthreads()]
@time threaded_assembly!(M2, M2_loc, n)
# Reduce. Looks like a conversion to sparse is needed to allow for .+
# This makes things a slightly faster
@time for k=1:nthreads()
M2 .= sparse(M2) .+ sparse(M2_loc[k])
end
# Check
@show norm(M1 .- M2)
end
In this 1D example, the reduction is super slow (threaded slower than serial). For 2D problems, there is a bit of benefit of doing this, but it’s still a very inefficient way. What could be the most straightforward way to thread such type of assembly?