Well, not for this kind of use with ForwardDiff. In fact, the performance very much depends on how things are called internally in ForwardDiff, which I have not investigated.
Your function is pretty much the worst case for the linked-list internal format used in ExtendableSparse, as it generates a full row in the matrix (I will have to mention this caveat in the README: I assume that "sparse" means that all rows/columns have << n entries). But it seems that this is not the problem here.
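To make that caveat concrete, here is a minimal, hypothetical sketch (not your code) of the access pattern in question: one row receives n entries, so the "every row/column has << n entries" assumption no longer holds for it.

using ExtendableSparse

n=10_000
A=ExtendableSparseMatrix(n,n)
# fill one complete row entry by entry
for j=1:n
    A[1,j]=1.0
end
flush!(A) # merge the intermediate entries into the CSC structure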
However, this brings me back to my previous post: "atomic" assembly of the finite difference operator without too many assumptions about the inner workings of ForwardDiff. So let me extend your example a bit (VoronoiFVM is my package):
using ExtendableSparse
using ForwardDiff
using LinearAlgebra
using SparseArrays
using VoronoiFVM
#
# Finite difference operator for
# -\Delta u^2 + u^2 = 1
#
function ffd(y,x)
    n=length(x)
    h=1/(n-1)
    # reaction/source term u^2-1, weighted with the control volume sizes
    y[1]=(x[1]^2-1)*0.5*h
    for i=2:n-1
        y[i]=(x[i]^2-1.0)*h
    end
    y[end]=(x[end]^2-1)*0.5*h
    # two-point fluxes between neighboring nodes
    for i=1:n-1
        dx=(x[i+1]^2-x[i]^2)/h
        y[i+1]+=dx
        y[i]-=dx
    end
end
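#
# For reference, the residual computed by ffd is
#
#   y_1 = h/2*(x_1^2-1) + (x_1^2-x_2^2)/h
#   y_i = h*(x_i^2-1)   + (x_i^2-x_{i-1}^2)/h + (x_i^2-x_{i+1}^2)/h   (i=2..n-1)
#   y_n = h/2*(x_n^2-1) + (x_n^2-x_{n-1}^2)/h
#
# i.e. node-wise reaction terms plus two-point fluxes. The flux! and reaction!
# callbacks below express the same local contributions, with the geometric
# factors h and h/2 supplied by VoronoiFVM itself (storage! only enters
# through the time step).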
## Flux term
function flux!(f,u,edge)
    f[1]=u[1,1]^2-u[1,2]^2
end

## Storage term
function storage!(f,u,node)
    f[1]=u[1]
end

## Reaction term
function reaction!(f,u,node)
    f[1]=u[1]^2-1.0
end
function runtest(n)
    jac_ext = ExtendableSparseMatrix(n, n);
    @time begin
        ForwardDiff.jacobian!(jac_ext, ffd, ones(n), ones(n));
        flush!(jac_ext)
    end

    jac = spzeros(n, n);
    @time ForwardDiff.jacobian!(jac, ffd, ones(n), ones(n));
    all(jac .== jac_ext)

    # Set up stuff for VoronoiFVM which uses atomic assembly
    h=1.0/convert(Float64,n-1)
    X=collect(0:h:1)
    grid=VoronoiFVM.Grid(X)
    physics=VoronoiFVM.Physics(flux=flux!,storage=storage!,reaction=reaction!)
    sys=VoronoiFVM.System(grid,physics,unknown_storage=:dense)
    enable_species!(sys,1,[1])
    solution=unknowns(sys,inival=1.0)
    oldsol=unknowns(sys,inival=1.0)
    residual=unknowns(sys,inival=1.0)
    tstep=1.0e100 # large time step approximates the stationary problem
    @time begin
        VoronoiFVM.eval_and_assemble(sys,solution,oldsol,residual,tstep)
        flush!(sys.matrix)
    end
    all(sys.matrix .≈ jac_ext)
end
Second run:
julia> runtest(20000)
5.806884 seconds (32 allocations: 7.174 MiB)
3.925591 seconds (41 allocations: 6.275 MiB)
0.025753 seconds (117.03 k allocations: 3.849 MiB)
true
I intentionally use @time here: a @btime measurement would probably not be dominated by the initial structure build-up phase. This is the kind of thing I also benchmarked
before (see the benchmarks with fdrand!() in ExtendableSparse).
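To illustrate what I mean, here is a minimal sketch (reusing ffd from above) that times the same ExtendableSparseMatrix twice; the second call does not pay for the structure build-up again, which is roughly the situation a repeated @btime measurement would see:

using ExtendableSparse
using ForwardDiff

n=20000
jac_ext=ExtendableSparseMatrix(n,n)

# first call: includes the one-time build-up of the sparsity structure
@time begin
    ForwardDiff.jacobian!(jac_ext, ffd, ones(n), ones(n))
    flush!(jac_ext)
end

# second call on the same matrix: the structure already sits in the CSC part,
# so only the values are updated
@time begin
    ForwardDiff.jacobian!(jac_ext, ffd, ones(n), ones(n))
    flush!(jac_ext)
end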
In VoronoiFVM, I call ForwardDiff on flux!, reaction! and storage!, obtain small local matrices, and assemble these into the global matrix.
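To sketch that idea outside of VoronoiFVM (simplified to the flux part of the 1D example above; edgeflux and assemble_fluxes! are hypothetical helpers, not VoronoiFVM API): differentiate the local edge flux with ForwardDiff and add the resulting local matrix into the global ExtendableSparseMatrix edge by edge.

using ExtendableSparse
using ForwardDiff

# local two-point flux for the -\Delta u^2 part (hypothetical helper)
edgeflux(u,h)=(u[1]^2-u[2]^2)/h

function assemble_fluxes!(A,F,x,h)
    n=length(x)
    for i=1:n-1
        u=[x[i],x[i+1]]
        f=edgeflux(u,h)
        jac=ForwardDiff.gradient(v->edgeflux(v,h),u) # local 1x2 Jacobian
        F[i]+=f
        F[i+1]-=f
        A[i,i]+=jac[1];   A[i,i+1]+=jac[2]
        A[i+1,i]-=jac[1]; A[i+1,i+1]-=jac[2]
    end
end

n=10
h=1.0/(n-1)
A=ExtendableSparseMatrix(n,n)
F=zeros(n)
assemble_fluxes!(A,F,ones(n),h)
flush!(A)

Each call to ForwardDiff only sees the two unknowns of one edge, so the local Jacobians stay tiny regardless of n.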