Well, not for this kind of use with ForwardDiff. In fact, the performance depends very much on how things are called internally in ForwardDiff, which I did not investigate.
Your function is pretty much the worst case for the linked-list internal format used in ExtendableSparse, as it generates a full row in the matrix (I will have to mention this caveat in the readme: I assume that “sparse” means that all rows/columns have << n entries). But it seems that this is not the problem here.
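Just to illustrate what I mean by a full row (a made-up toy example, not your actual function): if one output couples to every input, the corresponding row of the Jacobian is dense, and that is exactly the pattern described above.

```julia
# Illustrative only: a function whose first output depends on all inputs,
# so row 1 of its Jacobian is fully populated (the "full row" worst case).
using ForwardDiff

g(x) = [sum(x); x[2:end]]            # output 1 couples to every input
J = ForwardDiff.jacobian(g, ones(5)) # row 1 of J is dense
```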
However, this brings me back to my previous post: “atomic” assembly of the finite difference operator, without too many assumptions about the inner workings of ForwardDiff. So let me extend your example a bit (VoronoiFVM is my package):
```julia
# Finite difference operator for
# -\Delta u^2 + u^2 = 1

## Storage term

jac_ext = ExtendableSparseMatrix(n, n);
@time ForwardDiff.jacobian!(jac_ext, ffd, ones(n), ones(n));

jac = spzeros(n, n);
@time ForwardDiff.jacobian!(jac, ffd, ones(n), ones(n));

all(jac .== jac_ext)

# Set up stuff for VoronoiFVM which uses atomic assembly
tstep = 1.0e100 # large timestep approximates the stationary problem
```
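Here ffd stands for an in-place finite difference operator for the equation above. As a rough sketch of what such a function could look like (unit spacing and no-flux boundaries are my simplifications here), something in this spirit fits the jacobian! calls:

```julia
# Sketch only: in-place 1D finite difference operator for -\Delta u^2 + u^2 - 1,
# callable as ffd(y, u) so it works with ForwardDiff.jacobian!(jac, ffd, y, x).
function ffd(y, u)
    n = length(u)
    for i in 1:n
        y[i] = u[i]^2 - 1.0                   # reaction term + right hand side
        i > 1 && (y[i] += u[i]^2 - u[i-1]^2)  # flux to the left neighbor
        i < n && (y[i] += u[i]^2 - u[i+1]^2)  # flux to the right neighbor
    end
    y
end
```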
The timings I see:

```
5.806884 seconds (32 allocations: 7.174 MiB)
3.925591 seconds (41 allocations: 6.275 MiB)
0.025753 seconds (117.03 k allocations: 3.849 MiB)
```
I intentionally use @time here, since a @btime measurement would probably not be dominated by the initial structure build-up phase, which is exactly what I want to see. This is the kind of thing I also benchmarked before (see the benchmarks with fdrand!() in ExtendableSparse).
In VoronoiFVM, I call ForwardDiff with flux!, reaction!, storage! and get local matrices which I then assemble into the global matrix.
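To sketch that pattern (this is not the actual VoronoiFVM code; the flux function, the 1D edge loop and the names are made up for illustration): differentiate a local two-point flux with ForwardDiff and add the resulting edge contribution to the global matrix.

```julia
# Rough sketch of "atomic" assembly: ForwardDiff only ever sees a tiny local
# function, and the global sparse matrix is filled entry by entry.
using ForwardDiff, ExtendableSparse

flux(u) = u[1]^2 - u[2]^2            # flux between the two nodes of an edge

function assemble!(A, u)
    for k in 1:length(u)-1           # loop over edges (k, k+1)
        g = ForwardDiff.gradient(flux, [u[k], u[k+1]])  # local derivatives
        A[k,   k] += g[1];  A[k,   k+1] += g[2]         # contribution to node k
        A[k+1, k] -= g[1];  A[k+1, k+1] -= g[2]         # equal and opposite for node k+1
    end
    flush!(A)                        # finalize the ExtendableSparse structure
    A
end

n = 10
A = ExtendableSparseMatrix(n, n)
assemble!(A, ones(n))
```

The point is that each local contribution is tiny and its pattern is known in advance, so the global matrix only ever receives a few entries per row, independent of how ForwardDiff works internally.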