Ok, this thread kind of went on in my head and I felt that one possibility was still missing…
I did some experiments with preventing setting zero entries in jacobian.jl . As a result, we get:
So this gives an improvement of assembly times by a constant factor, but we still see the O(N^2) behavior. This appears to be due to the fact that without further a priori information, ForwardDiff assumes that the jacobian is a full matrix and calculates all these zeros, still leading to O(N^2) complexity.
For a known sparsity pattern (as we have for this case), the remedy suggested is to provide the sparsity pattern along with coloring information such that derivative calculations can be scheduled to handle only the nonzero entry cases.
So instead of
ForwardDiff.jacobian!(jac, ffd, X, Y);
for jac with unknown sparsity pattern we do for jac with known sparsity pattern and nonzero entries:
As we see, we now get O(N) behavior because now known zero jacobian entries are not calculated to begin with.
So we have two ways to handle the sparsity here: matrix coloring (once the sparsity pattern is known) or atomic assembly (which with ExtendableSparse performs reasonably well without a priori information about the sparsity pattern).
According to the test, the sparsity detection algorithm implemented exhibits O(N) behavior. In this example it is beaten by the atomic assembly. However, e.g. in an implementation of Newton’s algorithm and in many other cases one would perform the sparsity detection and matrix coloring only once and re-use the results during the computation of series of jacobians, so the relative overhead from the detection step would decrease.
I find the O(N) behavior of the sparsity detection step quite amazing. The algorithm behind this is described here and uses more deep Julia stuff…
Yeah our sparse automatic differentiation compiler tools are still pretty young but they should be pretty good for serious use now. That’s cool you did a verification of it on a big problem. It would be nice to have the code you’re testing on to turn this into a blog post.
Ok I will try to find some time for this in a week or two.
I also did some preliminary tests for 2D and saw the same O(N) behaviour. I assume there will be ways to beat the constant as well…
In fact I am thinking to integrate this into VoronoiFVM.jl to handle sparse couplings in systems of n PDEs - for flux and reaction terms I currently locally get full n\times 2n resp. n\times n matrices which often are sparse and test for zeros before putting values into the big matrix - this case could take advantage already of the current implementation.