I have been thinking of what would be the equivalent of static arrays for sparse matrices and entertaining the possibility of specializing linear algebra for every sparsity pattern thus possibly achieving similar speed benefits as StaticArrays does for dense matrices. If the cost of specialization is amortized over many operations which use the same sparsity pattern, then it may be worth the trouble.
Then it immediately struck me that to do something like this, I would need to encode the sparsity pattern information in the type of the static sparse matrix. With meta-programming this seems doable on my side, but is there any practical limit on the number of type parameters a type could have beyond which the compiler will lose its temper?
Another interesting challenge to the above idea even with meta-programming is that types will have a variable number of parameters depending on the sparsity level, so somehow constructors have to handle this. Not sure if this is even possible so I would appreciate a more informed opinion.
(Not sure if this is a usage or development topic but feel free to change it.)
AFAIK SuiteSparse/UMFpack has facilities to reuse sparsity patterns, and it can speed up calculations significantly. So this would be a great experiment to do natively in Julia.
Note that you have to allow for the fact that if you want to keep code type stable, not all entries in the non-sparse slots will be nonzero (eg when adding an x and a -x). But this should still be very useful.
I don’t understand what you mean by this. The non-sparse are the structural non-zeros, you mean? Yes some of them could be 0 in value even though they are stored, but how is this related to type stability?
As for adding x and a-x, ideally the specialized code should cancel the x’s out but it might be challenging to implement it, a symbolic math package may make these simplifications possible.
I was simply pointing out that since the sparsity is encoded in the type, if you enforce that all stored values are nonzeros, you would get type-unstable code. But it seems like you are not, so you should be fine.
Maybe I could start with simple operations like matrix addition and matrix-vector multiplication, and see if a speedup can happen there, then move on to back and forward substitution. Then the conjugate gradient method can be implemented with these ingredients. All of these operations don’t involve inferring an output sparsity pattern. More complicated operations like decompositions can be left for later.
Sorry, I know I am just talking to myself mostly here , but I am using this platform to share my plans and see if others would be interested in such a thing, or maybe even have some suggestions.
My hunch would be that a branchless solver for linear systems (with a NaN/Inf result for zero elements in the wrong place, which would need to be checked) could provide a large speed advantage.
AFAIK fixing the sparsity pattern is feasible regargless of size. They key is that it is reused across calculations.
Algorithms exist for “special” sparse matrices (eg band matrices). It frequently happens that you get a sparse matrix which would be one of these special ones, except for a few elements; but unless it is very commonly used, you will not find an already derived algorithm for it. A sparsity pattern is not unlike developing a specific algorithm for your very own special case, except doing it efficiently and in an automated way.
AFAIK fixing the sparsity pattern is feasible regargless of size. They key is that it is reused across calculations.
sure, that makes perfect sense.
But I had the impression that the OP proposed to make the sparsity pattern a type parameter? In my world this seems very strange, but this doesn’t mean there aren’t applications elsewhere.
I explored somthing similar for the evaluation of multivariate polynomials here and the results are very positive! In my experience the compiler can handle the type inference quite well, where the compiler struggles more is to compile generated functions of great length (I have easily 2000 LOC in some functions).
No a priori any stranger than making array dimensions type parameters I think the issue is that before Julia, these things just could not have been done in a practical manner (AFAIK Haskell had (has?) something similar, but the emphasis is on static checks). So they may appear unusual, but it is worth experimenting with.
This comes up the the numerical solution of systems of partial differential equations. For example, in the Euler/Navier-Stokes equations the flux jacobians (or some other related jacobians) are 5 x 5 matrices that are somwhat sparse and/or symmetric. Code patterns like this:
# q = solution array, 5 x mesh.numNodesPerElement x mesh.numEl
A1 = zeros(5, 5)
for i=1:mesh.numEl # many thousands of elements
for j=1:mesh.numNodesPerElement
calcA1(q[:, j, i], A1) # compute A1 from q
specializedMatVec(A1, x) # x is some vector
end
end
In cases like this, writing an unrolled mat-vec function that takes advantage of sparsity and symmetry can be beneficial because the operation is repeated so many times.
That’s different though, and should be handled differently. When it has a pattern like a stencil, you can just implement the stencil like is done in DiffEqOperators.jl
That just does the stencil computation during A_mul_B! (and thus *) so there’s no need for it to understand the non-zero elements because it directly understands the sparsity pattern (a band of size dependent on the order, so based off of type parameters).
Banded matrices also have a bunch of special properties, so you wouldn’t want to put them into sparse matrix algorithms even if they knew to only loop over the band.
That said,
I’ve actually been wondering about finding an implementation for something like this myself. I have a few things in mind. For example, a static sparse vector could be a very easy way to extend abstract array based APIs in a way that is “selective” and concise in code, yet efficient.
As an example, let’s say we are solving a large PDE via method of lines and there’s some components that we know will be on the shallow end of a temperature gradient and so those 6 specific points should have a lower absolute tolerance. The API for these things is always, if it’s an array then it’s element-wise. So what I want is, instead of creating an array 1e-6*ones(N) and then changing a few values, specifying “a sparse array with a base of 1e-6 and specifically 6 values at indices …”. Then it’s easy to see what element-wise operations like . * should be like for this kind of sparse array, but I can dramatically cut back on the number of arrays I am making if every option doesn’t need to take up extra memory to accommodate element-wise settings. So a static sparse array is something I find myself wanting quite often (with the caveat of being able to declare “what zero means”).
As for doing this with matrices, I think the way to go about it is to be a little bit more advanced. Full sparity I think is too much and will lead to large compile times. When I think about the applications which come up in PDEs that aren’t banded (though banded is quite often!) a lot of times there’s a higher level special structure to take into account. For example, if you think about the 2D Laplacian on a uniform grid, what you get if you’re to use “all x then next y” ordering is a tridiagonal matrix plus a band N up and a band N down. If there was a way to easily declare a matrix directly for this structure then that could be a massive speedup over pure sparse representations. This also comes up with systems of PDEs like the one I describe here:
where the sparsity pattern is really easy (and cheap) to describe in blocks (and the Jacobian is just constant bands in many of those blocks!). In some cases, if you’re doing a spatial model of a more complex reaction network like HIRES
it can be very easy to state the local sparsity pattern, which would then be the location of the bands in the full PDE’s sparsity pattern. Having an easy way to put these together would be a nice low-dimensional way to write down the form for these large matrices.
So where am I going with this? Okay, let’s say you had the facility to say “this is a matrix with a constant band a long the -ith diagonal of value x” and things like that (or if it’s not constant, it could be a type with a single array), and have it generate efficient kernels for addition, multiplication, etc. (multiplications is really all you need for linear solving because you’ll likely want to use an iterative solver in this case). We can build our full operator using the lazy composition tooling in LinearMaps.jl
where A = B+C or A = B*C can put together various AbstractLinearMaps lazily so that A*x acts appropriately. Thus our full linear operator A (maybe a Jacobian of a PDE or optimization problem) would both be a very sparse and memory efficient implementation, and be able to specialize all of its operations on the pieces in order to turn everything into efficient loops.
So this went on for quite awhile, but what I hope I got across is that full sparsity is probably not what’s needed in most applications anyways. However, composable sparse matrix structures would be extremely useful for generating both memory and computationally efficient kernels. This is something I haven’t seen done either (if anyone has seen it, please link because I am curious) and it would fit really well with Julia’s generic programming since these could just be passed in place of standard dense/sparse matrix types yet generate very efficient algorithms.
Well, my main motivation beside curiosity was topology optimization, where it is common to solve a finite element problem for example 10-1000s of times, where the sparsity pattern is fixed but the actual matrix solved for is different in every iteration. I am not sure how scalable this static sparse matrix approach could be, but as @saschatimme mentioned handling excessively long generated functions could be a problem for the compiler.
While the recommended “small” matrix computations on the StaticArrays README is a 100 element matrix, I think for sparse matrices it could be more than a 100 nonzeros but there will still be a limit somewhere. An interesting way to boost up the scalability would be to use block matrix algorithms with some blocking technique that would minimize the number of unique sparsity patterns of the blocks. Such blocking algorithm could benefit for example from information like where the bands are in @ChrisRackauckas’s example.
This is all very much in the “is it even worth it?” stage, so I am still pondering the idea while reading StaticArrays to figure out implementation pathways. I am also happy to see there are other applications to such a project beside my motivation use case.
I wasn’t describing the system jacobian, I was describing the flux jacobian at a particular node (or the diffusion tensor for the node for the NS equation). There is no banded structure in this case. For unstructured grids, there is no stencil either.
If you want a more through example of the case I was describing, check out below equation 6 in “Symmetric Interior Penalty DG Methods for the Compressible Navier-Stokes Equaiotns I: Method Formulation” by Harten and Houston where the node-level diffusion tensor is described.