A flexible sparse matrix data format and parallel algorithms for the assembly of sparse matrices in general finite element applications using atomic synchronisation primitives

Palli · August 25, 2022, 3:39pm

Unlike the above, what follows is just my speculation (moved from another thread, since it was maybe a bit off-topic there):

@elrod [are you only working on dense?]
This (likely) “sparsity 133/8 = 16.625” got me thinking: how would you design hardware (or software; maybe we can do better for Julia) for sparse matrices?

I’m guessing matrices are split up into 4x4 = 16-element blocks/submatrices (would 1x16 for one matrix and 16x1 for the other one you are multiplying be better, or maybe 2x8, etc.?).

If your matrix is very sparse, at most one value will be non-zero in each of those submatrices. You could encode the emptiness of each submatrix in 1 bit: 0 for empty, 1 for non-empty (one non-zero or, hopefully not, more). With such a dense bit array encoding the structure, you can bit-multiply to see where you actually need to calculate the result later, in a separate step. [This would also work for larger blocks; I’m not wedded to 16 elements per block.]
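To make the bit-array idea concrete, here is a minimal sketch in plain Python (not Julia, and not how any existing package does it). The helper names `block_mask` and `mask_product` are made up for illustration. It builds one occupancy flag per 4x4 block and then does a boolean "multiply" of the two flag arrays to find which output blocks could be non-zero:

```python
def block_mask(a, bs=4):
    """Return a 2D list of 0/1 flags, one per bs x bs block of `a`.

    A flag is 1 if the block holds at least one non-zero value.
    (`block_mask` is a hypothetical helper, not a library function.)
    """
    n_rows, n_cols = len(a), len(a[0])
    mask = []
    for bi in range(0, n_rows, bs):
        row = []
        for bj in range(0, n_cols, bs):
            nonempty = any(
                a[i][j] != 0
                for i in range(bi, min(bi + bs, n_rows))
                for j in range(bj, min(bj + bs, n_cols))
            )
            row.append(1 if nonempty else 0)
        mask.append(row)
    return mask


def mask_product(ma, mb):
    """Boolean matrix product: out[i][j] = OR over k of (ma[i][k] AND mb[k][j])."""
    inner = len(mb)
    return [
        [1 if any(ma[i][k] and mb[k][j] for k in range(inner)) else 0
         for j in range(len(mb[0]))]
        for i in range(len(ma))
    ]


def zeros(n, m):
    return [[0.0] * m for _ in range(n)]


a = zeros(8, 8)
a[1][5] = 2.0          # lands in block (0, 1)
b = zeros(8, 8)
b[6][2] = 3.0          # lands in block (1, 0)

ma = block_mask(a)     # [[0, 1], [0, 0]]
mb = block_mask(b)     # [[0, 0], [1, 0]]
candidates = mask_product(ma, mb)   # [[1, 0], [0, 0]]
```

Note that a set flag only means "may be non-zero": in this example block (0, 0) of the product is flagged, yet the actual product is all zeros because the non-zero column index in `a` (5) does not match the non-zero row index in `b` (6). So the mask step prunes work, but the second step still has to check positions.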

You also need to keep track of the actual non-zero values. E.g. for Float8 in a 4x4 box, you need 8 bits for the value, plus 2 bits + 2 bits = 4 bits for its position (row and column). If the box is empty you could store 0.0 in the Float8 part, and use or ignore the position bits. And maybe just use that (such arrays) and skip the bit array…
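A rough sketch of that 12-bit entry, again in plain Python. The layout (value in the top 8 bits, then 2 bits of row, then 2 bits of column) and the names `pack_block`/`unpack_block` are my own assumptions. The 8-bit value is treated as an opaque byte, i.e. the raw bits of whatever Float8 format you pick, and I'm assuming all-zero bits mean 0.0 in that format:

```python
def pack_block(value_bits, row, col):
    """Pack one non-zero of a 4x4 block into a 12-bit integer.

    value_bits: raw 8-bit pattern of the value (hypothetical Float8), 0..255
    row, col:   position inside the 4x4 block, each 0..3
    Layout (an assumption): [8 bits value][2 bits row][2 bits col]
    """
    if not 0 <= value_bits <= 0xFF:
        raise ValueError("value_bits must fit in 8 bits")
    if not (0 <= row < 4 and 0 <= col < 4):
        raise ValueError("row and col must be in 0..3")
    return (value_bits << 4) | (row << 2) | col


def unpack_block(word):
    """Inverse of pack_block: return (value_bits, row, col)."""
    return (word >> 4) & 0xFF, (word >> 2) & 0b11, word & 0b11


word = pack_block(0x3C, 2, 1)   # some value at row 2, column 1
empty = pack_block(0x00, 0, 0)  # empty block: zero value, position unused
```

With this layout an empty block is simply the word 0, so a separate bit array is not strictly needed: `word >> 4 == 0` already tells you the block is empty.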

This gives you 1.5 bytes per 4x4 subarray that would otherwise take 16 bytes (assuming at most one non-zero, the common case), so only about 10.7x compression, and you’ve still not handled the unlikely case of more than one non-zero in some box. I was thinking of some overlay for the other 1+ non-zeros (strictly 0 to at most 15 extra), with a different structure (the same as Julia already has?), or maybe a list of up to 15 overlay arrays with the same structure as described above, with overlay arrays that are completely zero eliminated from storage and computation.
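Here is one way the "list of overlay arrays" variant could look, as a hedged sketch in plain Python. The names `split_into_layers` and `compression_ratio` are hypothetical, and I'm assuming the worst case where every layer is stored densely at 12 bits per block:

```python
def split_into_layers(blocks):
    """Split per-block non-zeros into layers with at most one entry per block.

    blocks: one list per 4x4 block, holding (row, col, value_bits) tuples.
    Returns a list of layers; layer k holds the k-th non-zero of each block,
    or None where a block has no k-th entry. Layer 0 (the base array) always
    exists; further layers exist only while some block still has entries,
    so all-empty overlays are never created.
    """
    depth = max(1, max((len(entries) for entries in blocks), default=0))
    return [
        [entries[k] if k < len(entries) else None for entries in blocks]
        for k in range(depth)
    ]


def compression_ratio(num_blocks, num_layers,
                      bits_per_entry=12, dense_bytes_per_block=16):
    """Dense size divided by packed size, with every layer stored densely."""
    dense = num_blocks * dense_bytes_per_block
    packed = num_blocks * num_layers * bits_per_entry / 8
    return dense / packed


blocks = [
    [(0, 1, 0x3C)],                  # one non-zero: the common case
    [],                              # empty block
    [(2, 2, 0x40), (3, 0, 0x38)],    # two non-zeros: needs one overlay
]
layers = split_into_layers(blocks)
ratio_one_layer = compression_ratio(len(blocks), 1)             # 16 / 1.5
ratio_with_overlay = compression_ratio(len(blocks), len(layers))
```

The catch is visible in the numbers: one block with a second non-zero forces a whole extra layer, which halves the ratio (from about 10.7x to about 5.3x) if the overlay is stored densely. That is the argument for keeping the overlay in a different, already-sparse structure instead.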

Good design?
