A ﬂexible sparse matrix data format and parallel algorithms for the assembly of sparse matrices in general ﬁnite element applications using atomic synchronisation primitives

Palli · August 25, 2022, 3:33pm

@elrod @PetrKryslUCSD Seems intriguing: https://www.researchgate.net/publication/346555859_A_flexible_sparse_matrix_data_format_and_parallel_algorithms_for_the_assembly_of_sparse_matrices_in_general_finite_element_applications_using_atomic_synchronisation_primitives

we propose a new ﬂexible storage format for sparse matrices and compare its performance with the compressed row storage format using abstract benchmarks based on common characteristics of ﬁnite element problems

Maybe Julia should adopt this new CRAC format from last year:

In contrast, this paper focuses on the application of synchronization primitives in order to derive data-race free data structures and algorithms and compares their respective performance. Speciﬁcally, the usage of atomics is emphasized
[…]
In this paper we introduce a new ﬂexible sparse matrix storage format designated compressed row aligned columns (CRAC). The ﬂexibility of the format stems from its storage of blocks of various sizes, making it independent of the problem type. Consequently, this format integrates well in ﬁnite element programs, where the problem to be solved is deﬁned at runtime (dynamically).

Remark 1.1 The focus of this paper is not the improvement in performance of a speciﬁc ﬁnite element problem or the deﬁnition of a correspondingly matching sparse matrix format, but rather the introduction and a general investigation of the performance of parallel assembly algorithms with respect to the parallelization strategy and the sparse matrix format. Consequently, the computation of each element stiﬀness matrix is not the main focus of this article and dummy elements are employed.

using lock-free methods does not guarantee higher performance.
Additionally, the CRAC format now performs better than the CSR format for p > 3.

Palli · August 25, 2022, 3:39pm

Unlike the above, below is just my speculation (that I moved from other thread, since maybe a bit off-topic there):

@elrod [are you only working on dense?]
This (likely) “sparsity 133/8 = 16.625” got got me thinking, how would you design hardware (or software, maybe we can do better for Julia) for sparse matrices.

I’m guessing matrices are split up into 4x4=16-sized blocks/submatrices (would 1x16 for one matrix and 16x1 be better for the other you are multiplying, or maybe 2x8 etc.?).

If your matrix is very sparse only at most one value will be nonzero in those submatrices. You could encode the emptyness of the submatrices into 1 bit, 0 for empty, 1 for non-empty (one or, hopefully not more). With such a dense array encoding the structure, you can bit-multiply to see where you actually need to calculate later the actual result, in a separate step. [This would also work for larger block, I’m not wedded to 16 in a block.]

You need to also keep track of your actual non-zero values, for e.g. Float8 in 4x4 box, you needs 8-bit, plus 2-bit x 2-bit for its position = 4 bits. If the box is empty you could store 0.0 in the Float8 part, and use or ignore the position. And maybe just use that (such arrays) and skip the bit-array…

This gives you 1.5 bytes for 4x4 subarrays that otherwise would give you 16 bytes (assuming max one non-zero, the common case), so only 10.6x compression, and you’ve still not handled the unlikely case when more than one non-zero in some box. I was thinking some overlay, one for the other 1+ non-zeros (or strictly 0 - max 15 extra), with a different structure (same as Julia already has?), or maybe a list of up to max 15 overlay arrays with same structure as previously described, with the completely non-zero for the full arrays eliminated in storage and computation.

Good design?

PetrKryslUCSD · August 25, 2022, 5:08pm

Good find! I will check it out.
Thanks.

Topic		Replies	Views
Parallel assembly of a finite element sparse matrix General Usage parallel	34	1709	February 28, 2024
Question about Julia sparse matrices design decisions Internals & Design matrices	3	627	August 25, 2020
Multithreading with sparse arrays Numerics	3	1742	May 6, 2021
Notes on lock-free programming Teaching & Outreach	6	1178	May 27, 2021
[ANN] Fast SpMv with CompressedSparseBlocks.jl Package Announcements performance , linearalgebra , sparse	9	717	July 26, 2022

A ﬂexible sparse matrix data format and parallel algorithms for the assembly of sparse matrices in general ﬁnite element applications using atomic synchronisation primitives

Related topics