Parallel assembly of a finite element sparse matrix

termi-official · March 17, 2023, 1:49pm

Thanks for investigating this issue further, Prof. Krysl. Sorry, I have not yet found time to come back to the issue linked by Kristoffer (but I am still following the thread). From a performance perspective, assembly on a single thread is more or less compute bound. Parallelizing assembly over a small number of threads should not be a big issue (given sufficient memory bandwidth and cache). However, with more threads you increase the pressure on all memory lanes. Here I still think it is a mixture of cache/bandwidth issues (i.e., bad or even conflicting cache access patterns, plus a memory bus that cannot keep up with the CPUs' read/write accesses) and frequency boosting (i.e., at lower total load each core runs at a higher frequency). For some discussion I highly recommend the WorkStream paper (dx.doi.org/10.1145/2851488), because we basically reproduce Figure 4 from this paper. However, take this with a grain of salt, as I still have to confirm everything in more detailed benchmarks.
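
To put numbers on the scaling behaviour described above, one simple option is to tabulate speedup and parallel efficiency from wall-clock timings at each thread count. The following is only a minimal sketch in Python: the function name `scaling_table` and the timing values are hypothetical placeholders, not measurements from this thread. An efficiency that falls well below 1.0 as threads are added is consistent with (but does not by itself prove) bandwidth saturation or reduced boost frequency.

```python
def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows from wall-clock timings.

    `timings` maps thread count -> wall time in seconds and must contain
    an entry for 1 thread, which serves as the baseline.
    """
    baseline = timings[1]
    rows = []
    for nthreads in sorted(timings):
        speedup = baseline / timings[nthreads]   # T(1) / T(n)
        efficiency = speedup / nthreads          # 1.0 means perfect scaling
        rows.append((nthreads, speedup, efficiency))
    return rows


# Hypothetical timings (seconds), for illustration only.
example_timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}

for nthreads, speedup, efficiency in scaling_table(example_timings):
    print(f"{nthreads:>2} threads: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Running the same table at several problem sizes (fitting in cache versus not) would help separate cache/bandwidth effects from frequency effects, since the latter should show up regardless of the working-set size.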

Another relevant thread is "How to achieve perfect scaling with Threads (Julia 1.7.1)" (#10 by carstenbauer), which discusses some of the mentioned problems in more detail.
