Again on reaching optimal parallel scaling

lmiq · December 18, 2021, 12:57am

To be more specific: I didn’t change anything relevant in the implementation. The thing is that all allocations and GC occur in the first part, which is relatively fast for a small number of threads.

However, with many cores, the first part becomes limiting, because the slow one scales really well.

What I did now is to split the first part into two: one where allocations and GC take place, and a second that assumes the buffers are already allocated.

This clearly identifies the scaling problems with the allocations and GC of the first step. That makes your very first hints very accurate… and it clearly gives me a path to potentially improve the code.
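
To illustrate the kind of split described above, here is a minimal sketch, not the actual code from my package. The names `allocate_buffers`, `compute!` and `timed_phases` are hypothetical, and the workload (squaring a vector) is just a placeholder. The point is only that the allocating phase and the phase that works on preallocated buffers are timed separately, so their scaling with the number of threads can be compared.

```julia
using Base.Threads

# Phase 1 (hypothetical): all allocations, and therefore GC pressure, happen here.
function allocate_buffers(nchunks::Int, n::Int)
    return [zeros(Float64, n) for _ in 1:nchunks]
end

# Phase 2 (hypothetical): assumes the buffers already exist and only writes
# into them, one buffer per chunk of work.
function compute!(buffers::Vector{Vector{Float64}}, x::Vector{Float64})
    @threads for ichunk in eachindex(buffers)
        buf = buffers[ichunk]
        for i in eachindex(buf, x)
            buf[i] = x[i]^2 + ichunk  # placeholder workload
        end
    end
    return buffers
end

# Time each phase on its own. The first call also includes compilation time,
# so call this twice (or use BenchmarkTools) before comparing numbers.
function timed_phases(x::Vector{Float64}, nchunks::Int)
    t0 = time()
    buffers = allocate_buffers(nchunks, length(x))
    t_alloc = time() - t0
    t0 = time()
    compute!(buffers, x)
    t_compute = time() - t0
    return t_alloc, t_compute, buffers
end

x = collect(1.0:1000.0)
nchunks = nthreads()
t_alloc, t_compute, buffers = timed_phases(x, nchunks)
println("allocation phase: ", t_alloc, " s; compute phase: ", t_compute, " s")
```

Running this with different values of `julia -t N` and comparing `t_alloc` against `t_compute` is one way to see which of the two phases stops scaling first.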
