Bad performances when using Multithreading and Distributed with heavy LinearAlgebra calculations

ufechner7 · July 24, 2024, 11:42am

Memory bound means that performance is limited by memory bandwidth; compute bound means that performance is limited by the speed (and number) of the CPU cores.

abraemer · July 24, 2024, 11:56am

The picture to have in mind here is called the Roofline Model.

In this simplified model there are two resources: memory throughput and compute throughput. Loading values from RAM and storing them back consumes memory throughput, and essentially everything else consumes compute time. Parallelization increases the available computing power but does not increase memory throughput. So using more threads only helps as long as you don’t saturate the memory bandwidth.
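To make the model concrete, here is a minimal sketch of the roofline estimate. The function names and all machine numbers (`peak_per_core`, `bandwidth`) are made up for illustration and do not describe any real CPU, and the matrix-multiply intensity assumes ideal caching, where each matrix crosses the memory bus only once.

```julia
# Roofline estimate: attainable FLOP/s is capped either by peak compute
# or by memory bandwidth times arithmetic intensity (FLOPs per byte moved).
roofline(peak_flops, bandwidth, intensity) = min(peak_flops, bandwidth * intensity)

# Hypothetical machine; numbers chosen only for illustration.
peak_per_core = 50e9    # FLOP/s per core (assumed)
bandwidth     = 40e9    # bytes/s, shared by all cores (assumed)

# Dot product of two Float64 vectors: 2 FLOPs per 16 bytes loaded.
dot_intensity = 2 / 16

# n×n Float64 matrix multiply: 2n^3 FLOPs over about 3 * 8n^2 bytes,
# assuming each matrix is moved between RAM and CPU only once.
matmul_intensity(n) = 2n^3 / (3 * 8 * n^2)

for cores in (1, 4, 16)
    peak = cores * peak_per_core   # compute scales with cores, bandwidth does not
    d = roofline(peak, bandwidth, dot_intensity)
    m = roofline(peak, bandwidth, matmul_intensity(1000))
    println("cores = $cores: dot ≈ $(d / 1e9) GFLOP/s, matmul ≈ $(m / 1e9) GFLOP/s")
end
```

With these assumed numbers the dot product estimate stays at 5 GFLOP/s no matter how many cores are used (memory bound), while the matrix multiply estimate grows with the core count (compute bound).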

ufechner7 · July 24, 2024, 12:45pm

Well, we also have cache memory that has a much higher memory bandwidth, but only a limited size.

albertomercurio · July 25, 2024, 12:49pm

Is the cache automatically used in Julia, when the array size fits the cache size?

albertomercurio · July 25, 2024, 12:53pm

By the way, thanks to everyone who made the situation much clearer.

Oscar_Smith · July 25, 2024, 1:04pm

This isn’t a Julia thing but a chip thing. CPUs don’t expose the ability to manage cached memory. The most recently used memory just always gets put in cache.
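A rough way to see the cache at work without managing it yourself is to time the same operation on arrays of increasing size. The sketch below is only an illustration: `read_bandwidth` is a hypothetical helper, the sizes are arbitrary, and timings from `@elapsed` are noisy (BenchmarkTools.jl gives more reliable measurements).

```julia
# Effective read bandwidth (GB/s) when repeatedly summing a Float64 vector.
function read_bandwidth(n; reps = 50)
    x = rand(n)
    acc = sum(x)              # warm-up: compiles sum and touches the data once
    t = @elapsed for _ in 1:reps
        acc += sum(x)         # accumulate so the work cannot be optimized away
    end
    return reps * sizeof(x) / t / 1e9, acc
end

# From a few KiB (fits in cache on typical CPUs) up to 256 MiB (does not).
for n in (2^10, 2^14, 2^18, 2^22, 2^25)
    bw, _ = read_bandwidth(n)
    println(rpad("$(n * 8 ÷ 1024) KiB", 12), round(bw; digits = 1), " GB/s")
end
```

On most machines the reported bandwidth should drop once the array no longer fits in cache, although the exact sizes and numbers depend on the hardware.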

albertomercurio · July 25, 2024, 10:59pm

Ok, thanks a lot.
