Simple performance test of threaded execution

I plan to work on multithreading until the summer. For now, see this old post of mine; one of the first things I plan to do is update it…
To figure out how the systems differ, you can install hwloc on them. It includes lstopo, which gives a graphical overview of the architecture. There is also a Julia package, Hwloc.jl (try `Hwloc.topology_graphical()` … )
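If you prefer numbers to a picture, something like the following should print the main figures for threading (cores, NUMA nodes, cache sizes). It is a minimal sketch that assumes Hwloc.jl 2.x is installed (`] add Hwloc`). The query functions are the package's high-level helpers, and the output depends entirely on the machine.

```julia
# Sketch: print topology figures with Hwloc.jl (assumes Hwloc.jl 2.x is installed).
using Hwloc

println("packages (sockets): ", Hwloc.num_packages())
println("physical cores:     ", Hwloc.num_physical_cores())
println("NUMA nodes:         ", Hwloc.num_numa_nodes())
println("cache sizes (bytes): ", Hwloc.cachesize())   # named tuple (L1 = ..., L2 = ..., L3 = ...)
println("Julia threads:      ", Threads.nthreads())

# Graphical overview via lstopo:
# Hwloc.topology_graphical()
```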

AFAIK, cache sizes and the number of NUMA nodes are the main things to watch. In particular, the number of NUMA nodes gives the number of independent pathways to RAM. For large problems whose data does not fit in cache, all threads compete for this bottleneck. More expensive machines (workstations and servers rather than laptops) may have more than one NUMA node, which shows up immediately in multithreading performance. One can see that clearly in the last graph in that old thread.
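A quick way to see the RAM bottleneck on your own machine is to compare thread scaling for data that fits in cache with data that has to be streamed from RAM. The sketch below uses only Base Julia (≥ 1.6). `threaded_sum`, `repeated_sum` and `speedup` are hypothetical helpers written for this illustration, and the sizes (512 KiB vs. 128 MiB) are assumptions about typical cache sizes. Both cases do the same total number of element reads. The small array is re-summed many times, so it should stay in cache, while the large one is read once from RAM. One would typically expect the first ratio to grow with the thread count and the second to level off earlier, but the actual numbers depend on the machine.

```julia
# Minimal sketch (Base Julia only): thread scaling for cache-resident vs. RAM-bound data.
# Start Julia with several threads, e.g. `julia -t 8`.
# `repeated_sum`, `threaded_sum` and `speedup` are hypothetical helpers for this example.

# Sum `v` `passes` times (repeated passes keep a small chunk hot in cache).
function repeated_sum(v, passes::Int)
    s = 0.0
    for _ in 1:passes
        s += sum(v)
    end
    return s
end

# Split `x` into `ntasks` contiguous chunks and process each chunk in its own task.
function threaded_sum(x::AbstractVector{Float64}, ntasks::Int; passes::Int = 1)
    chunklen = max(1, cld(length(x), ntasks))
    tasks = [Threads.@spawn repeated_sum(view(x, r), passes)
             for r in Iterators.partition(eachindex(x), chunklen)]
    return sum(fetch.(tasks); init = 0.0)
end

# Ratio of the 1-task time to the nthreads()-task time for a random vector of length `n`,
# keeping the total number of element reads (`total`) the same for every `n`.
function speedup(n::Int; total::Int = 2^24, reps::Int = 5)
    x = rand(n)
    passes = max(1, total ÷ n)
    nt = Threads.nthreads()
    threaded_sum(x, 1; passes = passes); threaded_sum(x, nt; passes = passes)  # warm-up (compilation)
    t1 = minimum(@elapsed(threaded_sum(x, 1; passes = passes)) for _ in 1:reps)
    tN = minimum(@elapsed(threaded_sum(x, nt; passes = passes)) for _ in 1:reps)
    return t1 / tN
end

# 2^16 Float64 = 512 KiB (should fit in cache); 2^24 Float64 = 128 MiB (RAM-bound on most machines).
for n in (2^16, 2^24)
    println("n = ", n, ", threads = ", Threads.nthreads(),
            ", speedup ≈ ", round(speedup(n); digits = 2))
end
```

Running this with different thread counts (`julia -t 1`, `-t 2`, `-t 4`, …) on a laptop and on a multi-NUMA machine should make the difference visible.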
