Parallel FFT not that much faster

frank-otto · December 19, 2016, 7:32am

Your machine B may run into some of the following issues:

The CPU has hyperthreading (2 hardware threads per core). When you run the FFT with 2 threads, how can you be sure that two physical cores are used, and not the two hardware threads of a single core? Ideally, threads should first be distributed over the physical cores, and only once these are exhausted should the second hardware thread of each core be used. But I’m afraid I don’t know whether FFTW takes care to pin its threads to physical cores. If it doesn’t, the placement depends on the OS scheduler and perhaps on what else is running on the machine.
Modern CPUs have “turbo boost”: when just one thread is running, the CPU frequency can be (much) higher than when multiple threads are running, because of the limited thermal budget. So when you use more threads/cores, the CPU may clock down, and performance won’t scale linearly.
An FFT of size N has a runtime of O(N log N), i.e. the work per data element, O(log N), is actually not large. With an optimized implementation such as FFTW, the performance of such an algorithm may therefore be limited not by CPU resources but by memory bandwidth. Note that your machine A has two CPU sockets, so when you use two threads it can probably use twice the memory bandwidth, because each CPU has its own memory controller.
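For the hyperthreading issue above, one way to keep threads on separate physical cores on Linux, without relying on FFTW to do any pinning, is to restrict the process’s CPU affinity to one hardware thread per core. The following is a minimal Python sketch, assuming a Linux system with the standard sysfs topology files (/sys/devices/system/cpu/cpuN/topology/thread_siblings_list); the helper names parse_cpu_list, one_cpu_per_core, read_sibling_lists and pin_to_physical_cores are hypothetical.

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"  # standard Linux sysfs location


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-3,8,10-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Map {logical_cpu: thread_siblings_list text} to one logical CPU per core.

    For each core, keep its lowest-numbered hardware thread that is also a key
    of sibling_lists (i.e. a CPU this process is currently allowed to use).
    """
    allowed = set(sibling_lists)
    chosen = set()
    for text in sibling_lists.values():
        # A CPU always appears in its own sibling list, so this is never empty.
        siblings = [c for c in parse_cpu_list(text) if c in allowed]
        chosen.add(min(siblings))
    return sorted(chosen)


def read_sibling_lists(sysfs=SYSFS_CPU):
    """Read the hyperthread siblings of every CPU this process may run on (Linux only)."""
    lists = {}
    for cpu in os.sched_getaffinity(0):
        with open(f"{sysfs}/cpu{cpu}/topology/thread_siblings_list") as f:
            lists[cpu] = f.read()
    return lists


def pin_to_physical_cores():
    """Restrict this process (and children started later) to one hardware thread per core."""
    cores = one_cpu_per_core(read_sibling_lists())
    os.sched_setaffinity(0, cores)
    return cores
```

Calling pin_to_physical_cores() restricts the current process; on Linux, child processes started afterwards (for example a Julia session launched for the benchmark) inherit the same affinity mask. The scheduler can still move threads among the allowed CPUs, but it can no longer place two threads on sibling hardware threads of the same core.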
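To make the memory-bandwidth argument a bit more concrete, here is a back-of-the-envelope estimate in Python. It assumes the conventional flop count of 5 N log2 N for a complex FFT of length N (the figure FFTW’s benchmarks use to report MFLOPS) and a lower bound on memory traffic of one read plus one write of N complex double-precision values (16 bytes each). Real out-of-cache FFTs make several passes over the data, so the true intensity is lower. The bound min(peak, intensity × bandwidth) is the standard roofline model; the peak and bandwidth inputs are hypothetical and would have to be measured for machines A and B.

```python
import math


def fft_flops(n):
    """Conventional flop count for a complex FFT of length n: 5 n log2(n)."""
    return 5.0 * n * math.log2(n)


def fft_min_bytes(n, bytes_per_element=16):
    """Lower bound on memory traffic: read and write each element once.

    16 bytes = one complex double. Large out-of-cache FFTs make several
    passes over the data, so real traffic is usually higher.
    """
    return 2 * n * bytes_per_element


def arithmetic_intensity(n):
    """Flops per byte; an upper bound, since fft_min_bytes is a lower bound."""
    return fft_flops(n) / fft_min_bytes(n)


def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: the attainable rate is capped by compute or by bandwidth.

    intensity [flop/byte] * bandwidth [GB/s] gives GFLOP/s.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


for k in (10, 16, 20, 24):
    n = 2 ** k
    print(f"N = 2^{k}: at most {arithmetic_intensity(n):.2f} flop/byte")
```

Because the intensity grows only like log2 N (about 3 flop per byte even for N = 2^20), the bandwidth term is often the smaller one. In that regime, adding threads on the same socket raises the compute ceiling without raising the attainable rate, whereas the second socket of machine A also brings a second memory controller and hence more bandwidth.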

For a proper scaling benchmark, it is often recommended to turn off turbo boost and hyperthreading, though this may require changing settings in the BIOS. Modern Linux kernels may offer tools to change this behavior on a running system, but I’m not up to date on that.
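On the last point: on Linux, the kernel exposes some of these settings in sysfs. The Python sketch below only reads them, assuming the standard paths /sys/devices/system/cpu/smt/control (SMT/hyperthreading), /sys/devices/system/cpu/intel_pstate/no_turbo (intel_pstate driver) and /sys/devices/system/cpu/cpufreq/boost (drivers such as acpi-cpufreq); which files exist depends on the kernel version and the CPU frequency driver. It also samples each CPU’s scaling_cur_freq, which lets you watch whether clocks drop as you add FFT threads. The function names read_knobs, benchmark_ready and current_freqs_mhz are hypothetical.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")

# Knobs relevant to scaling benchmarks; each may be absent on a given system.
KNOBS = {
    "smt_control": CPU_SYSFS / "smt/control",              # "on", "off", "forceoff", "notsupported", ...
    "intel_no_turbo": CPU_SYSFS / "intel_pstate/no_turbo",  # "1" means turbo is disabled
    "cpufreq_boost": CPU_SYSFS / "cpufreq/boost",          # "0" means boost is disabled
}


def read_knobs(knobs=KNOBS):
    """Return {name: stripped file contents, or None if the file cannot be read}."""
    status = {}
    for name, path in knobs.items():
        try:
            status[name] = Path(path).read_text().strip()
        except OSError:
            status[name] = None  # file absent or unreadable: feature/driver not present
    return status


def benchmark_ready(status):
    """True if SMT is not active and turbo/boost is known to be disabled."""
    smt_off = status.get("smt_control") in ("off", "forceoff", "notsupported")
    turbo_off = status.get("intel_no_turbo") == "1" or status.get("cpufreq_boost") == "0"
    return smt_off and turbo_off


def current_freqs_mhz(cpu_sysfs=CPU_SYSFS):
    """Approximate current clock of each CPU in MHz (scaling_cur_freq is in kHz)."""
    freqs = {}
    for f in sorted(cpu_sysfs.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            freqs[f.parent.parent.name] = int(f.read_text()) / 1000
        except (OSError, ValueError):
            pass  # skip CPUs whose frequency cannot be read
    return freqs
```

Writing off to smt/control, 1 to intel_pstate/no_turbo, or 0 to cpufreq/boost (as root) changes the corresponding setting on the running system; such sysfs writes do not survive a reboot, whereas the BIOS setting does.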

Related topics

Topic	Category	Tags	Replies	Views	Activity
Scaled FFT implementation seems to only use one thread despite setting no. of threads to 8	New to Julia	fftw, multithreading	2	481	October 27, 2022
FFTW scales pretty well (some @btime benchmarks)	Performance	fftw, gpu, parallel, multithreading	1	1703	February 4, 2025
FFTW multithreading and plans	General Usage	fftw, multithreading	3	748	July 11, 2023
Threads, @spawn on Mac - inconsistent behavior	Performance		1	680	March 3, 2021
FFTW Performance	General Usage	fftw, performance	25	4330	November 2, 2022