Scaled FFT implementation seems to use only one thread despite setting the number of threads to 8

If you care about efficiency, you should use plan_fft to precompute a plan and re-use it, rather than calling fft repeatedly. (And ideally pass flags=FFTW.MEASURE or flags=FFTW.PATIENT.) This is the key to using FFTW effectively, as described in the FFTW FAQ.
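Here is a minimal sketch of the plan-and-reuse pattern with FFTW.jl. The length 4096, the 100 iterations, and the 8 threads are arbitrary choices for illustration. `FFTW.set_num_threads` is called before planning because FFTW fixes the thread count of a plan when the plan is created. The input is filled only after planning, because planning with `FFTW.MEASURE` may overwrite the array. The loop then reuses one plan through `mul!`, so each iteration does no planning and allocates no output.

```julia
using FFTW, LinearAlgebra, Random

# FFTW applies the thread count to plans created *after* this call.
FFTW.set_num_threads(8)

n = 4096                               # example size (arbitrary)
x = Vector{ComplexF64}(undef, n)       # input buffer
y = similar(x)                         # output buffer, reused every iteration

# Plan once. MEASURE times candidate algorithms and may scribble on x,
# so x is filled with real data only after this line.
p = plan_fft(x; flags=FFTW.MEASURE)

for trial in 1:100
    rand!(x)          # new data each time; same size and eltype as the plan
    mul!(y, p, x)     # execute the precomputed plan, writing into y in place
end
```

`FFTW.PATIENT` usually takes much longer to plan than `MEASURE`. It only pays off when the same plan is executed many times.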

(Matlab precomputes plans for a set of common sizes IIRC.)

There is also no need to use nextpow(2, …). FFTW is fast as long as the sizes are products of small factors such as 2, 3, 5, and 7. In particular, you can use nextprod([2,3,5], n) or nextprod([2,3,5,7], n), as discussed in another thread (How to set the length of fft …). However, it is better to choose such a length to begin with rather than padding, because padding changes the transform, not to mention incurring copies.
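The sketch below shows how much smaller these lengths can be than the next power of two. It uses a hypothetical length of 1025, which sits just above 1024. It also includes a tiny illustration of why zero-padding is not a neutral operation: the padded signal has a different length, so the DFT samples a different set of frequencies. The inputs are illustrative assumptions and are not taken from the original thread.

```julia
using FFTW

n = 1025                            # hypothetical signal length, just above 2^10

n2    = nextpow(2, n)               # 2048: nearly twice the data
n235  = nextprod([2, 3, 5], n)      # 1080 = 2^3 * 3^3 * 5
n2357 = nextprod([2, 3, 5, 7], n)   # 1029 = 3 * 7^3

# Padding changes the transform, not just its cost.
x    = ones(5)                      # constant (pure DC) signal of length 5
X    = fft(x)                       # ≈ [5, 0, 0, 0, 0]: all energy in the DC bin
Xpad = fft([x; zeros(3)])           # length 8: energy now leaks into other bins
```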
