p = FFTW.plan_r2r(y,FFTW.R2HC,flags=FFTW.MEASURE)
gives the result
572.302 μs (138 allocations: 10.16 KiB)
p = plan_fft(y,flags=FFTW.MEASURE)
551.774 μs (138 allocations: 10.16 KiB)
Have I missed some crucial point?
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
LLVM: libLLVM-11.0.1 (ORCJIT, haswell)
JULIA_NUM_THREADS = 4
And any chance StructArrays can be used in FFTW?
Have you tried calling fft on the array directly, to see if you have the same relative performance?
2.510 ms (55 allocations: 4.16 KiB)
2.868 ms (31 allocations: 2.72 KiB)
Is it normal?
I honstestly have no idea. @stevengj could probably understand what goes on. (I hope it is okay pinging you like this)
No. The complex (and r2c) FFT algorithms are much more optimized in FFTW than the r2r plans, e.g. the former use SIMD and the latter do not. So even though a complex FFT takes about twice as much arithmetic as an R2HC plan of the same size, the former’s greater optimization makes up for it.
As explained in the FFTW manual, we generally recommend the r2c interface if what you want is a real-input DFT, especially in the multidimensional case. The r2hc algorithms are included mainly as stepping stones to other transforms, and because it is occasionally convenient to have a real-input DFT where the output is exactly the same size.
This is too bad since r2r is useful for Chebyshev transforms. I remember for certain degrees this caused ApproxFun to be much slower than Chebfun (which just translated it to an FFT).