R2r is surprisingly slower than complex fft

photor · November 10, 2022, 2:33am

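using FFTW, LinearAlgebra, BenchmarkTools  # mul! comes from LinearAlgebra, @btime from BenchmarkTools

const m=60

x=randn(m,m,m)
y=deepcopy(x)
p = FFTW.plan_r2r(y,FFTW.R2HC,flags=FFTW.MEASURE)
@btime mul!($y,$p,$x);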

gives the result

572.302 μs (138 allocations: 10.16 KiB)

while

const m=60

x=randn(ComplexF64,m,m,m)
y=deepcopy(x)
p = plan_fft(y,flags=FFTW.MEASURE)
@btime mul!($y,$p,$x);

gives

551.774 μs (138 allocations: 10.16 KiB)

Have I missed some crucial point?

photor · November 10, 2022, 2:59am

Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 4

photor · November 10, 2022, 3:28am

And any chance StructArrays can be used in FFTW?

TheLateKronos · November 13, 2022, 8:05am

Have you tried calling fft on the array directly, to see if you have the same relative performance?

photor · November 13, 2022, 10:24am

const m=60

x=randn(m,m,m)
@btime FFTW.r2r!($x,FFTW.R2HC);

gives

2.510 ms (55 allocations: 4.16 KiB)

while

const m=60

x=randn(ComplexF64,m,m,m)
@btime fft!($x);

gives

2.868 ms (31 allocations: 2.72 KiB)

Is it normal?

TheLateKronos · November 13, 2022, 10:33am

I honestly have no idea. @stevengj could probably understand what is going on. (I hope it is okay to ping you like this.)

stevengj · November 13, 2022, 2:58pm

No. The complex (and r2c) FFT algorithms are much more optimized in FFTW than the r2r plans, e.g. the former use SIMD and the latter do not. So even though a complex FFT takes about twice as much arithmetic as an R2HC plan of the same size, the former’s greater optimization makes up for it.

As explained in the FFTW manual, we generally recommend the r2c interface if what you want is a real-input DFT, especially in the multidimensional case. The r2hc algorithms are included mainly as stepping stones to other transforms, and because it is occasionally convenient to have a real-input DFT where the output is exactly the same size.
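For readers who want to try the r2c route on the same 60×60×60 array, here is a minimal sketch using `plan_rfft` from FFTW.jl. The variable names are only illustrative, and the array is filled after planning because the FFTW planner may overwrite its input when `FFTW.MEASURE` is used. Note that the output is complex and has size `m÷2+1` along the first dimension, so it is not a drop-in replacement for the halfcomplex layout that R2HC produces.

```julia
using FFTW, LinearAlgebra

m = 60
x = Array{Float64}(undef, m, m, m)

# Create the plan first: with FFTW.MEASURE the planner may overwrite x.
p = plan_rfft(x; flags=FFTW.MEASURE)

# Fill the input only after planning.
x .= randn(m, m, m)

# r2c output keeps only the non-redundant half of the first dimension.
y = Array{ComplexF64}(undef, m ÷ 2 + 1, m, m)
mul!(y, p, x)

# Sanity check against the full complex FFT of the same real data.
@assert y ≈ fft(x)[1:(m ÷ 2 + 1), :, :]
```

Timing `mul!($y, $p, $x)` with `@btime` in the same way as the earlier posts should give a like-for-like comparison against the r2r and complex plans on a given machine.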

dlfivefifty · November 15, 2022, 7:43pm

This is too bad since r2r is useful for Chebyshev transforms. I remember for certain degrees this caused ApproxFun to be much slower than Chebfun (which just translated it to an FFT).
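To illustrate what "translating it to an FFT" can look like, here is a small sketch for the DCT-I (`FFTW.REDFT00`), the r2r kind commonly associated with Chebyshev points. It relies on the standard even-extension identity and is not claimed to be what Chebfun or ApproxFun actually do internally; `n` and the variable names are arbitrary.

```julia
using FFTW

n = 65
x = randn(n)

# DCT-I through the r2r interface.
dct1_r2r = FFTW.r2r(x, FFTW.REDFT00)

# Same transform through a real-input FFT: mirror the interior points to
# build an even sequence of length 2(n-1), then keep the real part.
ext = vcat(x, x[end-1:-1:2])
dct1_fft = real.(rfft(ext))

@assert length(ext) == 2 * (n - 1)
@assert dct1_r2r ≈ dct1_fft
```

The FFT route transforms roughly twice as many points, so whether it is faster in practice depends on the size and on how well each plan is optimized; benchmarking both on the target sizes is the only reliable way to tell.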
