ANN: InplaceRealFFT.jl: in-place real-to-complex and complex-to-real FFT

Aliasing arrays between Complex{Float64} and Float64 is exactly what rfft! and irfft! rely on :disappointed:.

I was mainly using it in a pseudo-spectral fluid flow simulation code, where part of the PDE is calculated in real space and part in Fourier space. In that case I believe the lazy reinterpretation would be a deal breaker :anguished:.

Optim.jl will also be hit by this.
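
As a minimal sketch (my own illustration, not the package's actual internals) of the kind of aliasing meant here: a single buffer is viewed as both Float64 and Complex{Float64} through reinterpret, so writes through one view are visible through the other.

julia> buf = zeros(Float64, 4);

julia> cbuf = reinterpret(Complex{Float64}, buf);  # complex view of the same memory, no copy

julia> cbuf[1] = 1.0 + 2.0im;

julia> buf[1], buf[2]
(1.0, 2.0)

On 0.7, reinterpret returns a lazy wrapper instead of a plain Array, which is where the performance concern comes from.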

@Keno stated on GitHub:

I intend to have the performance fixed by the time 0.7 is released. The performance was optimal at some point on my branch, but something clearly regressed. Just busy with semantic changes at the moment.

so hopefully this will go away soon and those tricks will be unnecessary.

I haven’t looked at this implementation yet, but FWIW I’ve often wanted an rfft! function that takes a pre-allocated Complex array to put the result in, not necessarily reusing the input array. Would that work for your use case, or does it really need to be the same array with an in-place operation?

That is already possible using a precomputed plan:

julia> a = rand(8,8)
8×8 Array{Float64,2}:
 0.59679    0.131602  0.184409  0.558209    0.289836  0.268683  0.580835  0.289121
 0.31875    0.416771  0.409353  0.0981198   0.451174  0.260841  0.202204  0.917322
 0.124103   0.191237  0.620929  0.798103    0.589914  0.772676  0.838405  0.2102
 0.812172   0.686616  0.929844  0.341353    0.247563  0.698019  0.254909  0.687846
 0.0254606  0.385726  0.409981  0.097717    0.985738  0.29928   0.565726  0.716829
 0.40668    0.598828  0.184622  0.00956236  0.359331  0.418159  0.70267   0.66258
 0.335555   0.33336   0.244317  0.546701    0.138519  0.498856  0.971377  0.493033
 0.146153   0.832016  0.288027  0.866792    0.742087  0.545129  0.452103  0.774817

julia> b = zeros(Complex128,(5,8));

julia> p = plan_rfft(a)
FFTW real-to-complex plan for 8×8 array of Float64
(rdft2-rank>=2/1
  (rdft2-r2hc-direct-8-x8 "r2cf_8")
  (dft-direct-8-x5 "n1fv_8_avx"))

julia> A_mul_B!(b,p,a)
5×8 Array{Complex{Float64},2}:
   29.8156+0.0im       -0.154824+2.44274im    …   -1.26989-0.730506im  -0.154824-2.44274im
 -0.784323-0.402335im   -1.70144+0.0786512im      0.595502-1.23948im    0.769457+0.922158im
  -1.32134+2.88848im     2.44856-1.03407im         1.55985-0.338671im  -0.830614-0.666949im
 -0.389619+0.765363im    1.37924-2.20592im       -0.512497+0.595245im    1.47723-2.13217im
  -1.62918+0.0im        -3.22961+1.2663im         -1.39024-0.926478im   -3.22961-1.2663im

This package provides a “truly” in-place transform. The main advantage is the memory saving. I believe it may also be faster because of memory locality, but that is just a guess.
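
As a rough illustration of the memory saving (back-of-the-envelope arithmetic only, assuming Float64/Complex{Float64} and the 256³ size used in the benchmark below): the out-of-place approach keeps both the real input and the complex output alive, while the in-place transform needs only a single padded buffer of roughly the size of the complex output.

julia> n = 256;

julia> real_in = n^3 * 8;                        # separate real input: 128 MiB

julia> complex_out = (div(n,2) + 1) * n^2 * 16;  # separate complex output: ~129 MiB

julia> padded_buf = (div(n,2) + 1) * n^2 * 16;   # single padded in-place buffer: ~129 MiB

julia> ((real_in + complex_out) / 2^20, padded_buf / 2^20)
(257.0, 129.0)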

Here is a small benchmark:

julia> using InplaceRealFFT, BenchmarkTools

julia> a = zeros(256,256,256);

julia> op = plan_rfft(a,flags=FFTW.MEASURE);

julia> b = zeros(Complex128,(129,256,256));

julia> pa = PaddedArray(size(a));

julia> ip = plan_rfft!(pa,flags=FFTW.MEASURE);

julia> a .= rand(256,256,256);

julia> pa.r .= a;

julia> @benchmark A_mul_B!($b,$op,$a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     155.066 ms (0.00% GC)
  median time:      159.190 ms (0.00% GC)
  mean time:        159.736 ms (0.00% GC)
  maximum time:     171.139 ms (0.00% GC)
  --------------
  samples:          32
  evals/sample:     1

julia> @benchmark *($ip,$pa)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     152.768 ms (0.00% GC)
  median time:      156.669 ms (0.00% GC)
  mean time:        156.766 ms (0.00% GC)
  maximum time:     162.761 ms (0.00% GC)
  --------------
  samples:          32
  evals/sample:     1

I do get consistently smaller times for rfft!, but it is always just a tiny bit faster…

I have made a PR to FFTW.jl. I tried doing what @yuyichao said was necessary, but I’m not sure I did it correctly. Hopefully @stevengj will be able to review it.
