Repeated allocations for fft in pseudospectral code

I’m trying to tune a pseudospectral PDE code which has calls to fft and ifft at every iteration. I was wondering if there was some way to reduce the repeated memory allocations (assuming this improves performance). By this I mean that if I run the code:

using FFTW    
using BenchmarkTools

function apply_n_fft(u, n)
    uhat = complex.(similar(u));
    for _ in 1:n
        uhat .= fft(u);
    end
end

u = randn(32);
@btime apply_n_fft(u, 5);
@btime apply_n_fft(u, 50);

I get:

  6.433 μs (49 allocations: 7.83 KiB)
  68.084 μs (454 allocations: 70.41 KiB)

That the time went up by a factor of 10 is not so surprising; it’s the memory allocations that surprise me.

Use a plan:

julia> using FFTW, LinearAlgebra, BenchmarkTools

julia> u = randn(32);

julia> function apply_n_fft(u, n)
           uhat = complex.(similar(u));
           for _ in 1:n
               uhat .= fft(u);
           end
       end
apply_n_fft (generic function with 1 method)

julia> @btime apply_n_fft($(u), 5);
  7.727 μs (49 allocations: 7.83 KiB)

julia> @btime apply_n_fft($(u), 50);
  81.314 μs (454 allocations: 70.41 KiB)

julia> function apply_n_fft_plan(u, n)
           uc = complex(u)
           uhat = similar(uc);
           plan = plan_fft(uc)
           for _ in 1:n
               mul!(uhat, plan, uc)
           end
       end
apply_n_fft_plan (generic function with 1 method)

julia> @btime apply_n_fft_plan($(u), 5);
  1.713 μs (8 allocations: 1.34 KiB)

julia> @btime apply_n_fft_plan($(u), 50);
  2.932 μs (8 allocations: 1.34 KiB)

julia> @btime apply_n_fft_plan($(u), 500);
  15.597 μs (8 allocations: 1.34 KiB)

Without a precomputed plan, fft internally needs to (a) re-compute the same plan all over again which is also a very time-consuming task which you don’t really want to repeat, (b) convert the input array to complex, and (c) allocate the output array, every single time. With the pre-computed plan, and having the input array already as complex numbers, lets you save all these repeated allocations, and the mul! would be entirely in-place, with the number of total allocations not depending on n.

6 Likes