I’m trying to tune a pseudospectral PDE code which calls `fft` and `ifft` at every iteration. I was wondering if there was some way to reduce the repeated memory allocations (assuming this improves performance). By this I mean that if I run the code:
```julia
using FFTW
using BenchmarkTools

function apply_n_fft(u, n)
    uhat = complex.(similar(u));
    for _ in 1:n
        uhat .= fft(u);
    end
end

u = randn(32);
@btime apply_n_fft(u, 5);
@btime apply_n_fft(u, 50);
```
I get:
```
6.433 μs (49 allocations: 7.83 KiB)
68.084 μs (454 allocations: 70.41 KiB)
```
That the time went up by a factor of 10 is not so surprising; it’s the memory allocations that surprise me.
Use a plan:
```julia
julia> using FFTW, LinearAlgebra, BenchmarkTools

julia> u = randn(32);

julia> function apply_n_fft(u, n)
           uhat = complex.(similar(u));
           for _ in 1:n
               uhat .= fft(u);
           end
       end
apply_n_fft (generic function with 1 method)

julia> @btime apply_n_fft($(u), 5);
  7.727 μs (49 allocations: 7.83 KiB)

julia> @btime apply_n_fft($(u), 50);
  81.314 μs (454 allocations: 70.41 KiB)

julia> function apply_n_fft_plan(u, n)
           uc = complex(u)
           uhat = similar(uc);
           plan = plan_fft(uc)
           for _ in 1:n
               mul!(uhat, plan, uc)
           end
       end
apply_n_fft_plan (generic function with 1 method)

julia> @btime apply_n_fft_plan($(u), 5);
  1.713 μs (8 allocations: 1.34 KiB)

julia> @btime apply_n_fft_plan($(u), 50);
  2.932 μs (8 allocations: 1.34 KiB)

julia> @btime apply_n_fft_plan($(u), 500);
  15.597 μs (8 allocations: 1.34 KiB)
```
Without a precomputed plan, `fft` internally needs to (a) re-compute the same plan over and over, which is itself time-consuming work you don’t want to repeat, (b) convert the input array to complex, and (c) allocate the output array, on every single call. Precomputing the plan and converting the input to complex numbers up front saves all of these repeated allocations, and the `mul!` runs entirely in place, so the total number of allocations does not depend on `n`.
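Since the original question mentions both `fft` and `ifft` at every iteration, the same trick extends to the inverse transform: precompute a forward and an inverse plan once, outside the loop, and reuse both with `mul!`. A minimal sketch, assuming this shape of time loop (the name `apply_n_roundtrips` and the loop body are illustrative, not from the thread; applying `mul!` to the scaled plan returned by `plan_ifft` requires a recent AbstractFFTs):

```julia
using FFTW, LinearAlgebra

function apply_n_roundtrips(u, n)
    uc   = complex(u)       # complex working copy, allocated once
    uhat = similar(uc)      # frequency-domain buffer
    back = similar(uc)      # result of the inverse transform
    pfwd = plan_fft(uc)     # forward plan, computed once
    pinv = plan_ifft(uc)    # inverse plan (includes the 1/N scaling)
    for _ in 1:n
        mul!(uhat, pfwd, uc)    # uhat = fft(uc), no new allocations
        mul!(back, pinv, uhat)  # back = ifft(uhat), no new allocations
    end
    return back
end
```

If overwriting the input buffer is acceptable, the in-place variants `plan_fft!`/`plan_ifft!` go one step further and avoid even the separate output arrays.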