I personally would prefer your option 3:
Create a structure with all the intermediate parameters and then use the structure as a callable function (call overloading). But I don’t really see how the boilerplate can be done (without init here in the parent function) (see Emulate local static variable?)
It stores the persistent arrays in a transparent manner and leads to nice code. The boilerplate could look like this:
using FFTW, LinearAlgebra
struct preAllocatedSquareModulusFFT{T,P,U}
    absout::Vector{T}    # output buffer holding |FFT(x)|^2
    planFFT::P           # precomputed FFT plan
    internal::Vector{U}  # buffer for the complex FFT output
end
function preAllocatedSquareModulusFFT(sig)
    abs_out = zeros(real(eltype(sig)), length(sig))
    # Plan on a copy: planning with FFTW.PATIENT may overwrite its input
    planFFT = plan_fft(copy(sig); flags=FFTW.PATIENT)
    internal = similar(sig)
    return preAllocatedSquareModulusFFT(abs_out, planFFT, internal)
end
function (f::preAllocatedSquareModulusFFT)(x)
    # --- Compute FFT
    mul!(f.internal, f.planFFT, x) # This is a FFT, written into f.internal
    # --- Abs2
    for i in eachindex(f.absout, f.internal) # Can be even faster with @avx and @inbounds but not the topic here :)
        f.absout[i] = abs2(f.internal[i])
    end
    return f.absout
end
You would then call the function as follows:
N = 1024
sig = randn(Complex{Float64},N)
# Create the struct once
f = preAllocatedSquareModulusFFT(sig)
# And call (multiple times)
f(sig)
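One thing to keep in mind with this pattern: the returned array is the struct's own buffer, so the next call overwrites it. Here is a minimal sketch of that behaviour, using a hypothetical stripped-down functor (`PreallocatedAbs2` is my own name for illustration, with no FFT, so it runs with Base only):

```julia
# Hypothetical minimal functor: same callable-struct pattern, without the FFT.
struct PreallocatedAbs2{T}
    out::Vector{T}  # persistent output buffer, reused on every call
end

# Build the buffer once from a sample input
function PreallocatedAbs2(x::AbstractVector)
    T = real(eltype(x))
    return PreallocatedAbs2{T}(zeros(T, length(x)))
end

# Call overloading: fill the buffer in place and return it
function (g::PreallocatedAbs2)(x)
    for i in eachindex(g.out, x)
        g.out[i] = abs2(x[i])
    end
    return g.out
end

a = randn(ComplexF64, 8)
b = randn(ComplexF64, 8)
g = PreallocatedAbs2(a)

ya = g(a)             # ya is g.out itself, not a copy
ya_saved = copy(ya)   # take a snapshot if the result must survive
yb = g(b)             # overwrites g.out, so ya now holds the result for b

ya === yb             # true: both names point to the same buffer
ya_saved == abs2.(a)  # true: the snapshot kept the first result
```

So if you need to keep a result across calls, `copy` it (at the cost of one allocation); otherwise just consume it before the next call.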