Hello there
I have a question on how we can avoid array re-allocation (without changing the API of a function) by making variables (and especially array) persistent from one function call to another, specialized on the input type and size and that belong to the scope of a function.
A simple example of a function that computes the square root of the FFT. Note that this example is representative of what I often do, small functions called a very high number of time and where pre-allocation saves a lot.
using FFTW
N = 1024
sig = randn(Complex{Float64},N)
function directSquareModulusFFT(x::Vector{Complex{T}}) where T
# Compute FFT using FFTW
z = fft(x)
# Compute square modulus
return abs2.(z)
end
directSquareModulusFFT(sig)
Some modifications can be made to speed up the process and be more efficient
using FFTW , LinearAlgebra
N = 1024
sig = randn(Complex{Float64},N)
function allocatedSquareModulusFFT(x::Vector{Complex{T}}) where T
nbSamples = length(x)
# --- Init all stuff and prepare FFT
abs_out = zeros(T,nbSamples)
planFFT = plan_fft(copy(x);flags=FFTW.PATIENT)
internal = zeros(Complex{T},nbSamples)
# --- Compute FFT
mul!(internal,planFFT,x) # This is an FFT
# --- Abs2
for i in 1:1:nbSamples # Can be even faster with @avx and @inbounds but not the topic here :)
abs_out[i] = abs2(internal[i])
end
return abs_out
end
allocatedSquareModulusFFT(sig)
The key aspect here is that I may call very often allocatedSquareModulusFFT
so the pre-allocation should not be done in the function to save memory and time.
using FFTW, LinearAlgebra
N = 1024
sig = randn(Complex{Float64},N)
# --- Init all stuff and prepare FFT
abs_out = zeros(eltype(real(sig)),length(sig));
planFFT = plan_fft(copy(sig);flags=FFTW.PATIENT);
internal = similar(sig);
# We can now call the function with the temp variables
function preAllocatedSquareModulusFFT(x::Vector{Complex{T}},abs_out,planFFT,internal) where T
nbSamples = length(x)
# --- Compute FFT
mul!(internal,planFFT,x) # This is a FFT
# --- Abs2
for i in 1:1:nbSamples # Can be even faster with @avx and @inbounds but not the topic here :)
abs_out[i] = abs2(internal[i])
end
return abs_out
end
preAllocatedSquareModulusFFT(sig,abs_out,planFFT,internal)
The last function is totally OK in terms of performance but require to put some code before the function call, and all the intermediate variables only depends on the type and the size of the input array.
I can of course wrap all the temp variables into a parametric structure and pass it each time I call preAllocatedSquareModulusFFT
but I was wondering if there is (or we can create) a way to automatically pass (and hide :D) the intermediate variables from one call to another without performance penalty.
I was thinking on a macro for instance that would do the following
using FFTW
N = 1024
sig = randn(Complex{Float64},N)
function magicSquareModulusFFT(x::Vector{Complex{T}}) where T
@persistent begin
# --- Init all stuff and prepare FFT
abs_out = zeros(T,nbSamples); #@MVector
planFFT = plan_fft(copy(x);flags=FFTW.PATIENT);
internal = similar(x);
end
# --- Compute FFT
mul!(internal,planFFT, x) # This is a FFT
# --- Abs2
for i in 1:1:nbSamples # Can be even faster with @avx and @inbounds but not the topic here :)
abs_out[i] = abs2(internal[i])
end
return abs_out
end
magicSquareModulusFFT(sig)
with the code in @persistent
only evaluated at the fist function call. The other time the function is called (at least with same function signature) the variables defined in @persistent
are in the scope.
The ultimate goal is to hide the boilerplate associated to pre-allocation such that the high level API of a function does not require a anterior init
function call.
Having the benefits of the optimisation we have made but also the simpler API definition (one input parameter and that’s all folks).
It is also appealing in protopyting as one can just play and change what is persistant without having to restart Julia in case of structure modifications or without change the parent script (with the init function).
Alternative approaches I have though about (without good results :()
- Create a
initSquareModulusFFT(x)
function and then call it into themagicSquareModulusFFT
using memoize. It seems that the performance penalty is huge as theDict
isAny
(?) - I have seen that another approach can be to use closure with
let
but it will not (I think ?) be adapted to the array type and size. (see https://github.com/JuliaLang/julia/issues/15056) - Create a structure with all the intermediate parameters and then using the structure as a callable function (call overloading). But I don’t really how the boilerplate can be done (without init here in the parent function) (see Emulate local static variable?)
So, any remarks would be welcome here