Thank you, @stevengj and @Jeff_Emanuel, for the thoughtful responses. Incredibly helpful. Really appreciate it. I was able to speedup the code considerably by following your advice and using threads.
This code is considerably faster:
FFTW.set_num_threads(Threads.nthreads() - 3) # 3 threads will be used for inner loop below
for i in 1:iters
#Get estimates using threads to multiprocess
Threads.@threads for j in 1:3
if j == 1
#input function for x
f_x .= x_fun(s,ẑ,a_xx,b_zx) # f_x: 2_500_000-element Vector{Float64}
fftfilt!(x̂,g_x,f_x) # g_x: 1000-element Vector{Float64}
elseif j == 2
#input function for y
f_y .= y_fun(x̂,ẑ, a_xy,b_zy) # 2_500_000-element Vector{Float64}
fftfilt!(ŷ,g_y,f_y) # g_y: 1000-element Vector{Float64}
else
#input function for z
f_z .= z_fun(ŷ,a,b,z_p) # 2_500_000-element Vector{Float64}
fftfilt!(ẑ,g_z,f_z) # g_z: 100000-element Vector{Float64}
end
end
end
Interestingly, I didn’t notice any difference when I changed the vector lengths. I also have been unable to see much of an improvement by computing the FFT’s for the filters outside of the loops. I am sure I am doing something (many things…) wrong, but I have no idea what. Any advice would be greatly appreciated.
I have included a comparison example where despite using FFTW.PATIENT
and precomputing the FFT for the filter outside the loop I am unable to beat fftfilt
:
using BenchmarkTools,DSP,FFTW,LinearAlgebra
end_time = 25000 - .01 # end time for signal
filt_end = 1000 - .01 # end time for filter
δ = .01 # granularity
g = 100.0*exp.(-100.0*(0.0:δ:filt_end)); #filter, length 100_000,
# Method 1 Precompute Patient Plan:
x = sin.(0.0:δ:end_time).^2; #signal, length 2_500_000
lx =length(x) #length of signal: 2_500_000
# Compute the full convolution length
N = lx + length(g) - 1; #length of arrays minus 1
# Create zero-padded arrays:
xp = zeros(Float64, N);
gp = zeros(Float64, N);
#populate padded arrays
xp[1:lx] .= x;
gp[1:length(g)] .= g;
# Create FFT plans with PATIENT flag:
plan_x = plan_rfft(xp, flags=FFTW.PATIENT);
plan_g = plan_rfft(gp, flags=FFTW.PATIENT);
# Precompute FFT's:
G_fft = plan_g * gp;
X_fft = plan_x * xp;
conv_full_x = irfft(X_fft .* G_fft, N)[1:lx]; # compute convolution
@benchmark begin
for i in 1:10
#update x
xp[1:lx] .+= sin.(0.0:δ:end_time).^2
#calculate x fft
mul!(X_fft,plan_x, xp)
#inversre fft
conv_full_x .= irfft(X_fft .* G_fft, N)[1:lx];
end
end
#Method 2 fftfilt!:
x_filt = sin.(0.0:δ:end_time).^2 #signal array length 2_500_000
conv_full_x_filt = fftfilt(g,x); # compute convolution
@benchmark begin
for i in 1:10
#update x
x_filt .+= sin.(0.0:δ:end_time).^2
#convolve
fftfilt!(conv_full_x_filt,g,x_filt)
end
end
Thanks,
DS