I am trying to follow your approach to calculate cross-correlations using FFT. In my case, I have millions of images to compare. Thus, the function you provided allocates way too much memories and GC would be very busy. In your code, every step will allocate a temporary array. I tried to eliminate the allocations as much as possible.
function corr_fft(P, IP, A, B)
cc = fftshift(real.(IP*(conj(P*A) .* (P*B))))
end
fftshift!(y, x) = circshift!(y, x, div.([size(x)...],2))
function corr_fft1!(cc, P, IP, A, B, XC)
FA = P*A
conj!(FA)
FB = P*B
FA .= FA .* FB
A_mul_B!(FB, IP, FA)
XC .= real.(FB)
fftshift!(cc, XC)
end
In this version, cc
and XC
are pre-allocated real arrays outside of the function. But I still have to allocate FA
and FB
, two complex arrays, for each call. Unfortunately, if one wants to use A_mul_B!(Y, plan, X)
, the input image X
has to be a complex array as well. But if the number of calls to this function is greater than the number of images, it might be worth to convert all images to complex arrays first and use the following version.
function corr_fft2!(cc, P, IP, AC, BC, FA, FB, XC)
A_mul_B!(FA, P, AC)
conj!(FA)
A_mul_B!(FB, P, BC)
FA .= FA .* FB
A_mul_B!(FB, IP, FA)
XC .= real.(FB)
fftshift!(cc, XC)
end
Here, cc
and XC
are pre-allocated real arrays, AC
and BC
are converted complex images, and FA
and FB
are pre-allocated complex arrays. For 100 x 100
pixel images, I benchmarked the three versions in my MacBook pro. AC
and BC
are complex arrays converted from A
and B
.
julia> @btime corr_fft($P, $IP, $A, $B)
204.048 μs (22 allocations: 625.95 KiB)
julia> @btime corr_fft1!($cc, $P, $IP, $A, $B, $XC)
193.094 μs (11 allocations: 313.03 KiB)
julia> @btime corr_fft2!($cc, $P, $IP, $AC, $BC, $FA, $FB, $XC)
177.746 μs (3 allocations: 224 bytes)
The last version is much faster than the first two when loops through millions of images. Now, if only there is a way to directly do A_mul_B!(Y, plan, X)
without requiring converting all my images to complex arrays, that would be super! I wonder if there is any method in FFTW package to do that. I didn’t find any in the manual.