I’m trying to speed up the runtime of a back-projection image formation algorithm for SAR and want to see how far things can be pushed for the naive version using parallel computing and the speed of Julia (I’m aware that there are more efficient factorized versions of this algorithm).
Here is a MWE:
module Test
using LinearAlgebra
using Interpolations
using FLoops
using BenchmarkTools
const c = 3e8
function backprojection(itp, positions, xs, ys, fcomp)
@floop for (k,q) in enumerate(positions)
p = zeros(3)
subimage = zeros(Complex{Float64}, length(xs), length(ys))
for (i,x) in enumerate(xs), (j,y) in enumerate(ys)
# point coordinates in the image plane
@inbounds p[1] = x
@inbounds p[2] = y
# range between radar and image point
r = norm(p - q)
# corresponding propagation delay
td = 2*r/c
# remove initial carrier phase and add contribution
# of this radar pulse to image point
@inbounds subimage[i, j] += itp(k, td) * fcomp(td)
end
@reduce(image .+= subimage)
end
image
end
function run(M, N)
# dummy data
data = randn(Complex{Float64}, M, 4096)
# sample points for interpolation
ax1 = 1:size(data, 1)
ax2 = 1:size(data, 2)
# x and y image points
xs = range(-2.0, 2.0, length=N)
ys = range(-2.0, 2.0, length=N)
# dummy position vectors
positions = [randn(3) for _ in ax1]
# dummy compensation function
fcomp(td) = exp(2im*pi*td)
# interpolation object for data array
itp = interpolate((ax1, ax2), data, (NoInterp(), Gridded(Linear())))
# compute backprojection sum
@benchmark backprojection($itp, $positions, $xs, $ys, $fcomp)
end
end
Benchmarking it using BenchmarkTools.jl gives the following result
$ julia -t 4
julia> using Revise
julia> include("Test.jl")
julia> Test.run(400, 256)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max): 913.720 ms … 951.016 ms ┊ GC (min … max): 35.12% … 37.45%
Time (median): 929.735 ms ┊ GC (median): 36.39%
Time (mean ± σ): 931.355 ms ± 15.284 ms ┊ GC (mean ± σ): 36.32% ± 0.93%
▁ ▁ █
█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁ ▁
914 ms Histogram: frequency by time 944 ms <
Memory estimate: 3.51 GiB, allocs estimate: 26216432.
It seems like a large amount of time is still spent on garbage collection. I’m afraid it might have to do with the allocation of the subimage
array in the backprojection
function.
There might be a way to avoid allocating subimage
in every iteration of the loop but alas the API for FLoops.jl is a little confusing to me so I’m not sure how exactly that would work.
Is there a way to reduce the GC time in this particular function?
Thanks!