Hi! I am struggling to keep a certain level of flexibility and abstraction while optimizing for performance. For example, the following function does some processing and modifies an array arr in place.
using BenchmarkTools
arr = zeros(2, 4)
p = [1.3, 6.8]
function f1!(arr, p)
    fp1, fp2 = floor.((p[1], p[2]))
    cp1, cp2 = ceil.((p[1], p[2]))
    arr .= [fp1 cp1 fp1 cp1; fp2 fp2 cp2 cp2]
end
However, the intermediate array created just before broadcasting allocates memory:
julia> @benchmark f1!($arr, $p)
BenchmarkTools.Trial:
memory estimate: 272 bytes
allocs estimate: 9
--------------
minimum time: 72.401 ns (0.00% GC)
median time: 146.003 ns (0.00% GC)
mean time: 152.632 ns (16.16% GC)
maximum time: 15.043 μs (99.33% GC)
--------------
samples: 10000
evals/sample: 976
I can avoid this by assigning each value of arr by hand, as done in f2!:
function f2!(arr, p)
    fp1, fp2 = floor.((p[1], p[2]))
    cp1, cp2 = ceil.((p[1], p[2]))
    arr[1, 1] = fp1
    arr[1, 2] = cp1
    arr[1, 3] = fp1
    arr[1, 4] = cp1
    arr[2, 1] = fp2
    arr[2, 2] = fp2
    arr[2, 3] = cp2
    arr[2, 4] = cp2
end
julia> @benchmark f2!($arr, $p)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 4.305 ns (0.00% GC)
median time: 4.412 ns (0.00% GC)
mean time: 4.574 ns (0.00% GC)
maximum time: 29.816 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
Is there a way to avoid the memory allocation while keeping the clarity of f1!?
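To make the question concrete, here is a minimal sketch of the kind of middle ground I have in mind. It keeps the layout readable as a tuple of row tuples, which are plain immutable values, and copies them into arr with a loop. The name f3! is hypothetical, the layout is the one f2! writes, and I have not benchmarked it, so whether it is really allocation-free should be checked with @benchmark.

```julia
# Sketch only: f3! is a hypothetical name, not an existing function.
function f3!(arr, p)
    fp1, fp2 = floor.((p[1], p[2]))
    cp1, cp2 = ceil.((p[1], p[2]))
    # The layout stays visible here, but as tuples instead of a matrix literal,
    # so no temporary Array is constructed.
    rows = ((fp1, cp1, fp1, cp1),
            (fp2, fp2, cp2, cp2))
    # Copy element by element, iterating in column-major order.
    for j in 1:4, i in 1:2
        arr[i, j] = rows[i][j]
    end
    return arr
end

arr = zeros(2, 4)
p = [1.3, 6.8]
f3!(arr, p)
# arr is now [1.0 2.0 1.0 2.0; 6.0 6.0 7.0 7.0]
```

The design choice is that the tuple sizes are fixed at compile time, so I would expect the compiler to handle them without touching the heap, but this is an assumption on my part rather than a measured result.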