Hi,
I am trying to minimize memory allocations of a function in my package.
However, it displays some mysterious behaviour - at least for me.
Since there are many experts here, maybe somebody can shed some light on whatβs going on here.
Here is a MWE of the problem:
I have an outer!
function that modifies a vector x
by calling an inner
function on each element of x
. Additionally a function is passed as an argument provided by the user:
function outer!(x, f=identity)
for i in eachindex(x)
x[i] += inner(f)
end
end
function inner(f)
y = zero(Float64)
for i in 1:10
y += f(1)
end
return y
end
If I call this version it results in a lot of allocations:
x = zeros(1000)
@benchmark outer!($x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 63.021 ΞΌs β¦ 9.655 ms β GC (min β¦ max): 0.00% β¦ 99.07%
Time (median): 82.074 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 84.586 ΞΌs Β± 207.426 ΞΌs β GC (mean Β± Ο): 5.42% Β± 2.21%
ββββββββ βββββ
β
ββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββ
ββ
ββ
ββ
ββββββ
ββ β
63 ΞΌs Histogram: log(frequency) by time 138 ΞΌs <
Memory estimate: 54.52 KiB, allocs estimate: 3489.
Note that this only happens if inner
has the for
loop. if I define inner
without the loop all allocations vanish.
However, if I just evaluate f
in outer!
once, the allocations also vanish:
function outer!(x, f=identity)
f(1) # evaluate f once
for i in eachindex(x)
x[i] += inner(f)
end
end
@benchmark outer!($x)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min β¦ max): 71.814 ns β¦ 5.371 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 72.181 ns β GC (median): 0.00%
Time (mean Β± Ο): 76.442 ns Β± 67.196 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββ β β
βββββββββββββββββββββββββ
ββ
βββ
ββ
β
ββ
βββ
β
β
β
ββ
βββββ
βββββββββββ β
71.8 ns Histogram: log(frequency) by time 135 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
I am trying to understand what leads to the memory allocations in this case and why evaluating f
a single time gets rid of them. Iβm fine with leaving f(1)
in my code, but it seems like a rather hacky solution and Iβm sure there is a better way of solving this problem.
Hopefully somebody can help me here.