Getting rid of memory allocations in nested functions

Hi,

I am trying to minimize memory allocations of a function in my package.
However, it displays some mysterious behaviour - at least for me.
Since there are many experts here, maybe somebody can shed some light on what’s going on here.

Here is a MWE of the problem:

I have an outer! function that modifies a vector x by calling an inner function on each element of x. Additionally a function is passed as an argument provided by the user:

function outer!(x, f=identity)
    for i in eachindex(x)
        x[i] += inner(f)
    end
end

function inner(f)
    y = zero(Float64)
    for i in 1:10
        y += f(1)
    end
    return y
end

If I call this version it results in a lot of allocations:

x = zeros(1000)
@benchmark outer!($x)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  63.021 ΞΌs …   9.655 ms  β”Š GC (min … max): 0.00% … 99.07%
 Time  (median):     82.074 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   84.586 ΞΌs Β± 207.426 ΞΌs  β”Š GC (mean Β± Οƒ):  5.42% Β±  2.21%

  β–„β–‚β–ˆβ–†β–ƒβ–„β–ƒβ–ƒ      ▄▆▆▆▅▅▄▃▃▂▂▁▁▁                                 β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–†β–†β–„β–†β–…β–ƒβ–…β–„β–…β–‚β–…β–„β–„β–‚β–„β–ƒβ–…β–„β–„ β–ˆ
  63 ΞΌs         Histogram: log(frequency) by time       138 ΞΌs <

 Memory estimate: 54.52 KiB, allocs estimate: 3489.

Note that this only happens if inner has the for loop. if I define inner without the loop all allocations vanish.

However, if I just evaluate f in outer! once, the allocations also vanish:

function outer!(x, f=identity)
    f(1)  # evaluate f once
    for i in eachindex(x)
        x[i] += inner(f)
    end
end
@benchmark outer!($x)

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  71.814 ns …  5.371 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     72.181 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   76.442 ns Β± 67.196 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–ˆβ–„β–‚β–‚β–  ▁                                                    ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–†β–‡β–†β–†β–‡β–†β–†β–…β–†β–…β–†β–†β–…β–†β–…β–…β–„β–…β–„β–„β–…β–…β–…β–…β–†β–…β–„β–„β–„β–„β–…β–„β–„β–„β–„β–ƒβ–„β–„β–„β–β–„β–„ β–ˆ
  71.8 ns      Histogram: log(frequency) by time       135 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I am trying to understand what leads to the memory allocations in this case and why evaluating f a single time gets rid of them. I’m fine with leaving f(1) in my code, but it seems like a rather hacky solution and I’m sure there is a better way of solving this problem.

Hopefully somebody can help me here.

When passing functions as arguments, it’s better to add the type parameter in the signature so it can specialize on whatever function f is :

function outer2!(x; f::F=identity) where F
    for i in eachindex(x)
        x[i] += inner(f)
    end
end

julia> @btime outer!($x)
  45.900 ΞΌs (3489 allocations: 54.52 KiB)

julia> @btime outer2!($x)
  52.487 ns (0 allocations: 0 bytes)
5 Likes

Well, that was a fast solution… Many thanks!

1 Like

See also the (admittedly slightly cryptic) remarks in the Julia Performance Tipps.

1 Like