Allocations with anonymous functions

I was experimenting with StaticArrays.jl, and noticed the following issue with memory allocations. Consider the following example code:

using Random
using StaticArrays

function f1(X, h, D)
    Y = D .* X - h
    return Y
end


function f2(X)
    D = @SVector [0.1,1,1]
    h = @SVector [0.1,0,0]
    Y = D .* X - h
    return Y
end

function apply_n_times(X, f_func, n)
    Y = copy(X)
    for j = 1:n
        Y = f_func(Y)
    end
    return Y
end

Random.seed!(100);
X = @SVector randn(3);

D = @SVector [0.1,1,1]; 
h = @SVector [0.1,0,0]; 

n = 10^4;
@time Y = apply_n_times(X, X->f1(X,h,D), n);
println(Y)
@time Y = apply_n_times(X, f2, n);
println(Y)

If I then run this a second time (so that everything has already been compiled), I see the following results:

julia> @time Y = apply_n_times(X, X->f1(X,h,D), n);
  0.017026 seconds (37.15 k allocations: 1.533 MiB)

julia> @time Y = apply_n_times(X, f2, n);
  0.000051 seconds (5 allocations: 192 bytes)

This shows a substantial improvement in performance when I do not use the anonymous function, which leads to the following questions:

  1. Is this an inherent issue when using anonymous functions, or is it some combination of using them with StaticArrays.jl?
  2. Is there a way to avoid this performance hit with anonymous functions?

If there is a performance hit, keep in mind that this is the first time the compiler sees the function, since it’s an anonymous function that was just defined on the same line.


Nope, if I do:

g = X->f1(X,h,D)
@time Y = apply_n_times(X, g, n);

It’s just as bad.

You are measuring compilation time. Use the @btime macro from BenchmarkTools.jl.
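A rough Base-only sketch of the same idea, for when BenchmarkTools.jl isn’t at hand: time a first call (which includes compilation) against a second call, and measure allocations inside a function after a warm-up run. It assumes plain tuples as stand-ins for the SVectors so that no packages are needed; `bytes_used` is a hypothetical helper, not part of any library.

```julia
# Tuple-based stand-ins for the thread's f1 and apply_n_times (no packages needed).
f1(X, h, D) = D .* X .- h

function apply_n_times(X, f_func, n)
    Y = X                     # tuples are immutable, so no copy is needed
    for _ in 1:n
        Y = f_func(Y)
    end
    return Y
end

X = (0.3, -1.2, 0.5)
g = let h = (0.1, 0.0, 0.0), D = (0.1, 1.0, 1.0)
    x -> f1(x, h, D)          # captures local, concretely typed values
end

t_first  = @elapsed apply_n_times(X, g, 10^4)   # includes compiling apply_n_times
t_second = @elapsed apply_n_times(X, g, 10^4)   # already compiled: runtime only
println("first call: $t_first s, second call: $t_second s")

# Hypothetical helper: allocations of one run, measured inside a function.
# `f::F ... where {F}` forces Julia to specialize on the function argument.
bytes_used(f::F, X, n) where {F} = @allocated apply_n_times(X, f, n)
bytes_used(g, X, 10^4)                  # warm-up call, result discarded
runtime_bytes = bytes_used(g, X, 10^4)
println("runtime allocations: $runtime_bytes bytes")
```

The `$` interpolation in `@btime` serves a related purpose: it keeps the benchmark from paying for non-constant global lookups.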


Partially, but that’s not the main problem. The real issue is this:

X = @SVector randn(3)

D = @SVector [0.1,1,1]
h = @SVector [0.1,0,0]

g = X->f1(X,h,D)

which the OP appears to be running in global scope, so g refers to non-constant global variables whose types the compiler cannot rely on, forcing a lookup and a dynamic dispatch on every call.
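To make this concrete, here is a minimal sketch under the same assumptions (tuples standing in for SVectors; `g_global`, `g_const`, `Dc`, `hc` and `bytes_used` are hypothetical names). A closure that refers to non-`const` globals has to look them up and dispatch dynamically on every call, which boxes the intermediate results, while the same closure over `const` globals is fully inferred.

```julia
# Tuple-based stand-ins for the thread's f1 and apply_n_times.
f1(X, h, D) = D .* X .- h

function apply_n_times(X, f_func, n)
    Y = X
    for _ in 1:n
        Y = f_func(Y)
    end
    return Y
end

# Allocations of one run, measured inside a function specialized on f.
bytes_used(f::F, X, n) where {F} = @allocated apply_n_times(X, f, n)

# Non-const globals: their types could change at any time, so the body of
# g_global cannot be inferred and every call goes through dynamic dispatch.
D = (0.1, 1.0, 1.0)
h = (0.1, 0.0, 0.0)
g_global = x -> f1(x, h, D)

# Const globals: their types are fixed, so g_const compiles to tight code.
const Dc = (0.1, 1.0, 1.0)
const hc = (0.1, 0.0, 0.0)
g_const = x -> f1(x, hc, Dc)

X = (0.3, -1.2, 0.5)
n = 10^4
bytes_used(g_global, X, n); bytes_used(g_const, X, n)   # warm-up (compilation)
alloc_global = bytes_used(g_global, X, n)
alloc_const  = bytes_used(g_const, X, n)
println("non-const globals: $alloc_global bytes, const globals: $alloc_const bytes")
```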

@gideonsimpson Here’s a self contained example showing that correctly used anonymous functions carry no runtime burden:

using Random
using StaticArrays
using BenchmarkTools  # provides @btime

function f1(X, h, D)
    Y = D .* X - h
    return Y
end


function f2(X)
    D = @SVector [0.1,1,1]
    h = @SVector [0.1,0,0]
    Y = D .* X - h
    return Y
end

function apply_n_times(X, f_func, n)
    Y = copy(X)
    for j = 1:n
        Y = f_func(Y)
    end
    return Y
end

Random.seed!(100);
let
    X = @SVector randn(3)

    D = @SVector [0.1,1,1]
    h = @SVector [0.1,0,0]

    n = 10^4

    g = X -> f1(X, h, D)
    
    @btime apply_n_times($X, $g, $n);  # 22.285 μs (0 allocations: 0 bytes)
    @btime apply_n_times($X, $f2, $n); # 22.287 μs (0 allocations: 0 bytes)
end

Here I used a let block to introduce a local scope, so the compiler could be sure that the values (and hence the types) of X, D and h wouldn’t change during the evaluation, which enabled many optimizations. There are other ways of doing this, such as declaring X, D and h to be const, or using a closure to produce g (though that can be impacted by the closure performance bug if not done carefully).
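The closure-based route can be sketched like this, again with tuples standing in for SVectors and hypothetical names `make_g` and `make_g_boxed`. A function that returns the anonymous function captures the values of `h` and `D` as concretely typed fields of the closure. The second factory illustrates the pitfall behind the closure performance bug: a captured variable that is reassigned after the closure is created gets stored in an untyped `Core.Box`, so calls through the closure are no longer type-stable.

```julia
# Tuple-based stand-ins for the thread's f1 and apply_n_times.
f1(X, h, D) = D .* X .- h

function apply_n_times(X, f_func, n)
    Y = X
    for _ in 1:n
        Y = f_func(Y)
    end
    return Y
end

bytes_used(f::F, X, n) where {F} = @allocated apply_n_times(X, f, n)

# Hypothetical factory: h and D become concretely typed fields of the closure.
make_g(h, D) = x -> f1(x, h, D)

# Hypothetical pitfall: h is reassigned after being captured, so Julia
# stores it in a Core.Box whose contents have no known type.
function make_g_boxed(h0, D)
    h = h0
    g = x -> f1(x, h, D)
    h = h0            # reassigning (even to the same value) forces the box
    return g
end

h = (0.1, 0.0, 0.0); D = (0.1, 1.0, 1.0); X = (0.3, -1.2, 0.5); n = 10^4
g_good  = make_g(h, D)
g_boxed = make_g_boxed(h, D)

println(fieldtype(typeof(g_good), :h))    # a concrete tuple type
println(fieldtype(typeof(g_boxed), :h))   # Core.Box

bytes_used(g_good, X, n); bytes_used(g_boxed, X, n)   # warm-up (compilation)
good_bytes  = bytes_used(g_good, X, n)
boxed_bytes = bytes_used(g_boxed, X, n)
println("factory closure: $good_bytes bytes, boxed closure: $boxed_bytes bytes")
```

Inspecting `fieldtype` on the closure’s type is a quick way to spot a `Core.Box` without running a benchmark.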
