Reduce the allocation to zero

Is it possible to reduce the allocations in the following simple case to zero?

a = rand(5)
b = rand(5)
c = rand(5)
d = rand(5)
N = rand(8)

@time acl = sum(a) * (N[1] - N[5]) + sum(b) * (N[2] - N[6]) + sum(c) * (N[3] - N[7]) + sum(d) * (N[4] - N[8])  # on my computer this reports 21 allocations after compilation, which is unexpectedly high

The recommended benchmarking setup is something like this:

using BenchmarkTools

@btime acl = sum(a) * (N[1] - N[5]) + sum(b) * (N[2] - N[6]) + sum(c) * (N[3] - N[7]) + sum(d) * (N[4] - N[8]) setup=(
    a = rand(5);
    b = rand(5);
    c = rand(5);
    d = rand(5);
    N = rand(8);   
)

(You are measuring in global scope with non-const variables, which is a recipe for ‘disaster’.)

So you mean that the result from @time is not accurate? Otherwise, why does it show such a high number?

The results are accurate for what you are doing, but they are not representative of what can be done with Julia.

Please see the first two sections in the Performance Tips.
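As a rough illustration of what those tips point at (keep performance-critical code in functions, and avoid untyped globals), here is a minimal sketch. The names `A`, `B`, `C`, `D`, `NC`, `acl_const` and `allocs_const` are made up for this example, and declaring the arrays `const` is only one of the options the manual mentions.

```julia
# Hypothetical names. `const` promises that these bindings are never
# reassigned, so the compiler knows their types.
const A = rand(5)
const B = rand(5)
const C = rand(5)
const D = rand(5)
const NC = rand(8)

# The formula lives in a function and only reads const globals.
acl_const() = sum(A) * (NC[1] - NC[5]) + sum(B) * (NC[2] - NC[6]) +
              sum(C) * (NC[3] - NC[7]) + sum(D) * (NC[4] - NC[8])

# Measure inside a function, after a warm-up call, so that neither
# compilation nor global-scope overhead is counted.
function allocs_const()
    acl_const()                    # warm-up call
    return @allocated acl_const()  # bytes allocated by one more call
end

println(allocs_const(), " bytes allocated")
```

On recent Julia versions I would expect this to report no allocations, but treat that as something to check on your own machine rather than a guarantee.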


sum(rand(5)) is equivalent to sum(_ -> rand(), 1:5) (both add up five random numbers, although the particular values drawn will differ), so your code is equivalent to the following code, which uses no arrays:

sumrand(n) = sum(_ -> rand(), 1:n)
acl = sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) 

which is allocation-free:

julia> using BenchmarkTools

julia> @btime acl = sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand());
  68.299 ns (0 allocations: 0 bytes)

You could even do

acl = sum(_ -> sumrand(5) * (rand() - rand()), 1:4)

This will show allocations, but only because it is in global scope; in a function it will be allocation-free as well:

julia> f(n, m) = sum(_ -> sumrand(m) * (rand() - rand()), 1:n)
f (generic function with 1 method)

julia> @btime f(5,5);
  102.041 ns (0 allocations: 0 bytes)

But maybe I misunderstood your question, and you are only asking about reducing the allocations in the acl = ... line. In that case, the problem is simply that you are benchmarking using global variables as noted above.

Thanks for your detailed explanation. So if I understand correctly, it’s better to put the whole formula inside a function, right?


Yes. Read the Performance Tips in the manual, especially the first two tips.
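For the formula from the original post, that could look roughly like the sketch below. The function names `compute_acl` and `allocs_in_acl` are hypothetical; the point is only that the arrays are passed as arguments instead of being read from non-const globals.

```julia
# Hypothetical helper: the original formula with everything passed as arguments.
function compute_acl(a, b, c, d, N)
    return sum(a) * (N[1] - N[5]) + sum(b) * (N[2] - N[6]) +
           sum(c) * (N[3] - N[7]) + sum(d) * (N[4] - N[8])
end

# Measure from inside a function so the argument types are known to the compiler.
function allocs_in_acl(a, b, c, d, N)
    compute_acl(a, b, c, d, N)                    # warm-up call, so compilation is not counted
    return @allocated compute_acl(a, b, c, d, N)  # bytes allocated by one more call
end

a, b, c, d, N = rand(5), rand(5), rand(5), rand(5), rand(8)
println(compute_acl(a, b, c, d, N))
println(allocs_in_acl(a, b, c, d, N), " bytes allocated")
```

If you prefer BenchmarkTools, interpolating the globals with `$`, as in `@btime compute_acl($a, $b, $c, $d, $N)`, should avoid the global-variable overhead in the measurement as well.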
