xspeng
April 7, 2022, 12:04pm
1
Is it possible to reduce the allocations in the following simple case to zero?
a = rand(5)
b = rand(5)
c = rand(5)
d = rand(5)
N = rand(8)
@time acl = sum(a) * (N[1] - N[5]) + sum(b) * (N[2] - N[6]) + sum(c) * (N[3] - N[7]) + sum(d) * (N[4] - N[8]) # on my computer this reports 21 allocations even after compilation, which is unexpectedly high
goerch
April 7, 2022, 12:10pm
2
The recommended benchmarking setup is something like:
using BenchmarkTools
@btime acl = sum(a) * (N[1] - N[5]) + sum(b) * (N[2] - N[6]) + sum(c) * (N[3] - N[7]) + sum(d) * (N[4] - N[8]) setup=(
a = rand(5);
b = rand(5);
c = rand(5);
d = rand(5);
N = rand(8);
)
(You are measuring in global scope with non-const globals: a recipe for ‘disaster’.)
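To illustrate the point about non-const globals, here is a minimal sketch using only Base (no BenchmarkTools). The names `A`, `B`, `C`, `D`, `W` and `acl_const` are hypothetical and not from the original post; `W` plays the role of `N`. Declaring the globals `const` lets the compiler know their types, so the formula can be compiled to specialized code. The exact byte count reported may depend on the Julia version, so none is stated here.

```julia
# Hypothetical names; `const` fixes each binding so the compiler knows its type.
const A = rand(5)
const B = rand(5)
const C = rand(5)
const D = rand(5)
const W = rand(8)   # plays the role of N in the original post

# Reads only const globals, so every type involved is known at compile time.
acl_const() = sum(A) * (W[1] - W[5]) + sum(B) * (W[2] - W[6]) +
              sum(C) * (W[3] - W[7]) + sum(D) * (W[4] - W[8])

acl_const()                      # warm-up call, triggers compilation
println(@allocated acl_const())  # bytes allocated by one more call
```

Note that `const` only makes sense if the bindings are never reassigned; mutating the contents of the arrays is still allowed.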
xspeng
April 7, 2022, 12:18pm
3
So you mean the result from @time is not accurate? If not, why does it report such high numbers?
goerch
April 7, 2022, 12:20pm
4
The results are correct for what you are doing, but they are not representative of what can be done with Julia.
Please see the first two sections of the Performance Tips.
sum(rand(5)) is equivalent (in distribution) to sum(_ -> rand(), 1:5), so your code is equivalent to the following code, which uses no arrays:
sumrand(n) = sum(_ -> rand(), 1:n)
acl = sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand())
which is allocation-free:
julia> using BenchmarkTools
julia> @btime acl = sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand()) + sumrand(5) * (rand() - rand());
68.299 ns (0 allocations: 0 bytes)
You could even do
acl = sum(_ -> sumrand(5) * (rand() - rand()), 1:5)
This will show allocations, but only because it is in global scope; in a function it will be allocation-free as well:
julia> f(n, m) = sum(_ -> sumrand(m) * (rand() - rand()), 1:n)
f (generic function with 1 method)
julia> @btime f(5,5);
102.041 ns (0 allocations: 0 bytes)
But maybe I misunderstood your question, and you are only asking about reducing the allocations in the acl = ... line. In that case, the problem is simply that you are benchmarking using global variables, as noted above.
xspeng
April 7, 2022, 12:52pm
7
Thanks for the detailed explanation. So, if I understand correctly, it’s better to put the whole formula inside a function, right?
Yes. Read the Performance Tips in the manual, especially the first two tips.
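As a rough sketch of what "put the formula in a function" could look like for the original example: the function names `acl_value` and `measure_allocations` are made up for illustration. The arrays are passed as arguments, so the method is compiled for their concrete types, and the measurement itself is also done inside a function so that no non-const globals are involved. It uses Base's `@allocated` rather than BenchmarkTools; the reported number may vary with the Julia version.

```julia
# Hypothetical function name; inputs are arguments, so the compiler
# specializes the method on their concrete types.
function acl_value(a, b, c, d, N)
    return sum(a) * (N[1] - N[5]) + sum(b) * (N[2] - N[6]) +
           sum(c) * (N[3] - N[7]) + sum(d) * (N[4] - N[8])
end

# Measure inside a function so the measurement does not touch non-const globals.
function measure_allocations()
    a, b, c, d = rand(5), rand(5), rand(5), rand(5)
    N = rand(8)
    acl_value(a, b, c, d, N)                    # warm-up call, triggers compilation
    return @allocated acl_value(a, b, c, d, N)  # bytes allocated by one more call
end

println(measure_allocations())
```

With BenchmarkTools, the same function could be benchmarked using the setup= form shown earlier in the thread.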