Hi,
First: I am very much looking forward to StaticCompiler.jl
on Windows, I will be putting it to good use!
I understand it’s early days, but I wanted to try out StaticTools.jl
to get a feel for the “static dialect”. The code
using StaticTools, BenchmarkTools, StaticArrays
function foo(v)
    a = StackArray(ntuple(i->v,6), 6)
    return sum(a)
end
function bar(v)
    a = SVector{5,Float64}(v,v,v,v,v)
    return a[1]+a[2]+a[3]+a[4]+a[5]
end
@btime foo(2)
@btime bar(2)
yields
10.210 ns (1 allocation: 64 bytes)
0.800 ns (0 allocations: 0 bytes)
and I feel cheated: creating a StackArray
allocates? On the heap?
What did I get wrong?
Your benchmark looks a bit odd: the two arrays have different lengths (6 elements in foo, 5 in bar), and you sum them in different ways (sum in foo, explicit indexing in bar).
But let’s start with the StaticArrays.jl variant bar: the sub-nanosecond runtime looks like constant folding, and indeed it is. See here:
julia> function bar(v)
           a = SVector{5,Float64}(v,v,v,v,v)
           return sum(a)
       end
julia> @btime bar(2)
0.978 ns (0 allocations: 0 bytes)
10.0
julia> @btime bar($2) # interpolation avoids constant-folding
3.352 ns (0 allocations: 0 bytes)
10.0
Whether you use sum or add the elements via direct indexing does not make a difference here. I am sure StaticArrays.jl took care that this call is fully inlined into code that just adds up the array entries.
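On the constant-folding point, here is a minimal sketch of the three ways the argument can be passed to @btime. The `$(Ref(x))[]` form is an idiom described in the BenchmarkTools manual for cases where even an interpolated value might still be treated as a constant; whether it changes the timing on a given Julia version is something to measure rather than assume, so no expected timings are shown.

```julia
using BenchmarkTools, StaticArrays

function bar(v)
    a = SVector{5,Float64}(v, v, v, v, v)
    return sum(a)
end

# Literal argument: the compiler can see the constant and may fold the whole call.
@btime bar(2)

# Interpolated argument: the value is spliced into the benchmark as a pre-computed value.
@btime bar($2)

# Ref wrapper: the value is loaded from the Ref at run time, which makes folding less likely.
@btime bar($(Ref(2))[])
```

A runtime far below 1 ns is usually a hint that nothing is being computed at run time.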
Ok, let’s now look at StackArray (which I have no prior experience with).
There are no allocations when I do the summation directly:
julia> function foo_direct(v)
           a = StackArray(ntuple(i->v,5), 5) # changed size to 5 to match bar
           return a[1]+a[2]+a[3]+a[4]+a[5]
       end
julia> @btime foo_direct($2) # interpolation does not make a difference here
3.282 ns (0 allocations: 0 bytes)
10
However, when using sum, the allocation appears:
julia> function foo_sum(v)
           a = StackArray(ntuple(i->v,5), 5)
           return sum(a)
       end
foo_sum (generic function with 1 method)
julia> @btime foo_sum($2)
7.970 ns (1 allocation: 48 bytes)
10
Inspecting the @code_llvm output, it seems that the code allocates an array, copies the values from the StackArray into it, and then passes that array to a mapreduce function. To me that looks like a specialized method for efficiently summing a StackArray is missing.
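Until such a method exists, one possible workaround is to write the reduction as an explicit loop, which generalizes foo_direct to any length. This is only a sketch: foo_loop is a hypothetical name, the StackArray constructor call is copied from the functions above, and I have not verified that the loop stays allocation-free in every case, so it is worth checking with @btime.

```julia
using StaticTools

# Hypothetical variant of foo_sum: sum by explicit indexing instead of calling sum,
# so the generic mapreduce path is never entered.
function foo_loop(v)
    a = StackArray(ntuple(i -> v, 5), 5)  # same constructor call as in foo_sum
    s = zero(eltype(a))                   # accumulator with the array's element type
    for i in 1:length(a)
        s += a[i]
    end
    return s
end

foo_loop(2)
```

Running `@btime foo_loop($2)` next to `@btime foo_sum($2)` would show whether the allocation is gone on your setup.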
Oh thanks, I never thought of that. sum allocates - which actually makes sense.