Hello,
I am trying to optimize some code and have some questions about behavior I am observing with static arrays. I have read that if you have lots of small arrays (fewer than about 100 elements), static arrays are one option for improving performance. Since they are stack allocated, they should be faster to access (I think).
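Strictly speaking, StaticArrays does not promise stack allocation. What it provides is an immutable, fixed-size type: with `Float64` elements an `SVector` is a plain-bits (`isbits`) value, in the same category as `Int` and `Float64`. That is what allows the compiler to keep it in registers or on the stack instead of on the heap. A quick check, assuming the StaticArrays package is installed:

```julia
using StaticArrays

# An SVector wraps an NTuple in an immutable struct, so with Float64
# elements it is a plain-bits type, just like Int and Float64
@show isbitstype(Int)                  # true
@show isbitstype(Float64)              # true
@show isbitstype(SVector{3,Float64})   # true
@show sizeof(SVector{3,Float64})       # 24: the three Float64 values
# A Vector is a mutable, resizable heap object and is not isbits
@show isbitstype(Vector{Float64})      # false
```

Being `isbits` makes register or stack placement possible, not guaranteed: if such a value has to pass through a dynamically dispatched call, it is boxed on the heap like any other object.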
I have some code where I need to create a 3-element vector in each iteration of a hot loop. Right now I have a function that looks like this (it has zero allocations):
```julia
function compute_coefficients!(cache_array, MyListofArgs...)
    # Compute individual elements
    a1 = ...
    a2 = ...
    a3 = ...
    # Put into cache array
    cache_array[1] = a1 + a2
    cache_array[2] = a2 - a3
    cache_array[3] = a3*a2 - a1
end
```
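A runnable sketch of the mutating version. The real argument list and the formulas for `a1`, `a2`, `a3` are elided above, so `x`, `y`, `z` and the expressions below are hypothetical placeholders, as is the helper `allocations_mutating`. The allocation check runs inside a function after a warm-up call, because measuring at global scope can pick up compilation and global-variable effects:

```julia
# Hypothetical stand-ins: x, y, z and the a1..a3 formulas are placeholders
function compute_coefficients!(cache_array, x, y, z)
    # Compute individual elements (placeholder formulas)
    a1 = x + y
    a2 = y * z
    a3 = x - z
    # Write into the preallocated cache; no new array is created
    cache_array[1] = a1 + a2
    cache_array[2] = a2 - a3
    cache_array[3] = a3 * a2 - a1
    return nothing
end

# Measure inside a function, after a warm-up call that triggers compilation
function allocations_mutating()
    cache = zeros(3)
    compute_coefficients!(cache, 1.0, 2.0, 3.0)   # warm-up call
    return @allocated compute_coefficients!(cache, 1.0, 2.0, 3.0)
end
```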
I thought this would be a good use case for static arrays since it is a small array (maybe it is not; I hope someone can correct me if so). My initial impression was that, since static arrays are stack allocated, returning one should be like returning an `Int` or a `Float64`. So I tried implementing this instead:
```julia
function compute_coefficients(MyListofArgs...)
    # Compute individual elements
    a1 = ...
    a2 = ...
    a3 = ...
    # Put into SVector
    tmp = SVector{3,Float64}(a1 + a2, a2 - a3, a3*a2 - a1)
    return tmp
end
```
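A runnable sketch of the `SVector` version, with hypothetical placeholders again (`x`, `y`, `z` and the formulas are not from the original code), plus a hypothetical `hot_loop` that calls it with concretely typed arguments. Under those assumptions the returned `SVector` is handled like a plain value, and the loop should report no allocations:

```julia
using StaticArrays

# Hypothetical stand-ins for the elided argument list and formulas
function compute_coefficients(x, y, z)
    a1 = x + y
    a2 = y * z
    a3 = x - z
    # Return an immutable SVector instead of writing into a cache
    return SVector{3,Float64}(a1 + a2, a2 - a3, a3 * a2 - a1)
end

# Simulated hot loop: argument types are concrete, so every call is
# resolved at compile time and the SVector never needs to be boxed
function hot_loop(xs::Vector{Float64})
    total = 0.0
    for x in xs
        c = compute_coefficients(x, 2.0, 3.0)
        total += c[1] + c[2] + c[3]
    end
    return total
end

function allocations_svector()
    xs = collect(1.0:100.0)
    hot_loop(xs)                      # warm-up call
    return @allocated hot_loop(xs)
end
```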
I have checked with `@allocated`, and the line starting with `tmp = ...` inside `compute_coefficients` does not allocate (which is consistent with my expectations). But when I call this function from one level up (inside my loop), I do get an allocation:
```julia
function outer_func()
    # Lots of code in here
    # Inside a hot inner loop I am calling
    tmp = compute_coefficients(MyListofArgs...)  # <-- @allocated says this line allocates
end
```
What I don't understand is why I get an allocation in the outer function. I saw from this post that static arrays do not guarantee stack allocation, but I don't understand why there is no heap allocation inside `compute_coefficients` while there is one when I call it from the outer function.
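Since the full `outer_func` is not shown, the following is only an assumption about one common cause: the callee is allocation-free, but the caller does not know the concrete types of the arguments (for example, because they come from a non-constant global, a `Vector{Any}`, or an abstractly typed struct field). Calls involving those arguments are then dispatched at run time, and values passing through those dynamic calls are boxed on the heap. The names `coeffs`, `sum_first_stable`, `sum_first_unstable` and `compare_allocations` are hypothetical:

```julia
using StaticArrays

# Hypothetical kernel standing in for compute_coefficients; it does not
# allocate when called with concrete Float64 arguments
coeffs(x, y, z) = SVector{3,Float64}(x + y, y * z, x - z)

# Caller with a concretely typed container: calls are resolved at compile time
function sum_first_stable(xs::Vector{Float64})
    s = 0.0
    for x in xs
        s += coeffs(x, 2.0, 3.0)[1]
    end
    return s
end

# Same loop, but x is only known to be ::Any, so calls involving x are
# dispatched at run time and their results are boxed on the heap
function sum_first_unstable(xs::Vector{Any})
    s = 0.0
    for x in xs
        s += coeffs(x, 2.0, 3.0)[1]
    end
    return s
end

function compare_allocations()
    xs = collect(1.0:100.0)
    ys = Vector{Any}(xs)                           # same values, abstract element type
    sum_first_stable(xs); sum_first_unstable(ys)   # warm-up calls
    return (@allocated(sum_first_stable(xs)), @allocated(sum_first_unstable(ys)))
end
```

Running `@code_warntype outer_func()` (from the InteractiveUtils standard library, loaded by default in the REPL) is one way to check whether any variables in the outer function are inferred as `Any` or a `Union`.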
I have also tried adding type annotations to both the function's return value (i.e., `compute_coefficients(MyListofArgs...)::SVector{3,Float64}`) and to `tmp` in the outer function (i.e., `tmp::SVector{3,Float64}`), but it did not make a difference.
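A return-type annotation only asserts the type after the call has already happened, so it cannot remove allocations caused by poorly inferred arguments. In a hypothetical caller that reads its inputs from a `Vector{Any}`, asserting the return type leaves the allocations in place, while asserting the argument type at the point where the type information is lost lets the call be dispatched statically. A sketch with hypothetical names (`coeffs`, `sum_output_annotated`, `sum_input_annotated`, `compare_annotations`):

```julia
using StaticArrays

# Hypothetical kernel standing in for compute_coefficients
coeffs(x, y, z) = SVector{3,Float64}(x + y, y * z, x - z)

# Annotating the output: x is still ::Any, so the work on x stays dynamic
function sum_output_annotated(xs::Vector{Any})
    s = 0.0
    for x in xs
        c = coeffs(x, 2.0, 3.0)::SVector{3,Float64}
        s += c[1]
    end
    return s
end

# Annotating the input: after x::Float64 the call is statically dispatched
function sum_input_annotated(xs::Vector{Any})
    s = 0.0
    for x in xs
        c = coeffs(x::Float64, 2.0, 3.0)
        s += c[1]
    end
    return s
end

function compare_annotations()
    ys = Any[x for x in 1.0:100.0]
    sum_output_annotated(ys); sum_input_annotated(ys)   # warm-up calls
    return (@allocated(sum_output_annotated(ys)), @allocated(sum_input_annotated(ys)))
end
```

The same idea applies with a function barrier: passing the poorly typed values into a separate function lets Julia compile that function for their concrete run-time types.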