On the GitHub page of StaticArrays.jl, the following is stated:
Note that in the current implementation, working with large StaticArrays puts a lot of stress on the compiler, and becomes slower than Base.Array as the size increases. A very rough rule of thumb is that you should consider using a normal Array for arrays larger than 100 elements.
What is exactly meant here?
Is it, in pseudocode:
1. An SVector{100,Float64} should be avoided?
2. A Vector of SVectors with 100+ elements should be avoided?
I have not been able to think of a case where I would ever use the first option, so I take it that option 2 is what is meant?
And regarding "stress on the compiler": does this mean that Julia's garbage collection mechanism gets stressed, or something else?
Imagine summing an array of 100 elements. You can do this with the loop for i = 1:100, or you can write it out as x[1] + x[2] + x[3] + ....
Using standard arrays, many algorithms are implemented as loops, and those do not grow in compilation complexity with the number of elements in the array. StaticArrays often makes use of completely unrolled code, i.e., something closer to x[1] + x[2] + x[3] + .... This expression grows in compilation complexity with the length of the array, and for large arrays it ends up taking a long time to compile, and the resulting code ends up being slower than the loop version. The compiled version of the loop is only a small number of assembly instructions, while the unrolled code compiles to a ton of instructions. All of those instructions might not fit into the processor's instruction cache, making the code slow.
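To make the difference concrete, here is a minimal sketch (loop_sum is an illustrative helper written for this example, not part of any library):

```julia
using StaticArrays

# Loop-based summation: the compiled code is a short loop whose
# size does not depend on the array length.
function loop_sum(x)
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# For an SVector, many operations (like sum) compile to fully
# unrolled code, effectively x[1] + x[2] + ... + x[N], so the
# amount of generated code grows with N.
v = @SVector rand(100)
loop_sum(v) ≈ sum(v)   # same result, very different generated code
```

Both versions compute the same value; the difference is purely in how much code the compiler has to generate and how long that takes for large N.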
(Edited to explicitly state that the element type must be concrete for good performance, as pointed out by @DNF below)
To answer your question directly, it is your item 1 that should be avoided (or at least not exceeded). Large arrays with an element type of some concrete SVector are very performant and are a common idiom in Julia for representing, say, spatial coordinates.
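As a sketch of that idiom (the variable names here are illustrative):

```julia
using StaticArrays

# 1000 particles, each a 3D position. The element type
# SVector{3,Float64} is concrete and isbits, so the Vector stores
# all coordinates inline in one contiguous block of memory.
positions = [SVector(rand(), rand(), rand()) for _ in 1:1000]

# Operations on the small SVectors are unrolled and allocation-free:
center = sum(positions) / length(positions)   # an SVector{3,Float64}
```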
@baggepinnen thank you for the detailed explanation. You made concepts such as unrolling clear to me, and I now understand what it actually means. Thanks!
@PeterSimon thank you for confirming that I understood the above answer correctly: it is item 1 and not item 2, as I initially thought.
In my experience, using Vector{SVector} to represent spatial coordinates leads to some odd garbage collection behaviour and allocations, while using separate 1D vectors for x, y and z does not. This is what prompted me to ask this question in the first place. Do I understand correctly that there is no magic in StaticArrays, so if I do everything manually with for loops, I should still see great performance?
You can get a lot of added benefit from using vectors of SVectors; that is a common pattern for improved performance. If you are having strange performance issues, perhaps you can share some of your code?
In particular, note that Vector{SVector} will be slow, since this is an abstract eltype, while e.g. Vector{SVector{3, Float64}} should be efficient.
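A quick way to check the difference (sketch):

```julia
using StaticArrays

# Abstract element type: the length is not part of the type, so
# elements are boxed and code cannot specialize on them.
slow = SVector[SVector(1.0, 2.0), SVector(1.0, 2.0, 3.0)]
isconcretetype(eltype(slow))   # false

# Concrete element type: elements are stored inline, and code
# operating on them is fully specialized.
fast = [SVector(1.0, 2.0, 3.0), SVector(4.0, 5.0, 6.0)]
eltype(fast)                   # SVector{3, Float64}
isconcretetype(eltype(fast))   # true
```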
Is it correctly understood by me that there is no magic in StaticArrays, so if I do everything with for loops manually, then I should still see great performance?
This is correct. The reasons SVector is fast are:
- the compiler knows the length
- the memory layout is optimal (e.g. Vector{SVector{3, Float64}} is just a single contiguous chunk of memory).
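You can verify the memory-layout point directly (sketch):

```julia
using StaticArrays

# SVector{3,Float64} is an isbits type: fixed size, no pointers.
# A Vector of them therefore stores the raw Float64s back to back,
# 24 bytes per element, with no per-element heap allocation.
isbitstype(SVector{3, Float64})   # true
sizeof(SVector{3, Float64})       # 24
```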
Basically, this thread here covers an example of the issue. It seems to pop up only when running multiple iterations of a simulation; when benchmarking with BenchmarkTools, all allocations are zero, etc.
When I moved away from StaticArrays and used only 1D vectors, the GC issue for updatexᵢⱼ! (in the link) went away, but it remained when using StaticArrays. So I suspect some issue with StaticArrays here, perhaps caused by my usage.