Compile-time for large SizedArray (StaticArrays.jl)

I noticed compile time rapidly increases for large SizedArrays.

For example:

julia> using StaticArrays

julia> bench(a) = @time maximum(a)
bench (generic function with 1 method)

julia> bench(randn(1_000, 1_000))
  0.000811 seconds
4.8615467274882045

julia> n = 10
10

julia> bench(Size(n,n)(randn(n, n)))
  0.270037 seconds (241.64 k allocations: 12.865 MiB, 2.54% gc time)
2.3638881215887215

julia> n = 20
20

julia> bench(Size(n,n)(randn(n, n)))
  3.662963 seconds (993.21 k allocations: 51.066 MiB, 0.58% gc time)
3.099424860659397

julia> bench(Size(n,n)(randn(n, n)))
  0.000010 seconds (1 allocation: 16 bytes)
2.8169198602554775

I understand that each SizedArray of a new Size is a different type which requires further compilation. However, why does the compile time depend on the exact size of the array? Why isn’t it constant? Is there a way to make it constant?

StaticArrays has its own implementation of maximum which unrolls the loop over the elements of the array (using an @generated function). That results in an amount of code being generated which is proportional to the number of elements in the array. You can see the custom mapreduce implementation that is used under the hood here: https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/mapreduce.jl#L79-L91

4 Likes

Ok, I assume this is also true for map, broadcast, etc? I guess that means, SizedArray really shouldn’t be used for more than 100-element arrays.

This isn’t inherent to the approach.
If you just add a check on the length inside the generated function, you could return a loop over a certain size, and an unrolled expression below it. The matrix multiplication code does something like this (with MMatrix ultimately dispathching to BLAS):
https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/matrix_multiply.jl#L95

2 Likes