I noticed compile time rapidly increases for large SizedArrays.
For example:
julia> using StaticArrays
julia> bench(a) = @time maximum(a)
bench (generic function with 1 method)
julia> bench(randn(1_000, 1_000))
0.000811 seconds
4.8615467274882045
julia> n = 10
10
julia> bench(Size(n,n)(randn(n, n)))
0.270037 seconds (241.64 k allocations: 12.865 MiB, 2.54% gc time)
2.3638881215887215
julia> n = 20
20
julia> bench(Size(n,n)(randn(n, n)))
3.662963 seconds (993.21 k allocations: 51.066 MiB, 0.58% gc time)
3.099424860659397
julia> bench(Size(n,n)(randn(n, n)))
0.000010 seconds (1 allocation: 16 bytes)
2.8169198602554775
I understand that each SizedArray with a new Size is a different type, which requires fresh compilation. But why does the compile time depend on the exact size of the array? Why isn’t it constant? Is there a way to make it constant?
StaticArrays has its own implementation of maximum which unrolls the loop over the elements of the array (using an @generated function), so the amount of generated code is proportional to the number of elements in the array. You can see the custom mapreduce implementation that is used under the hood here: https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/mapreduce.jl#L79-L91
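To make the mechanism concrete, here is a minimal sketch (not StaticArrays' actual code) of an @generated reduction that fully unrolls over the elements. Because the expression is built element by element at compile time, the generated code, and hence the compile time, grows linearly with the array's length:

```julia
using StaticArrays

# Hypothetical illustration: build max(a[1], a[2], ..., a[L]) as one
# fully unrolled expression at compile time. Inside an @generated
# function, `a` is the *type*, and StaticArrays defines length() on
# static array types, so L is known while generating code.
@generated function unrolled_maximum(a::StaticArray)
    L = length(a)
    ex = :(a[1])
    for i in 2:L
        ex = :(max($ex, a[$i]))  # expression grows with L
    end
    return :(@inbounds $ex)
end

v = SVector(3.0, 1.0, 4.0, 1.0, 5.0)
unrolled_maximum(v)  # == 5.0
```

The first call for each new size compiles a fresh, size-proportional expression, which is exactly the behavior seen in the timings above.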
This isn’t inherent to the approach.
If you just added a check on the length inside the generated function, you could emit an ordinary loop above a certain size and an unrolled expression below it. The matrix multiplication code does something like this (with MMatrix ultimately dispatching to BLAS): https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/matrix_multiply.jl#L95
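A hedged sketch of that idea (the threshold of 16 and the function name are arbitrary choices for illustration, not what StaticArrays does): branch on the statically known length inside the generated function, so the emitted code stops growing past the threshold:

```julia
using StaticArrays

# Hypothetical sketch: unroll small arrays, fall back to a plain loop
# for large ones, so compile time is bounded for large sizes.
@generated function hybrid_maximum(a::StaticArray)
    L = length(a)          # known at code-generation time
    if L > 16              # arbitrary threshold for illustration
        return quote
            m = a[1]
            @inbounds for i in 2:$L   # fixed-size generated code
                m = max(m, a[i])
            end
            m
        end
    else
        ex = :(a[1])
        for i in 2:L
            ex = :(max($ex, a[$i]))   # unrolled for small sizes
        end
        return :(@inbounds $ex)
    end
end

hybrid_maximum(SVector(1, 9, 2))                      # small: unrolled
hybrid_maximum(SizedVector{100}(collect(1.0:100.0)))  # large: looped
```

With this structure, compiling for a 20×20 SizedArray would generate roughly the same amount of code as for a 100×100 one, making compile time effectively constant above the cutoff.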