From a previous step in my computations, I have a static array and I would now like to perform an operation, say a QR decomposition, on a sub-array. This is performance critical code so I would like to squeeze as much speed out of it as possible—ideally being roughly the same speed as performing the operation directly on a static array the same size as the sub-array.
I’ve played around with a few ideas:
julia> A = @SMatrix rand(4, 4);
# Baseline comparison
julia> A_sub = SMatrix{3, 3}(A[1:3, 1:3]);
julia> 1e9 * @belapsed qr($A_sub)
20.519558676028083
# Views are not efficient
julia> 1e9 * @belapsed qr(@view ($A)[1:3, 1:3])
880.5961538461538
# Constructing a new SMatrix is better but still a bit slower than the target
julia> 1e9 * @belapsed qr(SMatrix{3, 3}($A[1:3, 1:3]))
50.49797160243408
# We can manually construct a new matrix and we then get the speed we desired. Can we automate this?
julia> 1e9 * @belapsed qr(SMatrix{3, 3}(($A)[1,1], ($A)[2,1], ($A)[3,1],
($A)[1,2], ($A)[2,2], ($A)[3,2],
($A)[1,3], ($A)[2,3], ($A)[3,3]))
22.024072216649948
The last results seems to suggest what I’m after is possible, but I’m not sure how to generalise this. Any thoughts would be appreciated.
And that’s also exactly what @view A[SOneTo(3), SOneTo(3)] returns, too. Since static arrays are immutable, there’s no meaningful distinction between views and getindex. You just need to give it static indices so it can know exactly how big the sub-array needs to be.
There is the internal StaticArrays.SUnitRange(start, stop). It may have some pitfalls or missing features and (since it isn’t part of the interface) might someday break. But for now it works. And I would hope that eventually something like this gets made public interface.