Sub-arrays of static arrays

From a previous step in my computations, I have a static array and I would now like to perform an operation, say a QR decomposition, on a sub-array. This is performance critical code so I would like to squeeze as much speed out of it as possible—ideally being roughly the same speed as performing the operation directly on a static array the same size as the sub-array.

I’ve played around with a few ideas:

julia> A = @SMatrix rand(4, 4);

# Baseline comparison
julia> A_sub = SMatrix{3, 3}(A[1:3, 1:3]);

julia> 1e9 * @belapsed qr($A_sub)
20.519558676028083

# Views are not efficient
julia> 1e9 * @belapsed qr(@view ($A)[1:3, 1:3])
880.5961538461538

# Constructing a new SMatrix is better but still a bit slower than the target
julia> 1e9 * @belapsed qr(SMatrix{3, 3}($A[1:3, 1:3]))
50.49797160243408

# We can manually construct a new matrix and we then get the speed we desired. Can we automate this?
julia> 1e9 * @belapsed qr(SMatrix{3, 3}(($A)[1,1], ($A)[2,1], ($A)[3,1], 
                              ($A)[1,2], ($A)[2,2], ($A)[3,2],
                              ($A)[1,3], ($A)[2,3], ($A)[3,3]))
22.024072216649948

The last results seems to suggest what I’m after is possible, but I’m not sure how to generalise this. Any thoughts would be appreciated.

2 Likes

You could do

julia> A[SOneTo(3), SOneTo(3)]
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 0.679889  0.422904   0.851232
 0.248234  0.805128   0.460355
 0.584804  0.0535163  0.903487

i.e.

qr(A[SOneTo(3), SOneTo(3)])
1 Like

And that’s also exactly what @view A[SOneTo(3), SOneTo(3)] returns, too. Since static arrays are immutable, there’s no meaningful distinction between views and getindex. You just need to give it static indices so it can know exactly how big the sub-array needs to be.

2 Likes

Ah fantastic, thank you both! That gives the target speed

1e9 * @belapsed qr(($A)[SOneTo(3), SOneTo(3)])
21.837349397590362

Does this only work for unit ranges starting from one? I can’t seem to find any variants for say i:j where i, j are known at compile time.

2 Likes

Yeah, there really should be a static range type. For now though you can just write

function srange(i,j;kwargs...) 
    r = range(i,j;kwargs...)
    SVector{length(r)}(r)
end

and then e.g.

qr(A[srange(2, 4), srange(1, 3)])
1 Like

This one seems a bit more robust with respect to performance:

 function srange_(start, step, len)
    t = ntuple(i->(i-1)*step + start, len)
    return SVector(t)
end

It seems fast in most cases, but if you really need it you can provide len as a Val. This also works for float ranges.

1 Like

There is the internal StaticArrays.SUnitRange(start, stop). It may have some pitfalls or missing features and (since it isn’t part of the interface) might someday break. But for now it works. And I would hope that eventually something like this gets made public interface.

2 Likes