Hi!
As I mentioned before, I am trying to build a full satellite attitude control subsystem (ACS) in Julia, which will hopefully one day fly in a CubeSat. One very important requirement is to avoid runtime allocations.
The entire system is built using `StaticArray`s from StaticArrays.jl, and I managed to create a mockup of an entire ACS without allocations, except for one single case: the Kalman filters.
We need at least two Kalman filters in our project: one for the attitude and one for the orbit. Those Kalman filters need to invert small matrices (6 Ă— 6 and 3 Ă— 3). The function `inv` works fine and performs 0 allocations when used with an `SMatrix`. However, for embedded systems that must run continuously, it is much safer to use the pseudo-inverse `pinv`, because numerical errors can lead to singular matrices (we are using `Float32`), especially after the KF has converged and the covariance is small.
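To make the failure mode concrete, here is a small illustration with a deliberately rank-deficient matrix (not our actual covariance, just the simplest case): `inv` blows up on an exactly singular `SMatrix`, while `pinv` stays well defined.

```julia
using LinearAlgebra, StaticArrays

# A rank-deficient Float32 matrix: the second column equals the first,
# so the determinant is exactly zero.
A = @SMatrix Float32[1.0 1.0; 1.0 1.0]

inv(A)   # the explicit 2 Ă— 2 formula divides by det(A) = 0 → Inf/NaN entries
pinv(A)  # the Moore-Penrose pseudo-inverse is still finite and well defined
```

In a converged filter the covariance is not exactly singular, but with `Float32` it can get close enough that `inv` returns garbage without any error being thrown.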
The problem is that StaticArrays.jl just converts the `StaticMatrix` to a `Matrix` and calls the `LinearAlgebra` function `svd`, which allocates, to compute the singular value decomposition required to obtain the pseudo-inverse. I opened a ticket here:
To solve this problem, my first attempt was to rewrite LAPACK's `svd` in Julia, which turned out to be a Herculean task. The second attempt was to use a naĂŻve algorithm, which is slow, but given the lack of allocations I hoped it would beat the current approach. No dice: the code I wrote was slower than the current version for any matrix bigger than 2 Ă— 2.
Finally, I had an insight! I saw some time ago that Julia's compiler has become smart enough to avoid allocations inside a function if you use `MMatrix`es that do not escape the scope and then convert the result back to an `SMatrix`. Hence, I thought: "why don't we convert the static matrix to its mutable version, call the LAPACK function using `ccall`, and then convert the result back to an `SMatrix`?" If everything is type stable, it should avoid all allocations!
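A minimal sketch of that pattern, with a simple in-place mutation standing in for the actual `ccall` (the function name and operation here are purely illustrative):

```julia
using StaticArrays

# The MMatrix `buf` never escapes `scale_diag`, so the compiler can keep it
# on the stack and the whole function runs without heap allocations.
function scale_diag(A::SMatrix{N,N,T}, k::T) where {N,T}
    buf = MMatrix{N,N,T}(A)     # mutable working copy (stand-in for a LAPACK buffer)
    for i in 1:N
        buf[i, i] *= k          # in-place mutation, as a ccall would perform
    end
    return SMatrix{N,N,T}(buf)  # convert back; `buf` does not escape
end

A = @SMatrix randn(3, 3)
scale_diag(A, 2.0)              # warm up (compilation allocates, runtime does not)
```

After the warm-up call, `@allocated scale_diag(A, 2.0)` measured inside a function reports 0 bytes on recent Julia versions, which is exactly the property the `ccall` trick relies on.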
I mentioned this in that thread, but the developers think this approach should not be used inside StaticArrays.jl, at least for now. Hence, I created a new package called StaticArraysBlasInterface.jl, which is not registered yet and can be seen here:
Basically, it overloads functions in StaticArrays.jl for some supported types, leading to allocation-free computation.
Currently, I have only overloaded the `svd` function, which also provides an allocation-free `pinv`:
```julia
julia> using LinearAlgebra, StaticArrays, BenchmarkTools

julia> A = @SMatrix randn(6, 6);

julia> @btime pinv($A)
  3.542 ÎĽs (8 allocations: 5.30 KiB)
6Ă—6 SMatrix{6, 6, Float64, 36} with indices SOneTo(6)Ă—SOneTo(6):
  -34.7034   -14.9662    -58.5796     57.7484    68.2193   120.044
  -21.7112    -9.89541   -36.0827     36.773     42.9534    76.1427
   -8.17912   -3.30241   -14.1337     13.6379    16.8462    28.4846
   87.5626    38.9288    144.489    -147.519   -170.758   -302.945
  -52.6933   -23.7337    -87.4295     89.1886   103.221    182.963
   17.2491     7.57408    28.6682    -28.5039   -33.6592   -59.0681

julia> using StaticArraysBlasInterface

julia> @btime pinv($A)
  2.893 ÎĽs (0 allocations: 0 bytes)
6Ă—6 SMatrix{6, 6, Float64, 36} with indices SOneTo(6)Ă—SOneTo(6):
  -34.7034   -14.9662    -58.5796     57.7484    68.2193   120.044
  -21.7112    -9.89541   -36.0827     36.773     42.9534    76.1427
   -8.17912   -3.30241   -14.1337     13.6379    16.8462    28.4846
   87.5626    38.9288    144.489    -147.519   -170.758   -302.945
  -52.6933   -23.7337    -87.4295     89.1886   103.221    182.963
   17.2491     7.57408    28.6682    -28.5039   -33.6592   -59.0681
```
There are some benchmarks here:
The results show a significant gain for small matrices (my use case) and an allocation-free algorithm that fits my requirements for embedded software development. Notice that even if this algorithm were 2Ă— slower, I would still need it, because "no allocations" is a very hard requirement.
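For reference, this is the kind of harness I use to verify the requirement; shown here with `inv`, which is already allocation-free on an `SMatrix`, so the snippet runs without the unregistered package (with StaticArraysBlasInterface loaded, the same check passes for `pinv`, as the benchmark above shows):

```julia
using LinearAlgebra, StaticArrays

# Warm up first (compilation allocates), then measure the second call.
# Measuring inside a function keeps the call statically dispatched.
function allocs_of_inv(P)
    inv(P)                    # warm-up call: triggers compilation
    return @allocated inv(P)  # second call: pure runtime allocations
end

P = @SMatrix randn(Float32, 6, 6)
P = P * P' + I                # positive definite, like a covariance matrix
@assert allocs_of_inv(P) == 0
```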
If possible, I would like some feedback on whether this is a good approach, whether it should live in a separate package or be merged into StaticArrays.jl, and whether there are more functions you would like to see supported.
Thanks!