Hi!

As I mentioned before, I am trying to build a full satellite attitude control subsystem (ACS) using Julia, which I hope will one day fly on a CubeSat. One very important requirement is to avoid runtime allocations.

The entire system is built using `StaticArrays` from StaticArrays.jl, and I managed to create a mockup of an entire ACS without allocations, except for one single case: the Kalman filters.

We need to use at least two Kalman filters in our project: one for the attitude and one for the orbit. Those Kalman filters need to invert small matrices (6 × 6 and 3 × 3). The function `inv` works fine and yields 0 allocations when using `SMatrix`. However, for embedded systems that must run continuously, it is much safer to use the pseudo-inverse `pinv`, because numerical errors can lead to singular matrices (we are using `Float32`), especially when the KF has converged and the covariance is small.
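To illustrate the concern with a toy example (not taken from the actual KF code), a rank-deficient `Float32` matrix breaks `inv` but is handled gracefully by `pinv`:

```julia
using LinearAlgebra, StaticArrays

# A singular Float32 matrix (rank 1), a stand-in for what a collapsed
# covariance can look like after many converged KF iterations.
S = @SMatrix Float32[1 1; 1 1]

inv(S)   # determinant is 0, so the result contains non-finite entries
pinv(S)  # well-defined Moore-Penrose pseudo-inverse (all entries 0.25)
```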

The problem is that StaticArrays.jl just converts the `StaticMatrix` to a `Matrix` and calls the generic `svd` function from `LinearAlgebra`, which allocates, to compute the singular value decomposition required to obtain the pseudo-inverse. I opened a ticket here:

To solve this problem, my first attempt was to rewrite BLAS's `svd` in Julia, which turned out to be a Herculean task. The second attempt was to use a naïve algorithm, which is slow, but given the lack of allocations it should have been faster than the current approach. No dice: the code I wrote was slower than the current version for any matrix bigger than 2 × 2.

Finally, I had an insight! I saw some time ago that Julia's compiler got smart enough to avoid allocations inside a function if you use `MMatrix`es that do not escape the scope and then convert the result back to `SMatrix`. Hence, I thought: "why don't we convert the static matrix to the mutable version, call the BLAS function using `ccall`, and then convert the result back to `SMatrix`?" If everything is type stable, it should avoid all allocations!

I mentioned this in that thread, but the developers think this approach should not live inside StaticArrays.jl, at least for now. Hence, I created a new package called StaticArraysBlasInterface.jl, which is not registered yet and can be seen here:

Basically, it overloads functions in StaticArrays.jl for some supported types, leading to allocation-free computation.

Currently, I have only overloaded the `svd` function, which also provides an allocation-free `pinv`:

```
julia> using LinearAlgebra, StaticArrays, BenchmarkTools

julia> A = @SMatrix randn(6, 6);

julia> @btime pinv($A)
  3.542 μs (8 allocations: 5.30 KiB)
6×6 SMatrix{6, 6, Float64, 36} with indices SOneTo(6)×SOneTo(6):
 -34.7034   -14.9662    -58.5796    57.7484    68.2193   120.044
 -21.7112    -9.89541   -36.0827    36.773     42.9534    76.1427
  -8.17912   -3.30241   -14.1337    13.6379    16.8462    28.4846
  87.5626    38.9288    144.489   -147.519   -170.758   -302.945
 -52.6933   -23.7337    -87.4295    89.1886   103.221    182.963
  17.2491     7.57408    28.6682   -28.5039   -33.6592   -59.0681

julia> using StaticArraysBlasInterface

julia> @btime pinv($A)
  2.893 μs (0 allocations: 0 bytes)
6×6 SMatrix{6, 6, Float64, 36} with indices SOneTo(6)×SOneTo(6):
 -34.7034   -14.9662    -58.5796    57.7484    68.2193   120.044
 -21.7112    -9.89541   -36.0827    36.773     42.9534    76.1427
  -8.17912   -3.30241   -14.1337    13.6379    16.8462    28.4846
  87.5626    38.9288    144.489   -147.519   -170.758   -302.945
 -52.6933   -23.7337    -87.4295    89.1886   103.221    182.963
  17.2491     7.57408    28.6682   -28.5039   -33.6592   -59.0681
```

There are some benchmarks here:

The results show a significant gain for small matrices (my use case) and an allocation-free algorithm that fits my requirements for embedded software development. Notice that even if this algorithm were 2x slower, I would still need it, because "no allocations" is a **very hard** requirement.

If possible, I would like some feedback: is this a good approach, should it stay in a separate package or be committed to StaticArrays.jl, and are there more functions you would like to see supported?

Thanks!