Is there a way to build a static matrix programmatically?

Ronis_BR · July 23, 2023, 7:50pm

Hi!

I need to remove the runtime allocations for a project that is targeting embedding Julia code in a satellite computer. One big problem I have is to create static matrices programmatically. For example, in C++, we can use Eigen and static matrices (which usually are allocated in the stack) and we can change its coefficients to whatever we want given that the type is conserved.

I am trying to replicate this behavior in Julia using StaticArrays.jl and Accessors.jl. However, the code leads to a huge number of allocations and it is considerable slower than just allocating the matrix initially.

Here is one example of an algorithm that I need to use static matrices:

function fully_normalized_legendre(
    ϕ::T,
    n_max::Integer = -1,
    m_max::Integer = -1;
    ph_term::Bool = false
) where T<:Number
    n_max < 0 && throw(ArgumentError("n_max must be positive."))

    if (m_max < 0) || (m_max > n_max)
        m_max = n_max
    end

    P = zeros(float(T), n_max + 1, m_max + 1)

    # Auxiliary variables to improve code performance.
    s, c = sincos(T(ϕ))

    # The sine must be always positive. In fact, `s` was previously computed using
    # `√(1 - c^2)`. However, we had numerical problems for very small angles that lead to
    # `cos(ϕ) = 1`.
    if s < 0
        s = -s
    end

    s_fact = !ph_term ? +s : -s

    # Get the first indices in `P` to take into account offset arrays.
    i₀, j₀ = first.(axes(P))

    sq3 = √T(3)

    @inbounds for n in 0:n_max
        # Starting values.
        if n == 0
            P[i₀, j₀] = 1
            continue

        elseif n == 1
            P[i₀+1, j₀] = +sq3 * c

            if m_max > 0
                P[i₀+1, j₀+1] = +sq3 * s_fact
            end

            continue
        end

        aux_an = T(2n - 1) * T(2n + 1)
        aux_bn = T(2n + 1) / T(2n - 3)

        for m in 0:n

            if n == m
                P[i₀+n, j₀+n] = s_fact * √(T(2n + 1) / T(2n)) * P[i₀+n-1, j₀+n-1]

            else
                aux_nm = T(n - m) * T(n + m)
                a_nm   = √(aux_an / aux_nm) * c
                b_nm   = √(T(n + m - 1) * T(n - m - 1) * aux_bn / aux_nm)

                # We assume that the matrix is not initialized. Hence, we must not access
                # elements on the upper triangle.
                if m != n - 1
                    P[i₀+n, j₀+m] = a_nm * P[i₀+n-1, j₀+m] - b_nm * P[i₀+n-2, j₀+m]
                else
                    P[i₀+n, j₀+m] = a_nm * P[i₀+n-1, j₀+m]
                end
            end

            # Check if the maximum desired order has been reached.
            m == m_max && break
        end
    end

    return P
end

Can anyone help me?

stevengj · July 23, 2023, 8:03pm

The point of StaticArrays is not that they are “on the stack” (which is not necessarily true), it is that the sizes are known statically (i.e. at compile time), so that loops can be unrolled and the code can be specialized on the size. This fundamentally isn’t going to work if the matrix size is a runtime variable like your n_max and m_max.

If you try to use StaticArrays with runtime/dynamic sizes, it will be type unstable and (usually) slow.

Ronis_BR · July 23, 2023, 8:07pm

I see, thanks! However, in my application, I know the sizes during compilation time. In C++, it is a template. The problem is that if I use static arrays, I can allocate them, but not build them using a function unless I unroll the loops manually. This function will be called in different algorithms to build matrices with different sizes, but all known in compile time.

Is there anything I can do?

lmiq · July 23, 2023, 8:10pm

Have you tried using a MMatrix, enclosed in a function from where it does not escape (convert it to a SMatrix on return)?

Ronis_BR · July 23, 2023, 8:13pm

Nice suggestion! Let me try!

lmiq · July 23, 2023, 8:16pm

You still would have to provide the dimensions at compile time, with Val parameters, for example. But it seems that your P could be constructed as an MMatrix, and if you return it after conversion to a SMatrix it may not allocate.

Ronis_BR · July 23, 2023, 8:17pm

@lmiq This is awesome!! Thanks:

using StaticArrays

function fully_normalized_legendre(
    ϕ::T,
    ::Val{n_max},
    ::Val{m_max};
    ph_term::Bool = false
) where {T<:Number, n_max, m_max}
    P = @MMatrix zeros(float(T), n_max + 1, m_max + 1)

    # Auxiliary variables to improve code performance.
    s, c = sincos(T(ϕ))

    # The sine must be always positive. In fact, `s` was previously computed using
    # `√(1 - c^2)`. However, we had numerical problems for very small angles that lead to
    # `cos(ϕ) = 1`.
    if s < 0
        s = -s
    end

    s_fact = !ph_term ? +s : -s

    # Get the first indices in `P` to take into account offset arrays.
    i₀, j₀ = first.(axes(P))

    sq3 = √T(3)

    @inbounds for n in 0:n_max
        # Starting values.
        if n == 0
            P[i₀, j₀] = 1
            continue

        elseif n == 1
            P[i₀+1, j₀] = +sq3 * c

            if m_max > 0
                P[i₀+1, j₀+1] = +sq3 * s_fact
            end

            continue
        end

        aux_an = T(2n - 1) * T(2n + 1)
        aux_bn = T(2n + 1) / T(2n - 3)

        for m in 0:n

            if n == m
                P[i₀+n, j₀+n] = s_fact * √(T(2n + 1) / T(2n)) * P[i₀+n-1, j₀+n-1]

            else
                aux_nm = T(n - m) * T(n + m)
                a_nm   = √(aux_an / aux_nm) * c
                b_nm   = √(T(n + m - 1) * T(n - m - 1) * aux_bn / aux_nm)

                # We assume that the matrix is not initialized. Hence, we must not access
                # elements on the upper triangle.
                if m != n - 1
                    P[i₀+n, j₀+m] = a_nm * P[i₀+n-1, j₀+m] - b_nm * P[i₀+n-2, j₀+m]
                else
                    P[i₀+n, j₀+m] = a_nm * P[i₀+n-1, j₀+m]
                end
            end

            # Check if the maximum desired order has been reached.
            m == m_max && break
        end
    end

    return SMatrix(P)
end

julia> @btime fully_normalized_legendre(0.4, Val(10), Val(10))
  188.626 ns (0 allocations: 0 bytes)
11×11 SMatrix{11, 11, Float64, 121} with indices SOneTo(11)×SOneTo(11):
  1.0        0.0        0.0       0.0       0.0        0.0        0.0         0.0         0.0         0.0          0.0
  1.59532    0.674492   0.0       0.0       0.0        0.0        0.0         0.0         0.0         0.0          0.0
  1.72743    1.38915    0.293662  0.0       0.0        0.0        0.0         0.0         0.0         0.0          0.0
  1.51303    2.04533    0.715626  0.12352   0.0        0.0        0.0         0.0         0.0         0.0          0.0
  1.02713    2.4997     1.25595   0.341309  0.0510189  0.0        0.0         0.0         0.0         0.0          0.0
  0.365272   2.64916    1.83357   0.679559  0.155853   0.0208374  0.0         0.0         0.0         0.0          0.0
 -0.358181   2.44433    2.34728   1.12469   0.342713   0.0691996  0.00844581  0.0         0.0         0.0          0.0
 -1.02374    1.89616    2.69497   1.63525   0.625641   0.165205   0.0301283   0.00340439  0.0         0.0          0.0
 -1.52377    1.07384    2.79265   2.14629   1.00325    0.327013   0.0771633   0.0129286   0.00136653  0.0          0.0
 -1.77847    0.0936116  2.59076   2.57855   1.45478    0.568205   0.163665    0.0352094   0.00548637  0.000546735  0.0
 -1.74785   -0.899789   2.08504   2.85126   1.93947    0.893057   0.304478    0.0793263   0.0157795   0.00230768   0.000218166