Help to improve performance of kalman filter


I am writing a Kalman filter code for some particular state space representation. With my current code, I still end up with a lot allocations, so I guess there are still a lot of potential efficiency gain. Could someone help for a newbie to improve? Thanks

using LinearAlgebra
using BenchmarkTools

mutable struct kalman{S <: AbstractFloat}

function forecast!(k::kalman{S}, Qt::Matrix{S}) where {S <: AbstractFloat}
    k.Vtt .= k.Vtt .+ Qt

    return nothing

function update!(k::kalman{S}, y::Vector{S}, X::Matrix{S}, Ht::Matrix{S}) where {S <: AbstractFloat}

    bp, Vp = k.btt, k.Vtt

    cfe = y - X * bp
    f = X * Vp * X' + Ht 
    invf = inv(f)
    k.btt = bp + Vp * X' * invf * cfe
    k.Vtt = Vp - Vp * X' * invf * X * Vp
    return nothing

function kf(y::AbstractMatrix{S}, Z::AbstractMatrix{S}, Ht::AbstractArray{S,3}, Qt::AbstractMatrix{S}, K::Int, N::Int, T::Int, b0::AbstractVector{S}, V0::AbstractMatrix{S}) where {S <: AbstractFloat}
    bt = Array{Float64}(undef, K, T)
    Vt = Array{Float64}(undef, K^2, T)

    k = kalman(b0, V0)
    for i = 1:T

        update!(k, y[:,i], Z[(i - 1) * N + 1:i * N,:], Ht[:,:,i])

        @inbounds bt[:,i] = k.btt
        @inbounds Vt[:,i] = vec(k.Vtt);

        forecast!(k, Qt)


    return bt, Vt

function test()
    T = 100
    N = 3
    K = 21
    y = randn(N, T)
    Z = randn(T * N, K)
    Ht = randn(N, N, T)
    Qt = randn(K, K)
    b0 = randn(K)
    V0 = randn(K, K)

    @benchmark a, b = kf($y, $Z, $Ht, $Qt, $K, $N, $T, $b0, $V0)


1 Like

This reference might be useful:, see the code down at the bottom.

First, did you read and work through the

in particular benchmarking, profiling, and allocation tracking?

Second, if you know it is allocations, note that lines like

k.btt = bp + Vp * X' * invf * cfe

allocate new matrices. Use k.btt .= ... (note the dot).

You can also pre-allocate for f.

Some style notes: you don’t need the type parameter S unless you are doing something with it. Why is Vt a vector in kf?

If your use case is really N = 3 and K = 21 (i.e., fairly small), I think using StaticArrays ( could be very beneficial. I’d change y to be a length-T Vector{SVector{N, Float64}}, and similar for Z and Ht (for clarity, and to make the largest and probably most variable size, T, dynamic). This allows you to write expressions like A + B * C' * inv(D) * E without allocating, as everything will be stack-allocated due to the statically known, fixed size of the arrays. In addition, you avoid any overhead of calling BLAS functions, which can be significant for these small matrices.