Help to improve performance of kalman filter

question

#1

Hi!

I am writing a Kalman filter code for some particular state space representation. With my current code, I still end up with a lot allocations, so I guess there are still a lot of potential efficiency gain. Could someone help for a newbie to improve? Thanks

using LinearAlgebra
using BenchmarkTools

mutable struct kalman{S <: AbstractFloat}
    btt::Vector{S}
    Vtt::Matrix{S}
end


function forecast!(k::kalman{S}, Qt::Matrix{S}) where {S <: AbstractFloat}
    k.Vtt .= k.Vtt .+ Qt

    return nothing
end

function update!(k::kalman{S}, y::Vector{S}, X::Matrix{S}, Ht::Matrix{S}) where {S <: AbstractFloat}

    bp, Vp = k.btt, k.Vtt

    cfe = y - X * bp
    f = X * Vp * X' + Ht 
    invf = inv(f)
    k.btt = bp + Vp * X' * invf * cfe
    k.Vtt = Vp - Vp * X' * invf * X * Vp
    
    return nothing
end


function kf(y::AbstractMatrix{S}, Z::AbstractMatrix{S}, Ht::AbstractArray{S,3}, Qt::AbstractMatrix{S}, K::Int, N::Int, T::Int, b0::AbstractVector{S}, V0::AbstractMatrix{S}) where {S <: AbstractFloat}
    bt = Array{Float64}(undef, K, T)
    Vt = Array{Float64}(undef, K^2, T)

    k = kalman(b0, V0)
    for i = 1:T

        update!(k, y[:,i], Z[(i - 1) * N + 1:i * N,:], Ht[:,:,i])

        @inbounds bt[:,i] = k.btt
        @inbounds Vt[:,i] = vec(k.Vtt);

        forecast!(k, Qt)

    end

    return bt, Vt
end

function test()
    T = 100
    N = 3
    K = 21
    y = randn(N, T)
    Z = randn(T * N, K)
    Ht = randn(N, N, T)
    Qt = randn(K, K)
    b0 = randn(K)
    V0 = randn(K, K)

    @benchmark a, b = kf($y, $Z, $Ht, $Qt, $K, $N, $T, $b0, $V0)

end

test()

#2

This reference might be useful: https://lectures.quantecon.org/jl/kalman.html, see the code down at the bottom.


#3

First, did you read and work through the

https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips-1

in particular benchmarking, profiling, and allocation tracking?

Second, if you know it is allocations, note that lines like

k.btt = bp + Vp * X' * invf * cfe

allocate new matrices. Use k.btt .= ... (note the dot).

You can also pre-allocate for f.

Some style notes: you don’t need the type parameter S unless you are doing something with it. Why is Vt a vector in kf?


#4

If your use case is really N = 3 and K = 21 (i.e., fairly small), I think using StaticArrays (https://github.com/JuliaArrays/StaticArrays.jl/) could be very beneficial. I’d change y to be a length-T Vector{SVector{N, Float64}}, and similar for Z and Ht (for clarity, and to make the largest and probably most variable size, T, dynamic). This allows you to write expressions like A + B * C' * inv(D) * E without allocating, as everything will be stack-allocated due to the statically known, fixed size of the arrays. In addition, you avoid any overhead of calling BLAS functions, which can be significant for these small matrices.