Optimising code: Multiplying a list of matrices by a matrix

I changed your code to use static arrays, the timing of devectorized is printed in the bottom

using LinearAlgebra, BenchmarkTools, StaticArrays
# De-vectorised code
function devectorised!(mu, S, A, ATranspose, R, simulationLength, u)
    n = length(mu)
    for i = 1:simulationLength
        for j = 1:n
            mu[j] = A*mu[j] + u
            S[j] = A*S[j]*ATranspose + R
        end
    end
end

# Admin
d = 6
n = 1000
simulationLength = 30
# Motion model
T = 20e-3
A = [Array(1.0I, 3, 3) T*Array(1.0I, 3, 3); zeros(3, 3) Array(1.0I, 3, 3)]
A = SMatrix{size(A)...}(A)
ATranspose = A'
u = zeros(d)
u[3] = -0.5*(9.81)*(T^2)
u[6] = -(9.81)*(T)
u = SVector{length(u)}(u)
# Process noise
R = 4*I
# Time the functions
mu = [@SVector rand(d) for _ in 1:n]
S = [@SMatrix rand(d, d) for _ in 1:n]
nu = copy(mu)
P = copy(S)
@btime devectorised!(mu, S, A, ATranspose, R, simulationLength, u)
@btime (nuUpdated, PUpdated) = vectorised(nu, P, A, ATranspose, R, simulationLength)
# Check answers
maximumMeanError = maximum(abs.(mu[:] - nuUpdated[:]))
maximumCovarianceError = maximum(abs.(S[:] - PUpdated[:]))
println("The maximum mean error is ", maximumMeanError)
println("The maximum covariance error is ", maximumCovarianceError)

# Before 42.852 ms (254670 allocations: 53.78 MiB)
# After 2.023 ms (0 allocations: 0 bytes)
# Vectorized  13.915 ms (965 allocations: 44.31 MiB)

One big problem with your code was that the variable u was global, I added it as an argument. I changed all arrays to static versions and I changed R to the I::UniformScaling

3 Likes