Efficiently Calculate Matrices Defined Recursively

In my work I frequently have to calculate a long sequence (about N=10^6) of small matrices (about 3x3) that are defined recursively like X_t = f(X_{t-1}). The object of interest is the entire chain of matrices so I need to calculate and save each of them. I can usually write an efficient version of f() that can do all the calculations in place. A simplified version of this algorithm is below. My main question is what is the most efficient way to store the sequence of matrices X and to access each sub matrix X as I iterate through the chain. I’ve tried several different ways of laying out the X object but haven’t found a way that doesn’t lead to a bunch of allocations along in the for loop. Thanks.

using LinearAlgebra
function update!(X, A)
    for k in 2:100
        @views mul!(X[:,:,k], X[:,:,k - 1], A)
function run()
    X = ones(2, 2, 100)
    A = [0.5 0.5; 0.25 0.75]
    @time update!(X, A)

This sounds perfect for a vector of StaticArrays! If you need mutability you can use MArray‘s, but it will likely be faster by just doing out of place operations on SArray‘s.