What is the best way to get a running matrix-product over a large 3D array

No it’s not. When you do sizeof(Y) it’s only counting the size of the array of pointers, not the size of the pointed-to memory. You can use Base.summarysize to get the total reachable size:

julia> N, M = 10, 1000;

julia> X = randn(N, N, M);

julia> Y = [randn(N, N) for _ in 1:M];

julia> sizeof(X), sizeof(Y)
(800000, 8000)

julia> Base.summarysize(X), Base.summarysize(Y)
(800056, 856040)

If all the arrays are the same size then a 3d array will take slightly less memory than an array of matrices (because the 3d array doesn’t have a separate object for each matrix). An array of matrices is more flexible in other ways, however — it is easier to change dynamically, e.g. to resize or to re-order, and it can also hold matrices of heterogeneous shapes.

It will be more efficient allocate the result matrix once and accumulate the running product in-place with LinearAlgebra.mul!, e.g.

using LinearAlgebra
function myprod(X::AbstractArray{T, 3}) where {T<:Number}
    P = Matrix{T}(I, size(X,1), size(X,2))
    N = LinearAlgebra.checksquare(P)
    A = similar(P) # temporary storage
    for i in axes(X, 3)
        @views mul!(A, P, X[:,:,i])
        P, A = A, P
    end
    return P
end

This allocates much less memory (though the time is only slightly improved if N is sufficiently large, since in that case the time is dominated by the O(MN^3) matrix multiplication cost).

For example, here is an in-place version of your running sum, with no allocations inside the loop:

julia> function store_inplace(N, G)
           X = Matrix{Float64}(I, N, N)
           A = similar(X)
           C = similar(X)
           for i in 1:G 
               C .= rand.()
               for col in eachslice(C, dims=1)
                   col ./= sum(col)
               end
               mul!(A, X, C)
               X, A = A, X
           end
           return X
       end
store_inplace (generic function with 1 method)

julia> @btime store_inplace(10, 1000000);
  543.341 ms (6 allocations: 2.77 KiB)

julia> @btime store_mem(10, 1000000);
  1.519 s (8000002 allocations: 3.53 GiB)

and it is about 3x faster than store_mem.

Whoops, I see that @Mason already commented on this. Note that the X, A = A, X saves a copy compared to X .= A, though the timing difference is negligible since the mul! cost dominates. (This is what I get for not reading the whole thread.)

5 Likes