Hi, I’m new to Julia, and this is the first post to this forum.

I have problems regarding in-place matrix operations in julia.
I have to calculate matrix-matrix product in for-loop, and I tried to pre-allocate the memories, before the for-loop, as follows::

function test_1(N)
a = rand(Float64, N, N)
b = rand(Float64, N, N)
c = Matrix{Float64}(undef, N, N)
for i in 1:500
mul!(c, a,b) ## inplace calculation of matrix matrix product
end
end

I confirmed that this mitigates the memory allocation and increases efficiency, compared to the case where allocating memory in each step like below

function test_2(N)
a = rand(Float64, N, N)
b = rand(Float64, N, N)
for i in 1:500
c = a*b
end
end

This was an easy task, because of the very simple matrix operation.
I want to generalize this into more complex matrix operations, like C = AB - BA.
How can I do this?

Thanks!. I tried Tullio.jl, and it indeed reduces the cost of memory allocation, but the matrix multiplication in this method was much slower than the conventional operation.
Anyway, I didn’t know this tool, and it might be helpful in other processes.
Thank you so much!

Thank you so much. I understand your suggestion. But in that case, we have to prepare several temporary arrays to calculate more complex matrix operations.
Is there any way to do that more easily and generally while keeping efficiency? (operation AB - BA was just an example, and I have to calculate more complex operations)

In general, if you’re doing large matrix multiplication, the allocations won’t matter (since they’re n^2 vs n^3). If you’re doing small matrix multiplication, you might want to check out StaticArrays which will be much faster for smaller than 10x10 or so (also Vector{SVector} can be really useful for something like 15000x3.

yeah, that’s true. I should have given you more information. Basically, I have to calculate the time evolution of the Vector{Matrix} and, in each time step there is matrix multiplication, like below.

A = Vector{Matrix{ComplexF64}}
B = Vector{Matrix{ComplexF64}}
for t in time_step
for i in eachindex(A)
C[i] = A[i] * B[i]
end
end

and the size of each element(Matrix) A is small (~4✕4), while the length of A is about 1000.
So I thought the cost of allocation might be comparable to that of matrix multiplication itself.

And thank you for suggesting me StaticArrays. I’ve never used it, so let me check it.

Note that static arrays are most useful when the matrix sizes are known at compile time. If this is your use case, the package seems like a perfect fit.

Thank you very much. I confirmed that the matrix-matrix multiplication in SMatrix is much faster than the conventional one. However, my code is rather complicated and a little bit tricky to work with StaticArrays. Let me think about how to incorporate StaticArrays into my code. Thanks again!