How to do in-place matrix operation in Julia?

Hi, I’m new to Julia, and this is the first post to this forum.

I have problems regarding in-place matrix operations in julia.
I have to calculate matrix-matrix product in for-loop, and I tried to pre-allocate the memories, before the for-loop, as follows::

function test_1(N)
    a = rand(Float64, N, N)
    b = rand(Float64, N, N)
    c = Matrix{Float64}(undef, N, N)
    for i in 1:500
        mul!(c, a,b) ## inplace calculation of matrix matrix product
    end
end

I confirmed that this mitigates the memory allocation and increases efficiency, compared to the case where allocating memory in each step like below

function test_2(N)
    a = rand(Float64, N, N)
    b = rand(Float64, N, N)
    for i in 1:500
        c = a*b
    end
end

This was an easy task, because of the very simple matrix operation.
I want to generalize this into more complex matrix operations, like C = AB - BA.
How can I do this?

You might like Tullio.jl.

Do

C .= mul!(X1, A, B) .- mul!(X2, B, A)

where X1 and X2 are additional temporary arrays.

3 Likes

Thanks!. I tried Tullio.jl, and it indeed reduces the cost of memory allocation, but the matrix multiplication in this method was much slower than the conventional operation.
Anyway, I didn’t know this tool, and it might be helpful in other processes.
Thank you so much!

Or, with even fewer temporary arrays,

mul!(C, B, A)
mul!(C, A, B, 1.0, -1.0)
8 Likes

If you want Tullio to be faster you should add LoopVectorization.

1 Like

Thank you so much. I understand your suggestion. But in that case, we have to prepare several temporary arrays to calculate more complex matrix operations.
Is there any way to do that more easily and generally while keeping efficiency? (operation AB - BA was just an example, and I have to calculate more complex operations)

In general, if you’re doing large matrix multiplication, the allocations won’t matter (since they’re n^2 vs n^3). If you’re doing small matrix multiplication, you might want to check out StaticArrays which will be much faster for smaller than 10x10 or so (also Vector{SVector} can be really useful for something like 15000x3.

1 Like

yeah, that’s true. I should have given you more information. Basically, I have to calculate the time evolution of the Vector{Matrix} and, in each time step there is matrix multiplication, like below.

A = Vector{Matrix{ComplexF64}}
B = Vector{Matrix{ComplexF64}}

for t in time_step
   for i in eachindex(A)
        C[i] = A[i] * B[i]
   end
end

and the size of each element(Matrix) A is small (~4✕4), while the length of A is about 1000.
So I thought the cost of allocation might be comparable to that of matrix multiplication itself.

And thank you for suggesting me StaticArrays. I’ve never used it, so let me check it.

1 Like

Note that static arrays are most useful when the matrix sizes are known at compile time. If this is your use case, the package seems like a perfect fit.

4 Likes

Thank you very much. I confirmed that the matrix-matrix multiplication in SMatrix is much faster than the conventional one. However, my code is rather complicated and a little bit tricky to work with StaticArrays. Let me think about how to incorporate StaticArrays into my code. Thanks again!

1 Like

A while ago we worked on a package InplaceLinalg. It lets you write things like

@inplace C += 2π * A * B'

and this will be converted to the correct call gemm!().

I can’t recall if we ever published this package, but you can always install from github.

1 Like

There’s also GitHub - lopezm94/SugarBLAS.jl: Syntactic sugar for BLAS polynomials that does something similar

1 Like