# Multiplication of sub matrices

I’m sure this has been asked before but I can’t find the answer.

I have arrays

``````julia
A = Array{T}(undef, n, n, p)
B = Array{T}(undef, n, n, p)
``````

and I want to multiply them such that each pair of corresponding `n x n` matrices is multiplied, e.g.

``````julia
X = Array{T}(undef, n, n, p)

for i in 1:p
    X[:,:,i] .= A[:,:,i] * B[:,:,i]
end
``````

So, two questions:

1. Is the loop I’ve written the best way to do that?
2. If it is, did I implement the loop efficiently?

Thank you !


You can shave off a little bit using `LinearAlgebra.mul!` with views:

``````julia
julia> using LinearAlgebra, BenchmarkTools

julia> function foo!(X, A, B)
           p = size(X, 3)
           @assert p == size(A, 3) == size(B, 3)
           @inbounds for i in 1:p
               @views mul!(X[:,:,i], A[:,:,i], B[:,:,i])
           end
           X
       end
foo! (generic function with 1 method)

julia> function bar!(X, A, B)
           p = size(X, 3)
           @assert p == size(A, 3) == size(B, 3)
           @inbounds for i in 1:p
               X[:,:,i] .= A[:,:,i] * B[:,:,i]
           end
           X
       end
bar! (generic function with 1 method)

julia> n = 50; p = 500
500

julia> X = zeros(n,n,p); A = randn(n,n,p); B = randn(n,n,p);

julia> bar!(copy(X), A, B) == foo!(copy(X), A, B)
true

julia> @btime bar!($X, $A, $B);
  16.353 ms (3500 allocations: 28.80 MiB)

julia> @btime foo!($X, $A, $B);
  9.480 ms (1500 allocations: 93.75 KiB)
``````

Thank you for that detailed response!

The following statement can be a bit misleading sometimes:

``````X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

One might be tempted to think that this means "perform the matrix multiplication in place at `X[:,:,i]`", but that is not what is happening here. I call it misleading because the expected "in-place" behaviour often does emerge, but whether it does depends on the rest of the right-hand side.

So then what is happening here? Well, basically (simplifying a little), it is syntactic sugar for

``````julia
broadcast!(identity, view(X, :, :, i), A[:,:,i] * B[:,:,i])
``````

So in other words: you first index into `A`, which creates a temporary matrix holding that slice, and then the same happens for `B`. Those temporary matrices are then used to compute the matrix multiplication into yet another temporary result matrix. Then, more or less, the `identity` function is mapped over each element of the result matrix and the resulting value is stored into the corresponding position of `X` (though I am not sure whether `identity` is special-cased and avoided).
So a lot of allocation is going on.

In the example I gave, you basically only allocate the views in each iteration of the loop. Those are just shallow wrappers and avoid the copy-on-index behaviour outlined above. Then, by using `mul!`, we avoid the temporary result matrix.
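To see this difference without a full benchmark, you can compare the two strategies with `@allocated` (a quick sketch; the helper names `naive!` and `inplace!` and the sizes are mine, chosen just for illustration):

``````julia
using LinearAlgebra

n, p = 50, 10
X = zeros(n, n, p); A = randn(n, n, p); B = randn(n, n, p)

# copies both slices, allocates the product, then copies it into X
naive!(X, A, B, i) = (X[:,:,i] .= A[:,:,i] * B[:,:,i])

# writes the product directly into a view of X, no temporary matrices
inplace!(X, A, B, i) = @views mul!(X[:,:,i], A[:,:,i], B[:,:,i])

# warm up so compilation doesn't pollute the measurement
naive!(X, A, B, 1); inplace!(X, A, B, 1)

a1 = @allocated naive!(X, A, B, 1)    # several n×n temporaries
a2 = @allocated inplace!(X, A, B, 1)  # only lightweight view wrappers
a1 > a2
``````

The exact byte counts depend on the Julia version, but the naive version should allocate on the order of three `n × n` matrices per call while the `mul!` version stays near zero.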

Note, if you are interested, you can look at what the “dot” notation actually lowers to using the following command:

``````julia
julia> Meta.lower(Main, :(X[:,:,i] .= A[:,:,i] * B[:,:,i]))
:($(Expr(:thunk, CodeInfo(
1 ─ %1 = (Base.getproperty)(Base.Broadcast, :materialize!)
│   %2 = (Base.dotview)(X, :, :, i)
│   %3 = (Base.getproperty)(Base.Broadcast, :broadcasted)
│   %4 = (Base.getindex)(A, :, :, i)
│   %5 = (Base.getindex)(B, :, :, i)
│   %6 = %4 * %5
│   %7 = (%3)(Base.identity, %6)
│   %8 = (%1)(%2, %7)
└──      return %8
))))
``````

Just a general comment, but in my opinion it is good to be more careful with `@inbounds` than this. Here a view is being created, calls go out to BLAS, etc., so the cost of the bounds check is completely negligible. Use the macro mistakenly, though, and you can get silently corrupted results. Only add it when you have benchmarked with and without it and found that it makes a difference.
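For example, one way to check whether `@inbounds` buys you anything is to write both variants and compare them — a sketch with a toy summation, not the code above, and note that the compiler can often elide the checks on its own when iterating with `eachindex`:

``````julia
function sum_checked(v)
    s = 0.0
    for i in eachindex(v)
        s += v[i]           # bounds check on every access (unless elided)
    end
    s
end

function sum_unchecked(v)
    s = 0.0
    @inbounds for i in eachindex(v)
        s += v[i]           # bounds check removed; only safe because
    end                     # eachindex guarantees valid indices
    s
end

v = randn(10_000)
sum_checked(v) ≈ sum_unchecked(v)   # identical results
``````

Time both with `@btime` from BenchmarkTools; if the numbers are indistinguishable, drop the `@inbounds` and keep the safety net.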

It does seem surprising that Julia doesn’t do the expected thing here, i.e. destructively update `X[:,:,i]` directly instead of creating an intermediate value.

That is not the expected thing, though; that’s magic.

The expected thing is for

``````julia
X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

to be the same as

``````julia
tmp_A = A[:,:,i]
tmp_B = B[:,:,i]
tmp = tmp_A * tmp_B
X[:,:,i] .= tmp
``````
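A quick way to convince yourself that the two spellings really are equivalent (a sketch with arbitrary small sizes):

``````julia
using LinearAlgebra

n, p, i = 4, 3, 2
X1 = zeros(n, n, p); X2 = zeros(n, n, p)
A = randn(n, n, p); B = randn(n, n, p)

# dotted version
X1[:,:,i] .= A[:,:,i] * B[:,:,i]

# explicit-temporaries version
tmp_A = A[:,:,i]
tmp_B = B[:,:,i]
tmp = tmp_A * tmp_B
X2[:,:,i] .= tmp

X1 == X2   # the dotted form is just sugar for the second
``````

Both run the same matrix multiply on the same slices, so the results match exactly; the `.=` only fuses the elementwise copy on the left, not the `*` on the right.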

Sorry, I’m not following.

Maybe the simpler question is: why doesn’t

``````julia
X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

do a destructive update automatically? I can’t seem to figure out why it doesn’t from the discussion.
Whatever the reason, this is why @Evizero points out the use of `mul!` to avoid the temporary allocations, right?

I think that I’m misunderstanding the `.=` operator…

1 Like