# Multiplication of sub matrices

I’m sure this has been asked before but I can’t find the answer.

I have arrays

``````julia
A = Array{T}(undef, n, n, p)
B = Array{T}(undef, n, n, p)
``````

and I want to multiply them such that each pair of corresponding `n x n` matrices is multiplied, e.g.

``````julia
X = Array{T}(undef, n, n, p)

for i in 1:p
    X[:,:,i] .= A[:,:,i] * B[:,:,i]
end
``````

So, two questions:

1. Is the loop I’ve written the best way to do that?
2. If it is, did I implement the loop efficiently?

Thank you !


You can shave off a little bit using `LinearAlgebra.mul!` with views:

``````julia
julia> using LinearAlgebra, BenchmarkTools

julia> function foo!(X, A, B)
           p = size(X, 3)
           @assert p == size(A, 3) == size(B, 3)
           @inbounds for i in 1:p
               @views mul!(X[:,:,i], A[:,:,i], B[:,:,i])
           end
           X
       end
foo! (generic function with 1 method)

julia> function bar!(X, A, B)
           p = size(X, 3)
           @assert p == size(A, 3) == size(B, 3)
           @inbounds for i in 1:p
               X[:,:,i] .= A[:,:,i] * B[:,:,i]
           end
           X
       end
bar! (generic function with 1 method)

julia> n = 50; p = 500
500

julia> X = zeros(n,n,p); A = randn(n,n,p); B = randn(n,n,p);

julia> bar!(copy(X), A, B) == foo!(copy(X), A, B)
true

julia> @btime bar!($X, $A, $B);
  16.353 ms (3500 allocations: 28.80 MiB)

julia> @btime foo!($X, $A, $B);
  9.480 ms (1500 allocations: 93.75 KiB)
``````

Thank you for that detailed response!

The following statement can be a bit misleading sometimes:

``````X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

One might be tempted to think that this means "perform the matrix multiplication in place at `X[:,:,i]`", but that is not what is happening here. I call it misleading because the expected "in-place" behaviour often does emerge, but whether it does depends on the rest of the right-hand side.

So then what is happening here? Well, basically (simplifying a little), it is syntactic sugar for

``````julia
broadcast!(identity, view(X, :, :, i), A[:,:,i] * B[:,:,i])
``````

So in other words: you first index into `A`, which creates a temporary matrix holding that slice, and then the same happens for `B`. Those temporary matrices are then used to compute the matrix multiplication into yet another temporary result matrix. Then, more or less, the `identity` function is mapped over each element of the result matrix and the resulting value is stored into the corresponding position of `X` (though I am not sure whether `identity` is special-cased and avoided).
So a lot of allocation is going on.

In the example I gave, you basically only allocate the views in each iteration of the loop. Those are just shallow wrappers and avoid the copy-on-index behaviour outlined above. Then, by using `mul!`, we avoid the temporary result matrix.
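To see this difference without a full benchmark, you can compare the two strategies with `@allocated` (a quick sketch; the helper names `naive!` and `inplace!` and the sizes are mine, chosen just for illustration):

``````julia
using LinearAlgebra

n, p = 50, 10
X = zeros(n, n, p); A = randn(n, n, p); B = randn(n, n, p)

# copies both slices, allocates the product, then copies it into X
naive!(X, A, B, i) = (X[:,:,i] .= A[:,:,i] * B[:,:,i])

# writes the product directly into a view of X, no temporary matrices
inplace!(X, A, B, i) = @views mul!(X[:,:,i], A[:,:,i], B[:,:,i])

# warm up so compilation doesn't pollute the measurement
naive!(X, A, B, 1); inplace!(X, A, B, 1)

a1 = @allocated naive!(X, A, B, 1)    # several n×n temporaries
a2 = @allocated inplace!(X, A, B, 1)  # only lightweight view wrappers
a1 > a2
``````

The exact byte counts depend on the Julia version, but the naive version should allocate on the order of three `n × n` matrices per call while the `mul!` version stays near zero.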

Note, if you are interested, you can look at what the “dot” notation actually lowers to using the following command:

``````julia
julia> Meta.lower(Main, :(X[:,:,i] .= A[:,:,i] * B[:,:,i]))
:($(Expr(:thunk, CodeInfo(
1 ─ %1 = (Base.getproperty)(Base.Broadcast, :materialize!)
│   %2 = (Base.dotview)(X, :, :, i)
│   %3 = (Base.getproperty)(Base.Broadcast, :broadcasted)
│   %4 = (Base.getindex)(A, :, :, i)
│   %5 = (Base.getindex)(B, :, :, i)
│   %6 = %4 * %5
│   %7 = (%3)(Base.identity, %6)
│   %8 = (%1)(%2, %7)
└──      return %8
))))
``````

Just a general comment, but in my opinion it is good to be more careful with `@inbounds` than this. Here a view is being created, calls go out to BLAS, etc., so the cost of the bounds check is completely negligible. Use the macro mistakenly, though, and you can get silently corrupted results. Only add it when you have benchmarked with and without it and found that it makes a difference.
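For example, one way to check whether `@inbounds` buys you anything is to write both variants and compare them — a sketch with a toy summation, not the code above, and note that the compiler can often elide the checks on its own when iterating with `eachindex`:

``````julia
function sum_checked(v)
    s = 0.0
    for i in eachindex(v)
        s += v[i]           # bounds check on every access (unless elided)
    end
    s
end

function sum_unchecked(v)
    s = 0.0
    @inbounds for i in eachindex(v)
        s += v[i]           # bounds check removed; only safe because
    end                     # eachindex guarantees valid indices
    s
end

v = randn(10_000)
sum_checked(v) ≈ sum_unchecked(v)   # identical results
``````

Time both with `@btime` from BenchmarkTools; if the numbers are indistinguishable, drop the `@inbounds` and keep the safety net.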

It does seem surprising that Julia doesn’t do the expected thing here, i.e. destructively update `X[:,:,i]` directly instead of creating an intermediate value.

That is not the expected thing, though; that’s magic.

The expected thing is for

``````julia
X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

to be the same as

``````julia
tmp_A = A[:,:,i]
tmp_B = B[:,:,i]
tmp = tmp_A * tmp_B
X[:,:,i] .= tmp
``````
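A quick way to convince yourself that the two spellings really are equivalent (a sketch with arbitrary small sizes):

``````julia
using LinearAlgebra

n, p, i = 4, 3, 2
X1 = zeros(n, n, p); X2 = zeros(n, n, p)
A = randn(n, n, p); B = randn(n, n, p)

# dotted version
X1[:,:,i] .= A[:,:,i] * B[:,:,i]

# explicit-temporaries version
tmp_A = A[:,:,i]
tmp_B = B[:,:,i]
tmp = tmp_A * tmp_B
X2[:,:,i] .= tmp

X1 == X2   # the dotted form is just sugar for the second
``````

Both run the same matrix multiply on the same slices, so the results match exactly; the `.=` only fuses the elementwise copy on the left, not the `*` on the right.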

Sorry, I’m not following.

Maybe the simpler question is: why doesn’t

``````julia
X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

do a destructive update automatically? I can’t seem to figure out why it doesn’t from the discussion.
Whatever the reason, this is why @Evizero points out the use of `mul!` to avoid the temporary allocations, right?

I think that I’m misunderstanding the `.=` operator…

1 Like