# Multiplication of sub matrices

I’m sure this has been asked before but I can’t find the answer.

I have arrays

``````julia
A = Array{T}(undef, n, n, p)
B = Array{T}(undef, n, n, p)
``````

and I want to multiply them such that each pair of corresponding `n x n` matrices is multiplied, e.g.

``````julia
X = Array{T}(undef, n, n, p)

for i in 1:p
    X[:,:,i] .= A[:,:,i] * B[:,:,i]
end
``````

So, two questions:

1. Is the loop I’ve written the best way to do that?
2. If it is, did I implement the loop efficiently?

Thank you !


You can shave off a little bit using `LinearAlgebra.mul!` with views:

``````julia
julia> using LinearAlgebra, BenchmarkTools

julia> function foo!(X, A, B)
           p = size(X, 3)
           @assert p == size(A, 3) == size(B, 3)
           @inbounds for i in 1:p
               @views mul!(X[:,:,i], A[:,:,i], B[:,:,i])
           end
           X
       end
foo! (generic function with 1 method)

julia> function bar!(X, A, B)
           p = size(X, 3)
           @assert p == size(A, 3) == size(B, 3)
           @inbounds for i in 1:p
               X[:,:,i] .= A[:,:,i] * B[:,:,i]
           end
           X
       end
bar! (generic function with 1 method)

julia> n = 50; p = 500
500

julia> X = zeros(n,n,p); A = randn(n,n,p); B = randn(n,n,p);

julia> bar!(copy(X), A, B) == foo!(copy(X), A, B)
true

julia> @btime bar!($X, $A, $B);
  16.353 ms (3500 allocations: 28.80 MiB)

julia> @btime foo!($X, $A, $B);
  9.480 ms (1500 allocations: 93.75 KiB)
``````

Thank you for that detailed response!

The following statement can be a bit misleading sometimes:

``````X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

One might be tempted to think that this means "perform the matrix multiplication in place at `X[:,:,i]`", but that is not what is happening here. I call it misleading because the expected "in-place" behaviour often does emerge, but whether it does depends on the rest of the right-hand side.

So then what is happening here? Well, basically (simplifying a little), it is syntactic sugar for

``````julia
broadcast!(identity, view(X, :, :, i), A[:,:,i] * B[:,:,i])
``````

So in other words: you first index into `A`, which creates a temporary matrix holding that slice, and then the same happens for `B`. Those temporary matrices are then used to compute the matrix multiplication into yet another temporary result matrix. Then, more or less, the `identity` function is mapped over each element of the result matrix and the resulting value is stored into the corresponding position of `X` (though I am not sure whether `identity` is special-cased and avoided).
So a lot of allocation is going on.

In the example I gave, you basically only allocate the views in each iteration of the loop. Those are just shallow wrappers and avoid the copy-on-index behaviour outlined above. Then, by using `mul!`, we avoid the temporary result matrix.
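To see this difference without a full benchmark, you can compare the two strategies with `@allocated` (a quick sketch; the helper names `naive!` and `inplace!` and the sizes are mine, chosen just for illustration):

``````julia
using LinearAlgebra

n, p = 50, 10
X = zeros(n, n, p); A = randn(n, n, p); B = randn(n, n, p)

# copies both slices, allocates the product, then copies it into X
naive!(X, A, B, i) = (X[:,:,i] .= A[:,:,i] * B[:,:,i])

# writes the product directly into a view of X, no temporary matrices
inplace!(X, A, B, i) = @views mul!(X[:,:,i], A[:,:,i], B[:,:,i])

# warm up so compilation doesn't pollute the measurement
naive!(X, A, B, 1); inplace!(X, A, B, 1)

a1 = @allocated naive!(X, A, B, 1)    # several n×n temporaries
a2 = @allocated inplace!(X, A, B, 1)  # only lightweight view wrappers
a1 > a2
``````

The exact byte counts depend on the Julia version, but the naive version should allocate on the order of three `n × n` matrices per call while the `mul!` version stays near zero.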

Note, if you are interested, you can look at what the “dot” notation actually lowers to using the following command:

``````julia
julia> Meta.lower(Main, :(X[:,:,i] .= A[:,:,i] * B[:,:,i]))
:($(Expr(:thunk, CodeInfo(
1 ─ %1 = (Base.getproperty)(Base.Broadcast, :materialize!)
│   %2 = (Base.dotview)(X, :, :, i)
│   %3 = (Base.getproperty)(Base.Broadcast, :broadcasted)
│   %4 = (Base.getindex)(A, :, :, i)
│   %5 = (Base.getindex)(B, :, :, i)
│   %6 = %4 * %5
│   %7 = (%3)(Base.identity, %6)
│   %8 = (%1)(%2, %7)
└──      return %8
))))
``````

Just a general comment, but in my opinion it is good to be more careful with `@inbounds` than this. Here a view is being created, calls go out to BLAS, etc., so the cost of the bounds check is completely negligible. Use the macro mistakenly, though, and you can get silently corrupted results. Only add it when you have benchmarked with and without it and found that it makes a difference.
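For example, one way to check whether `@inbounds` buys you anything is to write both variants and compare them — a sketch with a toy summation, not the code above, and note that the compiler can often elide the checks on its own when iterating with `eachindex`:

``````julia
function sum_checked(v)
    s = 0.0
    for i in eachindex(v)
        s += v[i]           # bounds check on every access (unless elided)
    end
    s
end

function sum_unchecked(v)
    s = 0.0
    @inbounds for i in eachindex(v)
        s += v[i]           # bounds check removed; only safe because
    end                     # eachindex guarantees valid indices
    s
end

v = randn(10_000)
sum_checked(v) ≈ sum_unchecked(v)   # identical results
``````

Time both with `@btime` from BenchmarkTools; if the numbers are indistinguishable, drop the `@inbounds` and keep the safety net.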

It does seem surprising that Julia doesn’t do the expected thing here, i.e. destructively update `X[:,:,i]` directly instead of creating an intermediate value.

That is not the expected thing, though; that’s magic.

The expected thing is for

``````julia
X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

to be the same as

``````julia
tmp_A = A[:,:,i]
tmp_B = B[:,:,i]
tmp = tmp_A * tmp_B
X[:,:,i] .= tmp
``````
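A quick way to convince yourself that the two spellings really are equivalent (a sketch with arbitrary small sizes):

``````julia
using LinearAlgebra

n, p, i = 4, 3, 2
X1 = zeros(n, n, p); X2 = zeros(n, n, p)
A = randn(n, n, p); B = randn(n, n, p)

# dotted version
X1[:,:,i] .= A[:,:,i] * B[:,:,i]

# explicit-temporaries version
tmp_A = A[:,:,i]
tmp_B = B[:,:,i]
tmp = tmp_A * tmp_B
X2[:,:,i] .= tmp

X1 == X2   # the dotted form is just sugar for the second
``````

Both run the same matrix multiply on the same slices, so the results match exactly; the `.=` only fuses the elementwise copy on the left, not the `*` on the right.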

Sorry, I’m not following.

Maybe the simpler question is: why doesn’t

``````julia
X[:,:,i] .= A[:,:,i] * B[:,:,i]
``````

do a destructive update automatically? I can’t seem to figure out why it doesn’t from the discussion.
Whatever the reason, this is why @Evizero points out the use of `mul!` to avoid the temporary allocations, right?

I think that I’m misunderstanding the `.=` operator…

1 Like