Diagonal elements of matrix product

I have two matrices A and B, and I need to compute the diagonal elements of the product A*B as fast as possible, and store them in a pre-allocated vector.

What’s the fastest way to do this? I mean faster than writing my own loop (e.g., by calling an appropriate BLAS routine, restructuring the input if needed).

You can use the `eachrow` and `eachcol` iterators to compute the dot products that form the result’s diagonal:

using LinearAlgebra
A = rand(4, 4)
B = rand(4, 4)
v = similar(A, 4)

v .= dot.(eachrow(A), eachcol(B))

dot should use the corresponding strided BLAS routine.
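As a sanity check, here is a small runnable sketch (the wrapper name `diagprod!` is mine, not from the thread) verifying that the broadcasted dot products match `diag(A * B)`:

```julia
using LinearAlgebra

# In-place diagonal of A*B: one dot product per diagonal entry,
# no temporary matrix allocated.
function diagprod!(v, A, B)
    v .= dot.(eachrow(A), eachcol(B))
    return v
end

A, B = rand(5, 5), rand(5, 5)
v = similar(A, 5)
diagprod!(v, A, B)

# Matches the diagonal of the full product (which we avoid forming).
@assert v ≈ diag(A * B)
```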


Another possibility is:

tmp .= A .* B'
sum!(result, tmp)

where tmp (same size as A) and result (a vector of length size(A, 1)) are pre-allocated.
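A minimal runnable sketch of this variant (the final check against `diag(A * B)` is my addition):

```julia
using LinearAlgebra

A = rand(6, 6)
B = rand(6, 6)
tmp = similar(A)            # pre-allocated, same size as A
result = zeros(size(A, 1))  # pre-allocated length-n vector for the diagonal

tmp .= A .* B'       # tmp[i, j] = A[i, j] * B[j, i], elementwise
sum!(result, tmp)    # row sums: result[i] = sum over j of tmp[i, j]

@assert result ≈ diag(A * B)
```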

Some benchmarks.

julia> using LinearAlgebra, BenchmarkTools

julia> A = randn(10, 10); B = randn(10, 10);

julia> tmp = randn(10, 10); result = randn(10);

julia> function f1!(tmp, result, A, B)
           tmp .= A .* B'
           sum!(result, tmp)
       end;

julia> function f2!(result, A, B)
           result .= dot.(eachrow(A), eachcol(B))
       end;

julia> @btime f1!($tmp, $result, $A, $B);
  337.409 ns (0 allocations: 0 bytes)

julia> @btime f2!($result, $A, $B);
  784.857 ns (26 allocations: 1.34 KiB)

Are the matrices in your use case always only 10x10? For small sizes like this your way is faster, but as the matrices grow, f2! should be a lot more efficient than f1!, since it would scale as O(n^2) instead of O(n^3).

Why O(n^3)? It looks like the exact same operations (additions and multiplications) to me, just in a different order. (Note also that the multiplication here is broadcast elementwise, not a matrix product.)
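For what it’s worth, a rough operation count (my own sketch, not from the thread) suggests both in-place methods do Θ(n^2) work; only forming the full product A*B would be Θ(n^3):

```julia
# Rough multiplication counts for n×n inputs (hand counts, not benchmarks):
#   f1!: A .* B' is one multiply per entry            → n^2 multiplications
#   f2!: n dot products, each of length n             → n^2 multiplications
#   full A*B: n^2 entries, each a length-n dot product → n^3 multiplications
n = 100
f1_mults = n^2
f2_mults = n * n
full_product_mults = n^2 * n

@assert f1_mults == f2_mults                # both diagonal-only methods: Θ(n^2)
@assert full_product_mults == n * f1_mults  # the full product costs a factor of n more
```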