# Diagonal elements of matrix product

I have two matrices `A` and `B`, and I need to compute the diagonal elements of the product `A*B` as fast as possible, and store them in a pre-allocated vector.

What’s the fastest way to do this? I mean faster than writing my own loop (i.e., maybe hitting an appropriate BLAS routine, restructuring the input if needed).

You can use the `eachrow` and `eachcol` iterators to compute the dot products, which form the result’s diagonal.

```julia
using LinearAlgebra
A = rand(4, 4)
B = rand(4, 4)
v = similar(A, 4)

v .= dot.(eachrow(A), eachcol(B))
```

`dot` should use the corresponding strided BLAS routine.
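If you do want to compare against an explicit loop, a minimal sketch (the function name `diagprod!` is just for illustration) that uses `@views` so each row/column access is a non-allocating view:

```julia
using LinearAlgebra

# Explicit-loop equivalent: v[i] = dot(A[i, :], B[:, i]).
# @views makes the row/column slices views instead of copies,
# so the loop does not allocate.
function diagprod!(v, A, B)
    @inbounds for i in eachindex(v)
        v[i] = @views dot(A[i, :], B[:, i])
    end
    return v
end

A = rand(4, 4); B = rand(4, 4); v = similar(A, 4)
diagprod!(v, A, B)
```

Each iteration still hits the same strided `dot`, so this should perform similarly to the broadcasted version.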


Another possibility is:

```julia
tmp .= A .* B'
sum!(result, tmp)
```

where `tmp` and `result` are pre-allocated and of the appropriate sizes.
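Concretely, a minimal sketch of the required shapes (for `n`-by-`n` inputs, `tmp` is `n`-by-`n` and `result` is a length-`n` vector, since `sum!` into a vector reduces along the second dimension):

```julia
using LinearAlgebra

n = 5
A = rand(n, n); B = rand(n, n)
tmp = similar(A)        # n×n scratch buffer
result = similar(A, n)  # length-n output

tmp .= A .* B'          # tmp[i, j] = A[i, j] * B[j, i]
sum!(result, tmp)       # result[i] = Σⱼ A[i, j] * B[j, i] = (A*B)[i, i]
```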

Some benchmarks.

```julia
julia> using LinearAlgebra, BenchmarkTools

julia> A = randn(10, 10); B = randn(10, 10);

julia> tmp = randn(10, 10); result = randn(10);

julia> function f1!(tmp, result, A, B)
           tmp .= A .* B'
           sum!(result, tmp)
       end

julia> function f2!(result, A, B)
           result .= dot.(eachrow(A), eachcol(B))
       end

julia> @btime f1!($tmp, $result, $A, $B);
  337.409 ns (0 allocations: 0 bytes)

julia> @btime f2!($result, $A, $B);
  784.857 ns (26 allocations: 1.34 KiB)
```

Are your matrices always only 10×10 in your use case? For small sizes like this your way is faster, but as the matrices grow, `f2!` should be a lot more efficient than `f1!`, since it scales as O(n^2) instead of O(n^3).

Why O(n^3)? It looks like the exact same operations (additions and multiplications) to me, just in a different order. (Note also that the multiplication is broadcasted.)
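A quick sanity check supports this: both formulations touch each `(i, j)` pair exactly once, i.e. n² scalar multiplications, and both reproduce `diag(A * B)`; an O(n^3) cost would only arise if the full product `A*B` were formed first.

```julia
using LinearAlgebra

A = randn(10, 10); B = randn(10, 10)
tmp = similar(A); r1 = similar(A, 10); r2 = similar(A, 10)

# f1!-style: materialize the elementwise product, then row-sum.
tmp .= A .* B'
sum!(r1, tmp)

# f2!-style: dot each row of A with the matching column of B.
r2 .= dot.(eachrow(A), eachcol(B))

# Both perform n^2 multiplications and n(n-1) additions in total,
# just in a different order.
```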