# Matrix product `M * transpose(M)`

Hello,

Is there a function in Julia to efficiently compute the matrix product `M * transpose(M)`?

This matrix is symmetric, so there should be a more efficient way than doing `M * transpose(M)`. In R one can use the `tcrossprod` function to perform this product.

One thing you could do is

```julia
using Tullio, LoopVectorization

@tullio out[i, j] := M[i, k] * M[j, k]
```

This may not take advantage of symmetry, but it’ll take advantage of the fact that it’s just one matrix involved. Here’s how it compares to regular matrix multiplication:

```julia
using Tullio, LoopVectorization, BenchmarkTools

f1(M) = M * transpose(M)
f2(M) = @tullio out[i, j] := M[i, k] * M[j, k]

foreach((10, 100, 1000)) do N
    M = rand(N, N)
    @show N
    @btime f1($M)
    @btime f2($M)
    println()
end
```

```
N = 10
  535.363 ns (1 allocation: 896 bytes)
  197.315 ns (1 allocation: 896 bytes)

N = 100
  58.979 μs (2 allocations: 78.20 KiB)
  32.970 μs (50 allocations: 80.52 KiB)

N = 1000
  10.812 ms (2 allocations: 7.63 MiB)
  16.134 ms (76 allocations: 7.63 MiB)
```

As you can see, Tullio eventually starts losing to OpenBLAS for very large matrices, mostly because it doesn’t do array packing.


At the risk of mentioning the obvious: note that in Julia multiple dispatch automatically selects efficient methods of generic functions, like matrix multiply, based on the input types. Hence, by indicating that `M` is symmetric at the type level, i.e. making it a `Symmetric`, Julia will be smart about choosing an efficient multiply method:

```julia
julia> M = rand(100,100); M = (M+M')/2; # M is symmetric but of type Matrix{Float64}

julia> @which M * transpose(M) # dispatches to a generic implementation in matmul.jl
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151

julia> M = Symmetric(M); # M is of type Symmetric{Float64, Matrix{Float64}}

julia> @which M * transpose(M) # dispatches to a specialised method in symmetric.jl
*(A::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}, B::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:572
```

Thanks everyone.

@carstenbauer I don’t mean that `M` is symmetric. I mean that the product `M * transpose(M)` is symmetric (whatever `M` is). So in order to get this product it is enough to calculate the entries of the upper triangular part.
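To illustrate the point (a small sketch, not from the thread): the product is symmetric for any `M`, even a non-square one, so its upper triangle determines the whole result.

```julia
using LinearAlgebra

M = rand(3, 5)              # M is neither square nor symmetric
P = M * transpose(M)

P ≈ transpose(P)            # the product is always symmetric
P ≈ Symmetric(triu(P), :U)  # the upper triangle determines all of P
```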


Oh, I misunderstood. Sorry for the noise.

You are still correct, although the example you used was wrong: `A * transpose(A)` dispatches to `syrk` instead of `gemm`.


Julia does this automatically under the hood by calling BLAS’s SYRK routine. You can examine the relevant code snippet in the standard library `LinearAlgebra` here.
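If you want to see the SYRK path explicitly, you can call it yourself through `LinearAlgebra.BLAS` (a sketch; in practice `M * transpose(M)` already takes this route for you):

```julia
using LinearAlgebra

M = rand(4, 3)

# BLAS.syrk('U', 'N', α, M) computes α * M * M', filling only the
# upper triangle of the result; the lower triangle is left unspecified.
C = BLAS.syrk('U', 'N', 1.0, M)

# Wrapping in Symmetric tells Julia to mirror the upper triangle,
# giving the full symmetric product without computing it twice.
S = Symmetric(C, :U)

S ≈ M * transpose(M)  # true
```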

Sorry if it is a dumb question: why doesn’t Tullio load LoopVectorization by default?

Mostly for startup & compilation times. Startup keeps getting better, but the time to generate & compile these fast versions is still significant:

```julia
julia> @time using Tullio, LoopVectorization
  1.325795 seconds (3.77 M allocations: 230.068 MiB, 2.90% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  6.619863 seconds (19.40 M allocations: 1.078 GiB, 4.57% gc time, 0.00% compilation time)
```

vs

```julia
julia> @time using Tullio
  0.277620 seconds (766.49 k allocations: 46.810 MiB, 3.51% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  0.133953 seconds (790.37 k allocations: 43.063 MiB, 3.37% gc time, 97.81% compilation time)
```