Matrix product `M * transpose(M)`


Is there a function in Julia to efficiently compute the matrix product `M * transpose(M)`?

This product is symmetric, so there should be a more efficient way than computing the full `M * transpose(M)`. In R one can use the `tcrossprod` function for this.

One thing you could do is

using Tullio, LoopVectorization
@tullio out[i, j] := M[i, k] * M[j, k]

This may not exploit the symmetry of the result, but it does take advantage of the fact that only one matrix is involved. Here’s how it compares to regular matrix multiplication:

using Tullio, LoopVectorization, BenchmarkTools

f1(M) = M * transpose(M)
f2(M) = @tullio out[i, j] := M[i, k] * M[j, k]

foreach((10, 100, 1000)) do N
    M = rand(N, N)
    @show N
    @btime f1($M)
    @btime f2($M)
end

N = 10
  535.363 ns (1 allocation: 896 bytes)
  197.315 ns (1 allocation: 896 bytes)

N = 100
  58.979 μs (2 allocations: 78.20 KiB)
  32.970 μs (50 allocations: 80.52 KiB)

N = 1000
  10.812 ms (2 allocations: 7.63 MiB)
  16.134 ms (76 allocations: 7.63 MiB)

As you can see, Tullio eventually starts losing to OpenBLAS for very large matrices, mostly because it doesn’t do array packing.


At the risk of stating the obvious: in Julia, multiple dispatch automatically selects efficient methods of generic functions, like matrix multiplication, based on the input types. Hence, by indicating at the type level that M is symmetric, i.e. wrapping it in a Symmetric, you let Julia choose a specialised multiplication method:

julia> M = rand(100,100); M = (M+M')/2; # M is symmetric but of type Matrix{Float64}

julia> @which M * transpose(M) # dispatches to a generic implementation in matmul.jl
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /Applications/

julia> M = Symmetric(M); # M is of type Symmetric{Float64, Matrix{Float64}}

julia> @which M * transpose(M) # dispatches to a specialised method in symmetric.jl
*(A::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}, B::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}) in LinearAlgebra at /Applications/

Thanks everyone.

@carstenbauer I don’t mean that M is symmetric. I mean that the product M * transpose(M) is symmetric (whatever M is). So in order to get this product it is enough to calculate the entries of the upper triangular part.
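For instance, a quick sanity check (the shape here is arbitrary; any rectangular M works):

```julia
using LinearAlgebra

M = rand(3, 5)   # M itself is neither square nor symmetric

# The product M * M' is always symmetric, since its (i, j) and (j, i)
# entries are the same dot product of rows i and j of M.
issymmetric(M * transpose(M))  # true
```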


Oh, I misunderstood. Sorry for the noise 🙂


You are still correct, although the example you used was wrong: A * transpose(A) dispatches to syrk instead of gemm.


Julia does this automatically under the hood by calling BLAS’s SYRK routine. You can examine the relevant code in the LinearAlgebra standard library.
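For reference, a minimal sketch of calling SYRK directly via `LinearAlgebra.BLAS.syrk`; `'U'` asks for the upper triangle and `'N'` for `M * M'` rather than `M' * M`:

```julia
using LinearAlgebra

M = rand(4, 6)

# SYRK computes only one triangle of the product (here the upper one);
# wrapping the result in Symmetric exposes it as the full symmetric matrix.
S = Symmetric(BLAS.syrk('U', 'N', 1.0, M))

S ≈ M * transpose(M)  # true
```

This is what `M * transpose(M)` ends up doing for you anyway, so there is rarely a reason to call it by hand.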

Sorry if it is a dumb question: why doesn’t Tullio load LoopVectorization by default?

Mostly for startup & compilation times. Startup keeps getting better, but the time to generate & compile these fast versions is still significant:

julia> @time using Tullio, LoopVectorization
  1.325795 seconds (3.77 M allocations: 230.068 MiB, 2.90% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  6.619863 seconds (19.40 M allocations: 1.078 GiB, 4.57% gc time, 0.00% compilation time)

Compare that to loading Tullio without LoopVectorization in a fresh session:

julia> @time using Tullio
  0.277620 seconds (766.49 k allocations: 46.810 MiB, 3.51% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  0.133953 seconds (790.37 k allocations: 43.063 MiB, 3.37% gc time, 97.81% compilation time)