Matrix product `M * transpose(M)`

Hello,

Is there a function in Julia to efficiently compute the matrix product M * transpose(M)?

The result of this product is symmetric, so there should be a more efficient way than computing the full product naively. In R one can use the tcrossprod function for this.

One thing you could do is

using Tullio, LoopVectorization
@tullio out[i, j] := M[i, k] * M[j, k]

This may not take advantage of symmetry, but it’ll take advantage of the fact that it’s just one matrix involved. Here’s how it compares to regular matrix multiplication:

using Tullio, LoopVectorization, BenchmarkTools  # BenchmarkTools provides @btime

f1(M) = M * transpose(M)
f2(M) = @tullio out[i, j] := M[i, k] * M[j, k]

foreach((10, 100, 1000)) do N
    M = rand(N, N)
    @show N
    @btime f1($M)
    @btime f2($M)
    println()
end

N = 10
  535.363 ns (1 allocation: 896 bytes)
  197.315 ns (1 allocation: 896 bytes)

N = 100
  58.979 μs (2 allocations: 78.20 KiB)
  32.970 μs (50 allocations: 80.52 KiB)

N = 1000
  10.812 ms (2 allocations: 7.63 MiB)
  16.134 ms (76 allocations: 7.63 MiB)

As you can see, Tullio eventually starts losing to OpenBLAS for very large matrices, mostly because it doesn’t do array packing.


At the risk of stating the obvious: in Julia, multiple dispatch automatically selects efficient methods of generic functions, like matrix multiply, based on the input types. Hence, by encoding the symmetry of M at the type level, i.e. wrapping it in a Symmetric, Julia will be smart about choosing an efficient multiply method:

julia> M = rand(100,100); M = (M+M')/2; # M is symmetric but of type Matrix{Float64}

julia> @which M * transpose(M) # dispatches to a generic implementation in matmul.jl
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151

julia> M = Symmetric(M); # M is of type Symmetric{Float64, Matrix{Float64}}

julia> @which M * transpose(M) # dispatches to a specialised method in symmetric.jl
*(A::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}, B::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:572

Thanks everyone.

@carstenbauer I don’t mean that M is symmetric. I mean that the product M * transpose(M) is symmetric (whatever M is). So in order to get this product it is enough to calculate the entries of the upper triangular part.
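To illustrate that idea, here is a minimal hand-rolled sketch (the function name `mtm` is my own, not from any package): compute only the entries of the upper triangle, then wrap the result in Symmetric so the lower triangle is mirrored for free.

```julia
using LinearAlgebra

# Hypothetical helper: computes M * transpose(M) by filling only the
# upper triangle, then wrapping it in Symmetric to mirror the rest.
function mtm(M::AbstractMatrix)
    n = size(M, 1)
    out = Matrix{eltype(M)}(undef, n, n)
    for j in 1:n, i in 1:j            # upper triangle only: i ≤ j
        out[i, j] = @views dot(M[i, :], M[j, :])
    end
    return Symmetric(out, :U)         # lower triangle read from upper
end

M = rand(5, 3)
mtm(M) ≈ M * transpose(M)             # should agree with the naive product
```

This roughly halves the number of dot products, though a plain loop like this will not beat a tuned BLAS call.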


Oh, I misunderstood. Sorry for the noise 🙂


You are still correct, although the example you used was wrong: A * transpose(A) dispatches to syrk instead of gemm.


Julia does this automatically under the hood by calling BLAS’s SYRK routine. You can examine the relevant code snippet in the standard library LinearAlgebra here.
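You can also call SYRK directly to see this. A quick sketch checking that it agrees with the generic product (note that syrk fills only one triangle, so the result must be wrapped in Symmetric before comparing):

```julia
using LinearAlgebra

M = rand(100, 100)

# BLAS.syrk('U', 'N', 1.0, M) computes 1.0 * M * M', writing only the
# upper ('U') triangle; 'N' means use M as-is (not transposed first).
S = LinearAlgebra.BLAS.syrk('U', 'N', 1.0, M)

Symmetric(S, :U) ≈ M * transpose(M)   # should agree with the generic product
```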

Sorry if it is a dumb question: why doesn’t Tullio load LoopVectorization by default?

Mostly for startup & compilation times. Startup keeps getting better, but the time to generate & compile these fast versions is still significant:

julia> @time using Tullio, LoopVectorization
  1.325795 seconds (3.77 M allocations: 230.068 MiB, 2.90% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  6.619863 seconds (19.40 M allocations: 1.078 GiB, 4.57% gc time, 0.00% compilation time)

vs

julia> @time using Tullio
  0.277620 seconds (766.49 k allocations: 46.810 MiB, 3.51% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  0.133953 seconds (790.37 k allocations: 43.063 MiB, 3.37% gc time, 97.81% compilation time)