Matrix product `M * transpose(M)`

Hello,

Is there a function in Julia to efficiently compute the matrix product M * transpose(M)?

This matrix is symmetric, so there should be a more efficient way than doing M * transpose(M). In R one can use the tcrossprod function to perform this product.

One thing you could do is

using Tullio, LoopVectorization
@tullio out[i, j] := M[i, k] * M[j, k]

This may not take advantage of symmetry, but it’ll take advantage of the fact that it’s just one matrix involved. Here’s how it compares to regular matrix multiplication:

using Tullio, LoopVectorization

f1(M) = M * transpose(M)
f2(M) = @tullio out[i, j] := M[i, k] * M[j, k]

foreach((10, 100, 1000)) do N
    M = rand(N, N)
    @show N
    @btime f1($M)
    @btime f2($M)
    println()
end

#+RESULTS:
N = 10
  535.363 ns (1 allocation: 896 bytes)
  197.315 ns (1 allocation: 896 bytes)

N = 100
  58.979 ÎĽs (2 allocations: 78.20 KiB)
  32.970 ÎĽs (50 allocations: 80.52 KiB)

N = 1000
  10.812 ms (2 allocations: 7.63 MiB)
  16.134 ms (76 allocations: 7.63 MiB)

As you can see, Tullio eventually starts losing to OpenBLAS for very large matrices, mostly because it doesn’t do array packing.

3 Likes

At the risk of mentioning the obvious: note that in Julia multiple dispatch automatically selects efficient methods of generic functions, like matrix multiply, based on the input types. Hence, by indicating that M is symmetric at the type level, i.e. making it a Symmetric, Julia will be smart about choosing an efficient multiply method:

julia> M = rand(100,100); M = (M+M')/2; # M is symmetric but of type Matrix{Float64}

julia> @which M * transpose(M) # dispatches to a generic implementation in matmul.jl
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151

julia> M = Symmetric(M); # M is of type Symmetric{Float64, Matrix{Float64}}

julia> @which M * transpose(M) # dispatches to a specialised method in symmetric.jl
*(A::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}, B::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:572
1 Like

Thanks everyone.

@carstenbauer I don’t mean that M is symmetric. I mean that the product M * transpose(M) is symmetric (whatever M is). So in order to get this product it is enough to calculate the entries of the upper triangular part.

5 Likes

Oh, I misunderstood. Sorry for the noise :slight_smile:

1 Like

You are still correct, although the example you used where wrong. A * transpose(A) dispatches to syrk instead of gemm.

3 Likes

Julia does this automatically under the hood by calling BLAS’s SYRK routine. You can examine the relevant code snippet in the standard library LinearAlgebra here.

1 Like

Sorry if it is a dumb question: why doesn’t Tullio load LoopVectorization as default?

Mostly for startup & compilation times. Startup keeps getting better, but the time to generate & compile these fast versions is still significant:

julia> @time using Tullio, LoopVectorization
  1.325795 seconds (3.77 M allocations: 230.068 MiB, 2.90% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  6.619863 seconds (19.40 M allocations: 1.078 GiB, 4.57% gc time, 0.00% compilation time)

vs

julia> @time using Tullio
  0.277620 seconds (766.49 k allocations: 46.810 MiB, 3.51% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  0.133953 seconds (790.37 k allocations: 43.063 MiB, 3.37% gc time, 97.81% compilation time)
2 Likes

Hello fredrikekre, lhnguyen-vn

While I think it is great that this test is in place it doesn’t replace a presumptive dedicated tcrossprod (or whatever name the principle of least surprise dictates) or perhaps tcrossprod! for a few reasons:

  1. You don’t get a Symmetric in return from A * transpose(A) and this can’t be handled automatically.

  2. You can’t use a preallocated Symmetric with LinearAlgebra.mul!, thereby having to allocate about twice as much memory for the result. I suppose that this could potentially change because it would only require dispatching to another method, rather than causing type instability.

  3. Such a function could make for clearer, shorter (depending on how long the name will be, of course) and easier-to-read code because one would write f(A) rather than f(A, g(A)).

  4. The presence of this function would allow people to profit from time and memory optimisations you get with symmetric matrices without even having to think about it.

I think this functionality should be available or if it is in some way already I would be curious to know how to access it.

You can’t use a preallocated Symmetric with LinearAlgebra.mul!, thereby having to allocate about twice as much memory for the result

Perhaps mul!(S::Symmetric, A, transpose(A)) should be special-cased, instead of hitting a generic fallback?

2 Likes