Matrix product `M * transpose(M)`

Hello,

Is there a function in Julia to efficiently compute the matrix product M * transpose(M)?

The result of this product is symmetric, so there should be a more efficient way than computing the full product naively. In R one can use the tcrossprod function for this.

One thing you could do is

using Tullio, LoopVectorization
@tullio out[i, j] := M[i, k] * M[j, k]

This may not take advantage of symmetry, but it’ll take advantage of the fact that it’s just one matrix involved. Here’s how it compares to regular matrix multiplication:

using Tullio, LoopVectorization, BenchmarkTools  # BenchmarkTools provides @btime

f1(M) = M * transpose(M)
f2(M) = @tullio out[i, j] := M[i, k] * M[j, k]

foreach((10, 100, 1000)) do N
    M = rand(N, N)
    @show N
    @btime f1($M)
    @btime f2($M)
    println()
end

N = 10
  535.363 ns (1 allocation: 896 bytes)
  197.315 ns (1 allocation: 896 bytes)

N = 100
  58.979 μs (2 allocations: 78.20 KiB)
  32.970 μs (50 allocations: 80.52 KiB)

N = 1000
  10.812 ms (2 allocations: 7.63 MiB)
  16.134 ms (76 allocations: 7.63 MiB)

As you can see, Tullio eventually starts losing to OpenBLAS for very large matrices, mostly because it doesn’t do array packing.


At the risk of stating the obvious: in Julia, multiple dispatch automatically selects efficient methods of generic functions, like matrix multiply, based on the input types. Hence, by encoding the symmetry of M at the type level, i.e. wrapping it in a Symmetric, Julia will be smart about choosing an efficient multiply method:

julia> M = rand(100,100); M = (M+M')/2; # M is symmetric but of type Matrix{Float64}

julia> @which M * transpose(M) # dispatches to a generic implementation in matmul.jl
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151

julia> M = Symmetric(M); # M is of type Symmetric{Float64, Matrix{Float64}}

julia> @which M * transpose(M) # dispatches to a specialised method in symmetric.jl
*(A::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}, B::Union{Hermitian{T, S}, Symmetric{T, S}} where {T, S}) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/symmetric.jl:572

Thanks everyone.

@carstenbauer I don’t mean that M is symmetric. I mean that the product M * transpose(M) is symmetric (whatever M is). So in order to get this product it is enough to calculate the entries of the upper triangular part.
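To illustrate that idea, here is a minimal hand-rolled sketch (the function name `mtm` is my own, not from any package): compute only the entries of the upper triangle, then wrap the result in Symmetric so the lower triangle is mirrored for free.

```julia
using LinearAlgebra

# Hypothetical helper: computes M * transpose(M) by filling only the
# upper triangle, then wrapping it in Symmetric to mirror the rest.
function mtm(M::AbstractMatrix)
    n = size(M, 1)
    out = Matrix{eltype(M)}(undef, n, n)
    for j in 1:n, i in 1:j            # upper triangle only: i ≤ j
        out[i, j] = @views dot(M[i, :], M[j, :])
    end
    return Symmetric(out, :U)         # lower triangle read from upper
end

M = rand(5, 3)
mtm(M) ≈ M * transpose(M)             # should agree with the naive product
```

This roughly halves the number of dot products, though a plain loop like this will not beat a tuned BLAS call.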


Oh, I misunderstood. Sorry for the noise 🙂


You are still correct, although the example you used was wrong: A * transpose(A) dispatches to syrk instead of gemm.


Julia does this automatically under the hood by calling BLAS’s SYRK routine. You can examine the relevant code snippet in the standard library LinearAlgebra here.
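You can also call SYRK directly to see this. A quick sketch checking that it agrees with the generic product (note that syrk fills only one triangle, so the result must be wrapped in Symmetric before comparing):

```julia
using LinearAlgebra

M = rand(100, 100)

# BLAS.syrk('U', 'N', 1.0, M) computes 1.0 * M * M', writing only the
# upper ('U') triangle; 'N' means use M as-is (not transposed first).
S = LinearAlgebra.BLAS.syrk('U', 'N', 1.0, M)

Symmetric(S, :U) ≈ M * transpose(M)   # should agree with the generic product
```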

Sorry if it is a dumb question: why doesn’t Tullio load LoopVectorization by default?

Mostly for startup & compilation times. Startup keeps getting better, but the time to generate & compile these fast versions is still significant:

julia> @time using Tullio, LoopVectorization
  1.325795 seconds (3.77 M allocations: 230.068 MiB, 2.90% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  6.619863 seconds (19.40 M allocations: 1.078 GiB, 4.57% gc time, 0.00% compilation time)

vs

julia> @time using Tullio
  0.277620 seconds (766.49 k allocations: 46.810 MiB, 3.51% gc time)

julia> M = rand(100,100); @time @tullio out[i, j] := M[i, k] * M[j, k];
  0.133953 seconds (790.37 k allocations: 43.063 MiB, 3.37% gc time, 97.81% compilation time)