Fastest matrix multiplication

Then write a method traceOutXYZSubSystem? After adapting all the memory layout (and basis choices!) to make the subsystems you actually want to trace out easy to trace out?

If you are fluent enough to understand kron as a shorthand for “trace out subsystem” with all its physical meaning, then by all means use that notation.

That is, imo, extremely advanced notation: naive readers will take the kron and matmul literally, and end up assembling big (possibly dense!) matrices that they multiply, instead of doing what amounts to a glorified memcpy.
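To make that concrete for readers without the fluency: a partial trace never needs a Kronecker-structured operator at all; a reshape plus a contraction over one index does the job. A minimal sketch, assuming a state ρ on H_A ⊗ H_B with Julia's kron index convention (the B index varies fastest); ptrace_B, dA, dB are names I made up for illustration:

using LinearAlgebra

# Partial trace over subsystem B, without ever forming a kron matrix.
function ptrace_B(ρ::AbstractMatrix, dA::Int, dB::Int)
    ρ4 = reshape(ρ, dB, dA, dB, dA)   # ρ4[k, i, l, j] = ⟨i,k|ρ|j,l⟩
    ρA = zeros(eltype(ρ), dA, dA)
    for j in 1:dA, i in 1:dA, k in 1:dB
        ρA[i, j] += ρ4[k, i, k, j]    # contract over the B index only
    end
    return ρA
end

# Sanity check on a product state: tracing out B recovers ρA (up to tr(ρB)).
ρA = [0.7 0.1; 0.1 0.3]; ρB = [0.5 0.0; 0.0 0.5]
ptrace_B(kron(ρA, ρB), 2, 2) ≈ ρA * tr(ρB)  # true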

A further modification I would suggest is the following:

using LinearAlgebra   # for I(sz) and mul!

## -- initialization (hoisted out of the hot loop)
sz = size(Gl, 1)      # I don't want to type expressions including Nf all the time
GlGa = hcat(Gl, Ga)   # [Gl  Ga] as one wide matrix, so the loop needs a single matmul
Vr = Vvi*Gr - I(sz)   # pre-associated: Vvi*(Gr*X) - X == (Vvi*Gr - I)*X
Vl = Vvi*Gl
tmp1 = similar(GlGa)  # buffer for M[:,:,ij] * GlGa
tmp2 = similar(Gl)    # buffer for Vr * (M*Ga)

...

@views begin
    mul!(tmp1, M[:,:,ij], GlGa)           # computes [M*Gl  M*Ga] in one matmul
    mul!(j0[:,:,ij], Vl, tmp1[:, 1:sz])   # Vl * (M*Gl), written in place
    mul!(tmp2, Vr, tmp1[:, sz+1:2*sz])    # Vr * (M*Ga)
    j0[:,:,ij] .+= tmp2
end

In other words:

  1. We can get better memory locality by computing M[:,:,ij]*hcat(Gl, Ga) as a single matmul instead of two separate ones (as @StefanKarpinski remarked upthread).
  2. We can hoist that hcat out of the loop.
  3. We can re-associate expressions like Vvi * (Gr * X) into (Vvi * Gr) * X and then hoist the matmul Vvi * Gr out of the loop (a fully assembled sketch follows below).
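Putting the three changes together, here is a self-contained sketch of the whole computation you can paste into a REPL. The dimensions (sz = 256, nij = 100) and the dense random inputs are placeholders I made up; substitute your actual arrays:

using LinearAlgebra

sz, nij = 256, 100    # placeholder sizes, not from this thread
Gl, Ga, Gr, Vvi = (rand(ComplexF64, sz, sz) for _ in 1:4)
M  = rand(ComplexF64, sz, sz, nij)
j0 = similar(M)

## -- initialization, hoisted out of the loop
GlGa = hcat(Gl, Ga)
Vr = Vvi*Gr - I(sz)
Vl = Vvi*Gl
tmp1 = similar(GlGa)
tmp2 = similar(Gl)

for ij in 1:nij
    @views begin
        mul!(tmp1, M[:,:,ij], GlGa)
        mul!(j0[:,:,ij], Vl, tmp1[:, 1:sz])
        mul!(tmp2, Vr, tmp1[:, sz+1:2*sz])
        j0[:,:,ij] .+= tmp2
    end
end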

PS. This brings it down from 1.6s → 1.2s on my machine.

@eveningsilverfox I think you will need to show us the sparsity structure of M[:,:, ij] if you want more help.

If M[:,:,ij] were dense, I think the remaining matmuls are still big enough for CUDA to pay off. I.e. I think at these sizes, the cost of sending M[:,:,ij] from main memory to GPU memory and fetching back the resulting j0[:,:,ij] should amortize over the matmuls.
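For the record, a rough CUDA.jl sketch of that offload, keeping the small hoisted factors resident on the GPU and streaming the M slices over (names as in the loop above; whether this actually wins depends on the real sizes and your PCIe bandwidth, so treat it as a starting point, not a measurement):

using CUDA, LinearAlgebra

# One-time uploads: the hoisted factors stay resident on the GPU.
d_GlGa = CuArray(GlGa)
d_Vl, d_Vr = CuArray(Vl), CuArray(Vr)
d_tmp1 = CUDA.zeros(ComplexF64, sz, 2*sz)
d_tmp2 = CUDA.zeros(ComplexF64, sz, sz)
d_j0   = CUDA.zeros(ComplexF64, sz, sz)

for ij in 1:nij
    d_M = CuArray(M[:,:,ij])                        # send one slice to the GPU
    mul!(d_tmp1, d_M, d_GlGa)                       # CUBLAS gemm
    mul!(d_j0, d_Vl, @view d_tmp1[:, 1:sz])
    mul!(d_tmp2, d_Vr, @view d_tmp1[:, sz+1:2*sz])
    d_j0 .+= d_tmp2
    j0[:,:,ij] = Array(d_j0)                        # fetch the result back
end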
