How to make faster in a matrix multiplication inside a loop?

Hi!

Is there a way to implement the following function faster?

Seeds = rand(200,90,15000,10)
Mat = rand(200,200)

@views  function testfunction(Seeds,Mat)
    
    (J,T,N,S) = size(Seeds)
    ν = zeros(Float32,J,T,N,S)
    W=0.75.*exp.((-0.003).*Mat)
    W[diagind(W)] .= 1
    C=cholesky(W)
    Ctemp = copy(C.L)
    @inbounds for n=1:N, s=1:S, t=1:T
        ν[:,t,n,s] = Ctemp*Seeds[:,t,n,s]
    end
    return  ν
end

It usually takes around

92.284776 seconds (26.64 M allocations: 29.804 GiB, 3.92% gc time)

This should be faster on Julia 1.5/master since there the views won’t allocate.Changing your structure so that J is you right-most index will also help, as it will make things go in memory order.

2 Likes

You’ll want to send that whole loop to BLAS. Reshape so that (t,n,s) is just one dimension, then just do ν = C.L*Seeds (without need for preallocating or copying). Also you’re doing mixed float32 and float64, which BLAS might not like, so I’d advise you to choose one precision for the computational bottleneck and stick with it. Also profile to make sure that the bottleneck is indeed where you think it is.

4 Likes