Is there a way to implement the following function faster?
Seeds = rand(200,90,15000,10)
Mat = rand(200,200)
@views function testfunction(Seeds,Mat)
(J,T,N,S) = size(Seeds)
ν = zeros(Float32,J,T,N,S)
W[diagind(W)] .= 1
Ctemp = copy(C.L)
@inbounds for n=1:N, s=1:S, t=1:T
ν[:,t,n,s] = Ctemp*Seeds[:,t,n,s]
It usually takes around
92.284776 seconds (26.64 M allocations: 29.804 GiB, 3.92% gc time)
This should be faster on Julia 1.5/master since there the views won’t allocate.Changing your structure so that
J is you right-most index will also help, as it will make things go in memory order.
You’ll want to send that whole loop to BLAS. Reshape so that (t,n,s) is just one dimension, then just do
ν = C.L*Seeds (without need for preallocating or copying). Also you’re doing mixed float32 and float64, which BLAS might not like, so I’d advise you to choose one precision for the computational bottleneck and stick with it. Also profile to make sure that the bottleneck is indeed where you think it is.