Speeding up Matrix multiplication involving dot and hadamard product

lmiq · February 9, 2022, 5:06pm

Another option is to try @tturbo on the complete loop nest. You may be able to improve on that by adjusting the order in which the indexes are run (although I think the macro already chooses the loop order itself). In any case, at least here the single allocation observed is just the out array, so making this non-allocating would be easy: preallocate out and pass it in. The performance is not as good as that of the other alternatives, though.

julia> using LoopVectorization

julia> function naive_loop(S::Array{T,2},d::Vector{T},X::Array{T,2}) where T 
           out = similar(X) 
           @tturbo for i in 1:size(S,1) 
               for l in 1:size(X,2)
                  r = zero(T) 
                  for j in 1:size(S,1) 
                      for k in 1:size(S,1) 
                         r += S[i,j]*S[i,k]*S[j,k]*d[k]*X[j,l] 
                      end 
                  end 
                  out[i,l] = r   # assign, not +=: similar(X) returns uninitialized memory
               end 
           end 
           out 
       end
naive_loop (generic function with 1 method)

julia> @btime naive_loop($S,$d,$X);
  243.603 μs (1 allocation: 7.94 KiB)
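
The non-allocating variant mentioned above could look roughly like the sketch below. The name naive_loop! is hypothetical, and the loops are written without @tturbo so the snippet runs without LoopVectorization (I have not benchmarked this version). It assumes S is square with size(S,1) == length(d) == size(X,1), and it assigns out[i,l] = r because the caller's buffer is not assumed to be zeroed. The check at the end uses the fact that the quadruple loop is the same as (S .* (S*Diagonal(d)*S')) * X.

```julia
using LinearAlgebra  # stdlib, only needed for the Diagonal reference check

# Hypothetical in-place variant: the caller supplies `out`,
# so the function itself does not allocate the result.
function naive_loop!(out::AbstractMatrix{T}, S::AbstractMatrix{T},
                     d::AbstractVector{T}, X::AbstractMatrix{T}) where T
    n = size(S, 1)  # assumes S is n×n, d has length n, X has n rows
    for l in 1:size(X, 2), i in 1:n
        r = zero(T)
        for j in 1:n, k in 1:n
            r += S[i,j] * S[i,k] * S[j,k] * d[k] * X[j,l]
        end
        out[i,l] = r  # overwrite: `out` may hold arbitrary values on entry
    end
    return out
end

S = rand(6, 6); d = rand(6); X = rand(6, 3)
out = similar(X)            # allocated once, reusable across calls
naive_loop!(out, S, d, X)

# Same quantity written with matrix operations, as a correctness check
ref = (S .* (S * Diagonal(d) * S')) * X
out ≈ ref
```

If the function is called many times with inputs of the same size, the same out buffer can be reused for every call.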

ps: Array{T,2} can be written as Matrix{T} (and usually it is better to use AbstractVector and AbstractMatrix, so that the function also accepts views, etc.).
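
A small illustration of why the abstract types matter, assuming two hypothetical helpers (colsum_strict and colsum_generic are made-up names, not part of any package). A view is a SubArray, which is an AbstractMatrix but not a Matrix, so only the generic signature accepts it:

```julia
# Hypothetical helpers, for illustration only.
colsum_strict(X::Matrix{T}) where T = sum(X; dims=1)
colsum_generic(X::AbstractMatrix{T}) where T = sum(X; dims=1)

A = rand(4, 4)
V = @view A[:, 1:2]    # a SubArray, not a Matrix; no copy is made

V isa Matrix                    # false
V isa AbstractMatrix            # true

colsum_generic(V)               # works on the view directly
applicable(colsum_strict, V)    # false: calling it would throw a MethodError
```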
