Multiplication of vector of matrices and vector of vectors

To put numbers on that:

julia> @btime mult1!($c, $A, $x, $L);  # as above, my computer
  13.159 ms (800 allocations: 34.38 KiB)

julia> using NNlib  # to do a different computation!

julia> @btime batched_mul!(reshape($c,$L,1,$L), $A, reshape($x,$L,1,$L));
  1.985 ms (39 allocations: 3.23 KiB)

julia> @btime permutedims!($(similar(A)), $A, (2,3,1));  # cost of re-ordering
  13.308 ms (0 allocations: 0 bytes)

Here batched_mul is computing c[a,i] = A[a,b,i] * x[b,i] in which each matrix slice has stride 1. If you cannot re-organise your calculation to have this throughout, then the cost of permuting is quite high, and in this case another approach is:

julia> using Tullio

julia> mult3!(c, A, x, L=0) = @tullio c[i,a] = A[i,a,b] * x[i,b] avx=false;  # without LV

julia> @btime mult3!($c, $A, $x, $L);
  2.328 ms (49 allocations: 2.53 KiB)

julia> using LoopVectorization

julia> mult4!(c, A, x, L=0) = @tullio c[i,a] = A[i,a,b] * x[i,b];  # avx=true default LV loaded

julia> @btime mult4!($c, $A, $x, $L);
  1.565 ms (49 allocations: 2.53 KiB)
2 Likes