Efficient way of doing matmul along 3rd dimension

Some alternatives. Maybe I am not using Tullio quite properly here. I have 4 threads, and mul! is generally multi-threaded because of BLAS:

julia> using Tullio, LinearAlgebra

julia> function f!(C,A,B)
           for k in 1:10
               C[:,:,k] = A[:,:,k] * B[:,:,k]
           end
       end
f! (generic function with 1 method)

julia> function f2!(C,A,B)
           for k in 1:10
               @views mul!(C[:,:,k],A[:,:,k],B[:,:,k])
           end
       end
f2! (generic function with 1 method)

julia> f3!(C,A,B) = @views @tullio C[:,:,k] = A[:,:,k] * B[:,:,k]
f3! (generic function with 1 method)

julia> A = rand(5,5,10); B = rand(5,5,10); C = zeros(5,5,10);

julia> @btime f!($C,$A,$B);
  3.387 μs (30 allocations: 7.50 KiB)

julia> @btime f2!($C,$A,$B);
  1.750 μs (0 allocations: 0 bytes)

julia> @btime f3!($C,$A,$B);
  2.240 μs (10 allocations: 2.50 KiB)

(edit: added @views to Tullio).

Octavian gives me this:

julia> using Octavian

julia> function f5!(C,A,B)
           for k in 1:10
               @views Octavian.matmul!(C[:,:,k],A[:,:,k],B[:,:,k])
           end
       end
f5! (generic function with 1 method)

julia> @btime f5!($C,$A,$B);
  196.513 ns (0 allocations: 0 bytes)

to good to be true?

(the timings are probably very dependent on the matrix sizes, so you probably need to test with the actual data).

1 Like