Efficient way of doing matmul along 3rd dimension

Hi,

I’ve been looking for a way to do matmul operations along slices of the 3rd dims of a 3D tensor.
I’ve had a look at Tullio.jl and TensorCast.jl but I’m not sure they are designed to do this. Basically I want to do this:

A = rand(5,5,10)
B = rand(5,5,10) 
C = zeros(5,5,10)
for k in 1:10
    C[:,:,k] = A[:,:,k] * B[:,:,k]
end

It’s like a broadcast but only along the 3rd dims. Ideally, i could just use mapslices but it only accepts one collection at a time.
This MWE works fine but I’m guessing there exists a package with functions or macros that do this more efficiently.

Some alternatives. Maybe I am not using Tullio quite properly here. I have 4 threads, and mul! is generally multi-threaded because of BLAS:

julia> using Tullio, LinearAlgebra

julia> function f!(C,A,B)
           for k in 1:10
               C[:,:,k] = A[:,:,k] * B[:,:,k]
           end
       end
f! (generic function with 1 method)

julia> function f2!(C,A,B)
           for k in 1:10
               @views mul!(C[:,:,k],A[:,:,k],B[:,:,k])
           end
       end
f2! (generic function with 1 method)

julia> f3!(C,A,B) = @views @tullio C[:,:,k] = A[:,:,k] * B[:,:,k]
f3! (generic function with 1 method)

julia> A = rand(5,5,10); B = rand(5,5,10); C = zeros(5,5,10);

julia> @btime f!($C,$A,$B);
  3.387 μs (30 allocations: 7.50 KiB)

julia> @btime f2!($C,$A,$B);
  1.750 μs (0 allocations: 0 bytes)

julia> @btime f3!($C,$A,$B);
  2.240 μs (10 allocations: 2.50 KiB)

(edit: added @views to Tullio).

Octavian gives me this:

julia> using Octavian

julia> function f5!(C,A,B)
           for k in 1:10
               @views Octavian.matmul!(C[:,:,k],A[:,:,k],B[:,:,k])
           end
       end
f5! (generic function with 1 method)

julia> @btime f5!($C,$A,$B);
  196.513 ns (0 allocations: 0 bytes)

to good to be true?

(the timings are probably very dependent on the matrix sizes, so you probably need to test with the actual data).

1 Like

eachslice(A; dims=3) .* eachslice(B; dims=3) gives almost exactly what you need, but unfortunately assigning to eachslice(C; dims=3) throws a weird error: ERROR: MethodError: no method matching ndims(::Type{Base.Generator{Base.OneTo{Int64}, Base.var"#230#231"{Array{Float64, 3}, Tuple{}, Tuple{Colon, Colon}}}}).

Okay thanks I’ll take a look at Octavian too.
Though I like Tullio also because I (think I) can do other stuff than matmul along the dimension.
Like

foo(A::Matrix, B::Matrix) = A*max.(B,0.0)
@views @tullio C[:,:,k] = foo(A[:,:,k], B[:,:,k])

Should work ?