I am looking for an efficient way to do this kind of operation (the snippet below is only an MWE, not the actual computation):

julia> A = rand(3,3,5);
julia> B = rand(3,3,5);
julia> mult_sum(a::AbstractMatrix, b::AbstractMatrix) = sum(a*b, dims=1)
mult_sum (generic function with 1 method)
julia> mapslices(mult_sum, A, B, dims = 3)
ERROR: MethodError: no method matching mapslices(::typeof(mult_sum), ::Array{Float64, 3}, ::Array{Float64, 3}; dims=3)
Closest candidates are:
mapslices(::Any, ::AbstractArray; dims) at abstractarray.jl:2828
Stacktrace:
[1] top-level scope
@ REPL[31]:1

The issue is that mapslices only accepts a single array. I know I could use eachslice and broadcasting, but the result would be a Vector of Matrix, whereas I need the result to stay a 3D Array. Reducing with cat means reallocating and is therefore inefficient. Is there any package out there for this kind of thing?
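To make that concrete, here is a minimal sketch of the eachslice workaround I mean (reusing the mult_sum from the MWE above), showing why a reallocating cat step is needed at the end:

```julia
A = rand(3, 3, 5)
B = rand(3, 3, 5)
mult_sum(a, b) = sum(a * b, dims=1)

# Mapping over slices gives a Vector of 1×3 Matrices...
slices = map(mult_sum, eachslice(A, dims=3), eachslice(B, dims=3))

# ...so getting back a 3D array requires reallocating with cat:
C = cat(slices...; dims=3)   # size(C) == (1, 3, 5)
```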

To clarify further: the operation needs to be differentiable and to work with CUDA, which means no array indexing and no array mutation…

That example does an element-wise multiplication of the two arrays; what I’m looking for is a way to do batched matrix multiplications. As another example, with the same A and B arrays as above, I’d like some way of writing:

C = A * B

where C[:,:,k] = A[:,:,k] * B[:,:,k], so C would have dimensions 3x3x5.
It looks a bit like what Tullio does in one of your links, but that package seems to be aimed at tensors (in the non-ML sense).
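For what it’s worth, here is a sketch of that batched product written with Tullio (assuming I’m reading its einsum-style notation right, it does accept plain arrays, not just tensors), alongside NNlib.jl’s batched_mul, which I understand targets exactly this C[:,:,k] = A[:,:,k] * B[:,:,k] pattern:

```julia
using Tullio, NNlib

A = rand(3, 3, 5)
B = rand(3, 3, 5)

# Einsum-style batched matrix multiply: contract over l, batch over k.
@tullio C[i, j, k] := A[i, l, k] * B[l, j, k]

# NNlib's batched_mul computes the same slice-wise product.
D = NNlib.batched_mul(A, B)   # a 3×3×5 Array
```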