Here are my updated solutions.
Shorter but less efficient solution. This stacks the layers of the 3D array into a single 2D array, computes one 2D matrix multiplication, and then converts the result back to a 3D array.
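```julia
using CUDA
A = CUDA.rand(4, 4, 3)  # size (m, n, k)
B = CUDA.rand(4, 4)     # size (n, r)
# (m, n, k) -> (m, k, n) -> (m*k, n), then a single matrix product gives (m*k, r)
C = reshape(permutedims(A, (1, 3, 2)), size(A, 1) * size(A, 3), :) * B
# (m*k, r) -> (m, k, r) -> (m, r, k), so that C[:, :, l] == A[:, :, l] * B
C = permutedims(reshape(C, size(A, 1), size(A, 3), :), (1, 3, 2))
```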
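The reshape trick does not depend on the GPU. Here is a minimal CPU sketch with plain `Array`s that checks it against multiplying each layer separately. The sizes `m, n, k, r` are arbitrary choices for illustration, and `C_ref` is just a hypothetical name for the reference result.

```julia
# CPU check of the permutedims/reshape trick: C[:, :, l] should equal A[:, :, l] * B
m, n, k, r = 5, 3, 4, 2              # arbitrary sizes chosen for illustration
A = rand(m, n, k)
B = rand(n, r)

# Stack the k layers of A vertically into one (m*k, n) matrix and multiply once
C2d = reshape(permutedims(A, (1, 3, 2)), m * k, n) * B   # size (m*k, r)
# Undo the stacking: (m*k, r) -> (m, k, r) -> (m, r, k)
C = permutedims(reshape(C2d, m, k, r), (1, 3, 2))

# Reference: multiply each layer separately and concatenate along dimension 3
C_ref = cat((A[:, :, l] * B for l in 1:k)...; dims = 3)

println(size(C))     # (5, 2, 4)
println(C ≈ C_ref)   # true
```

The `permutedims` is needed because Julia arrays are column-major. The layer index has to sit directly after the row index for the reshape to stack the layers as row blocks.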
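Longer but more efficient solution: a custom kernel in which each thread computes one entry of the final 3D array with a `for` loop over the shared dimension. This requires that the middle dimension of the two arrays is not too large (i.e. for A of size (m, n, k) and B of size (n, r), n should be small), since every thread loops over all n terms on its own.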
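```julia
using CUDA
A = CUDA.rand(4, 4, 3)                              # size (m, n, k)
B = CUDA.rand(4, 4)                                 # size (n, r)
C = CUDA.zeros(size(A, 1), size(B, 2), size(A, 3))  # size (m, r, k)

# Each thread computes one entry: C[i, j, k] = sum over ii of A[i, ii, k] * B[ii, j]
function kernel!(C, A, B)
    i = threadIdx().x
    j = threadIdx().y
    k = threadIdx().z
    if i <= size(C, 1) && j <= size(C, 2) && k <= size(C, 3)
        acc = zero(eltype(C))
        for ii in 1:size(A, 2)  # loop over the shared (middle) dimension n
            @inbounds acc += A[i, ii, k] * B[ii, j]
        end
        @inbounds C[i, j, k] = acc
    end
    return nothing
end

# One thread per entry of C in a single block
# (so length(C) <= 1024 and size(C, 3) <= 64)
@cuda threads=size(C) kernel!(C, A, B)
```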
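The kernel above indexes with `threadIdx` only, so it has to be launched as a single block. That caps `length(C)` at the per-block limit of 1024 threads. Below is a minimal sketch of a multi-block variant. It assumes CUDA.jl's `blockIdx`/`blockDim` intrinsics and the `blocks` keyword of `@cuda`. The name `kernel_grid!`, the array sizes and the 8×8×4 block shape are illustrative choices, not part of the original solution.

```julia
using CUDA

# Hypothetical multi-block variant: the global index combines block and thread
# indices, so C can have more entries than one block has threads
function kernel_grid!(C, A, B)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z
    if i <= size(C, 1) && j <= size(C, 2) && k <= size(C, 3)
        acc = zero(eltype(C))
        for ii in 1:size(A, 2)  # shared dimension n
            @inbounds acc += A[i, ii, k] * B[ii, j]
        end
        @inbounds C[i, j, k] = acc
    end
    return nothing
end

A = CUDA.rand(100, 16, 50)                          # (m, n, k)
B = CUDA.rand(16, 70)                               # (n, r)
C = CUDA.zeros(size(A, 1), size(B, 2), size(A, 3))  # (m, r, k)

threads = (8, 8, 4)                  # 256 threads per block
blocks = cld.(size(C), threads)      # enough blocks to cover every entry of C
@cuda threads=threads blocks=blocks kernel_grid!(C, A, B)
```

The bounds check inside the kernel matters here because the last blocks in each dimension can extend past the edges of `C`. Either kernel's output can be validated by comparing `Array(C)` with the reshape-based result using `≈`.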