Efficient matrix multiplication over 4-dimensional arrays

I have two 4-dimensional arrays of sizes (r1, m, n, r2) and (s1, n, p, s2), respectively, and I need to multiply them so that the resulting 4-dimensional array has shape (r1 * s1, m, p, r2 * s2).
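In index notation (writing a and b for the free m and p indices), the operation contracts over the shared dimension n and combines the outer dimensions Kronecker-style:

result[(i1-1)*s1 + j1, a, b, (i2-1)*s2 + j2] = sum_n lhs[i1, a, n, i2] * rhs[j1, n, b, j2]

My naive implementation does the following: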

using LinearAlgebra  # for mul!

function core_mult_col(lhs, rhs)

    r1, m, n, r2 = size(lhs)
    s1, n, p, s2 = size(rhs)

    # Pre-allocate memory for the resulting core
    result = Array{Float64}(undef, (r1 * s1, m, p, r2 * s2))

    for j2 = 1:s2, i2 = 1:r2, j1 = 1:s1, i1 = 1:r1

        # combined "Kronecker" indices into the first and last dimensions
        i = (i1 - 1) * s1 + j1
        j = (i2 - 1) * s2 + j2

        # @views makes mul! write into result in place instead of into a copy
        @views mul!(result[i, :, :, j], lhs[i1, :, :, i2], rhs[j1, :, :, j2])

    end

    return result
end
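For example, a call with small random cores (sizes chosen arbitrarily) looks like:

lhs = rand(2, 3, 4, 5)       # (r1, m, n, r2)
rhs = rand(3, 4, 5, 2)       # (s1, n, p, s2)
result = core_mult_col(lhs, rhs)
size(result)                 # (6, 3, 5, 10) == (r1*s1, m, p, r2*s2)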

However, I know that in this way I am making a lot of “jumps” in memory: Julia arrays are column-major, so slices like lhs[i1, :, :, i2] are not contiguous, and therefore this is not very efficient.
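To make that concrete, the strides of such an array show the problem (illustrative sizes):

A = rand(2, 3, 4, 5)
strides(A)    # (1, 2, 6, 24)

so a view of A[i1, :, :, i2] steps through memory with strides 2 and 6 instead of running contiguously.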

My other idea was to permute the dimensions of all the arrays such that the iteration occurs in contiguous blocks of memory:

function core_mult_col(lhs, rhs)

    r1, m, n, r2 = size(lhs)
    s1, n, p, s2 = size(rhs)

    # Move the small "rank" dimensions to the front so that the loop
    # below can run over them directly
    permuted_lhs = Array{Float64}(undef, (r1, r2, m, n))
    permuted_rhs = Array{Float64}(undef, (s1, s2, n, p))

    permutedims!(permuted_lhs, lhs, (1, 4, 2, 3))
    permutedims!(permuted_rhs, rhs, (1, 4, 2, 3))

    pre_result = Array{Float64}(undef, (r1 * s1, r2 * s2, m, p))

    # Here I would like to iterate over the first two dimensions of each
    # array efficiently and multiply the slices permuted_lhs[i, j, :, :]
    # and permuted_rhs[i, j, :, :]

    result = Array{Float64}(undef, (r1 * s1, m, p, r2 * s2))

    permutedims!(result, pre_result, (1, 3, 4, 2))

    return result
end

However, I’m not sure how to do this iteration efficiently. Is there a way to do this in Julia specifically?
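Concretely, the loop I have in mind for filling pre_result would be something like this (an untested sketch):

for i2 = 1:r2, j2 = 1:s2, i1 = 1:r1, j1 = 1:s1
    i = (i1 - 1) * s1 + j1
    j = (i2 - 1) * s2 + j2
    # multiply the m×n and n×p slices into the m×p slice of pre_result
    @views mul!(pre_result[i, j, :, :], permuted_lhs[i1, i2, :, :], permuted_rhs[j1, j2, :, :])
end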

PS: I already tried the TensorOperations.jl package, but it was slower than the implementation I had in Python.

Thanks in advance.

How fast is the Tullio version of this?

using Tullio, LoopVectorization
function f(A, B)
    @tullio C[r1, s1, m, p, r2, s2] := A[r1, m, n, r2] * B[s1, n, p, s2]
    r1, m, n, r2 = size(A)
    s1, n, p, s2 = size(B)
    return reshape(C, r1*s1, m, p, r2*s2)
end
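(@tullio writes out the nested loops for this contraction, and with LoopVectorization loaded it also vectorizes them. For example, size(f(rand(2,3,4,5), rand(3,4,5,2))) == (6, 3, 5, 10).)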

Thanks for the reply! It also seems to be a few hundred milliseconds slower than the TensorOperations version.

Did you do using LoopVectorization as well? If LV is loaded, Tullio can generate faster code.

Can you check that the code above works? I get parsing errors, and then bounds errors. And mul!(result[i, :, :, j], ...) makes a copy and writes into that; it wants @views in front to write into result.

For many sizes, Tullio + LoopVectorization is likely to be faster than permutedims + matmul, which is what TensorOperations will rewrite this to be.
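Roughly, that strategy written by hand looks like the sketch below (illustrative only, not literally the code TensorOperations generates):

function f_matmul(A, B)
    r1, m, n, r2 = size(A)
    s1, _, p, s2 = size(B)
    # bring the contracted index n to one side of each factor
    Ap = reshape(permutedims(A, (1, 2, 4, 3)), r1 * m * r2, n)    # (r1, m, r2) x n
    Bp = reshape(permutedims(B, (2, 1, 3, 4)), n, s1 * p * s2)    # n x (s1, p, s2)
    # one big matrix multiplication performs the whole contraction
    C = reshape(Ap * Bp, r1, m, r2, s1, p, s2)
    # permute back and merge the Kronecker indices
    D = permutedims(C, (1, 4, 2, 5, 3, 6))                        # (r1, s1, m, p, r2, s2)
    return reshape(D, r1 * s1, m, p, r2 * s2)
end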

Yes, I am using it in the first cell of my Pluto notebook. I’m not sure whether it makes a difference where I put it.

You are correct; I had a typo when allocating memory for result. It was

result = Array{Float64}(undef, (r1 * s1, m, p, r2 * s2))

instead of

result = Array{Float64}(undef, (r1, s1, m, p, r2, s2))

I have already corrected it in the original post.

I think that still needs @views. It does not agree with the results from the index packages (using what Oscar wrote), but I haven’t tried to track down why. Timing & comparing:

julia> using TensorOperations, Tullio, LoopVectorization, BenchmarkTools

julia> function f_tensor(A,B)
           @tensor C[r1, s1, m, p, r2, s2] := A[r1, m, n, r2] * B[s1, n, p, s2]
           r1, m, n, r2 = size(A)
           s1, n, p, s2 = size(B)
           return reshape(C, r1*s1, m, p, r2*s2)
       end
f_tensor (generic function with 1 method)

julia> function f_tullio(A,B)
           @tullio C[r1, s1, m, p, r2, s2] := A[r1, m, n, r2] * B[s1, n, p, s2]
           r1, m, n, r2 = size(A)
           s1, n, p, s2 = size(B)
           return reshape(C, r1*s1, m, p, r2*s2)
       end
f_tullio (generic function with 1 method)

julia> let lhs = rand(2,3,4,5), rhs = rand(3,4,5,2)
        y1 = @btime core_mult_col_views($lhs, $rhs)  # with @views, and return result
        y2 = @btime f_tensor($lhs, $rhs)
        y3 = @btime f_tullio($lhs, $rhs)
        y1 ≈ y2, y2 ≈ y3
       end
  min 57.208 μs, mean 218.863 μs (722 allocations, 1.24 MiB)
  min 16.042 μs, mean 19.183 μs (102 allocations, 14.33 KiB)
  min 1.217 μs, mean 2.493 μs (15 allocations, 7.83 KiB)
(false, true)