Fastest way of contracting arrays

Good afternoon everyone!

I have the following piece of code, which essentially contracts some multidimensional arrays along specific dimensions (this is a no-sense example, which has the only purpose of illustrating clearly the situation):


d = 10
A=randn(d,d,d,d,d)
B=randn(d,d,d,d,d)

function funza()
acc = 0.0
for i = 1:1:d, j= 1:1:d, k= 1:1:d, l= 1:1:d, m = 1:1:d
acc += A[i,1,1,1,1]*A[l,1,1,1,1]*A[k,1,1,1,1]*A[j,1,1,1,1]*A[m,1,1,1,1]*B[m,l,k,j,i]
end
return acc
end

I get the following time of execution:


@time funza()
0.209131 seconds (2.12 M allocations: 61.713 MiB, 1.55% gc time)

But I’m 99% sure that there are ways much faster than this in order to perform the above contraction.

I thought about using TensorOperations.jl, but (as far as I can see) I don’t have the possibility to set explicitly some values for the indices (in the above no-sense example, all explicit indices have value 1): I obviously can’t compute the full tensor, which has d^(20) entries.
I also thought about using CUDA.jl, but I’m not sure that there are ready-made methods for this type of operations… am I wrong?

In any case, how can I improve the performance of this kind of contraction?

Thank you!

A, B, and D are currently globals. If you pass them into the function like

d = 10
A=randn(d,d,d,d,d)
B=randn(d,d,d,d,d)
function funza(A, B, d)
    acc = 0.0
    for i = 1:1:d, j= 1:1:d, k= 1:1:d, l= 1:1:d, m = 1:1:d
        acc += A[i,1,1,1,1]*A[l,1,1,1,1]*A[k,1,1,1,1]*A[j,1,1,1,1]*A[m,1,1,1,1]*B[m,l,k,j,i]
    end
    return acc
end
@time funza(A,B,d)

it takes 370μs

2 Likes

That’s astounding! Thank you very much.

Just out of curiosity: is there an “easy” way to do the same thing with the GPU?

Because it might be useful to use very high d, and in that case I would certainly have a boost using CUDA.jl …

julia> @btime funza($A,$B,$d)
  262.666 μs (0 allocations: 0 bytes)
51.78137833703912

julia> using Tullio

julia> @btime @tullio _ := $A[i,1,1,1,1]*$A[l,1,1,1,1]*$A[k,1,1,1,1]*$A[j,1,1,1,1]*$A[m,1,1,1,1]*$B[m,l,k,j,i]
  124.750 μs (1 allocation: 16 bytes)
51.781378337039484

julia> using LoopVectorization

julia> @btime @tullio _ := $A[i,1,1,1,1]*$A[l,1,1,1,1]*$A[k,1,1,1,1]*$A[j,1,1,1,1]*$A[m,1,1,1,1]*$B[m,l,k,j,i]
  31.541 μs (27 allocations: 720 bytes)
51.78137833704028

This example unfortunately won’t work on the GPU right now, scalar reductions are tricky. Something along these lines may work, though:

julia> using TensorOperations

julia> function funza_ten(A, B)
         v = A[:,1,1,1,1]
         @tensor out = v[i]*v[l]*v[k]*v[j]*v[m]*B[m,l,k,j,i]
       end
funza_ten (generic function with 1 method)

julia> @btime funza_ten($A, $B)
  247.417 μs (123 allocations: 8.36 KiB)
51.781378337036784
6 Likes

Please use triple-backtick fences to make your code legible.

2 Likes

ok!

How can I save the result of the computation by using the Tullio package?
That is, if I use the latter in the following way:

@btime @tullio result := A[i,1,1,1,1]*A[l,1,1,1,1]*A[k,1,1,1,1]*A[j,1,1,1,1]*A[m,1,1,1,1]*B[m,l,k,j,i]

then the variable “result” is not defined…

I don’t use Tullio but in general @btime x = ... will leave x undefined, so you’ll probably want to remove that.

4 Likes

Yeah, you definitely should remove @btime. It takes much longer, because it runs the code many times, and after that it throws away the result.

Probably not what you want.

@btime is just for benchmarking, not to be included in your actual code.

4 Likes