# Fastest way of contracting arrays

Good afternoon everyone!

I have the following piece of code, which essentially contracts some multidimensional arrays along specific dimensions (this is a nonsensical example, whose only purpose is to illustrate the situation clearly):

```
d = 10
A = randn(d, d, d, d, d)
B = randn(d, d, d, d, d)

function funza()
    acc = 0.0
    for i = 1:1:d, j = 1:1:d, k = 1:1:d, l = 1:1:d, m = 1:1:d
        acc += A[i,1,1,1,1] * A[l,1,1,1,1] * A[k,1,1,1,1] *
               A[j,1,1,1,1] * A[m,1,1,1,1] * B[m,l,k,j,i]
    end
    return acc
end
```

I get the following execution time:

```
@time funza()
  0.209131 seconds (2.12 M allocations: 61.713 MiB, 1.55% gc time)
```

But I'm 99% sure that there are much faster ways to perform the above contraction.

I thought about using TensorOperations.jl, but (as far as I can see) it doesn't let me set explicit values for some of the indices (in the nonsensical example above, all explicit indices have the value 1): I obviously can't compute the full tensor, which has d^20 entries.
I also thought about using CUDA.jl, but I'm not sure there are ready-made methods for this type of operation… am I wrong?

In any case, how can I improve the performance of this kind of contraction?

Thank you!

`A`, `B`, and `d` are currently globals. If you pass them into the function like

```
d = 10
A = randn(d, d, d, d, d)
B = randn(d, d, d, d, d)

function funza(A, B, d)
    acc = 0.0
    for i = 1:1:d, j = 1:1:d, k = 1:1:d, l = 1:1:d, m = 1:1:d
        acc += A[i,1,1,1,1] * A[l,1,1,1,1] * A[k,1,1,1,1] *
               A[j,1,1,1,1] * A[m,1,1,1,1] * B[m,l,k,j,i]
    end
    return acc
end

@time funza(A, B, d)
```

it takes 370 μs.
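For completeness: another common fix for the untyped-globals penalty is to declare the arrays `const`, so the compiler knows their types inside the function. A minimal sketch (not benchmarked here; passing the arrays as arguments, as above, is generally the cleaner option):

```julia
const d = 10
const A = randn(d, d, d, d, d)
const B = randn(d, d, d, d, d)

# Same loop as before; A, B, and d are now type-stable globals.
function funza()
    acc = 0.0
    for i = 1:d, j = 1:d, k = 1:d, l = 1:d, m = 1:d
        acc += A[i,1,1,1,1] * A[l,1,1,1,1] * A[k,1,1,1,1] *
               A[j,1,1,1,1] * A[m,1,1,1,1] * B[m,l,k,j,i]
    end
    return acc
end
```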


That's astounding! Thank you very much.

Just out of curiosity: is there an "easy" way to do the same thing with the GPU?

Because it might be useful to use a very large d, and in that case I would certainly get a boost from CUDA.jl…

```
julia> @btime funza($A, $B, $d)
  262.666 μs (0 allocations: 0 bytes)
51.78137833703912

julia> using Tullio

julia> @btime @tullio _ := $A[i,1,1,1,1]*$A[l,1,1,1,1]*$A[k,1,1,1,1]*$A[j,1,1,1,1]*$A[m,1,1,1,1]*$B[m,l,k,j,i]
  124.750 μs (1 allocation: 16 bytes)
51.781378337039484

julia> using LoopVectorization

julia> @btime @tullio _ := $A[i,1,1,1,1]*$A[l,1,1,1,1]*$A[k,1,1,1,1]*$A[j,1,1,1,1]*$A[m,1,1,1,1]*$B[m,l,k,j,i]
  31.541 μs (27 allocations: 720 bytes)
51.78137833704028
```

This example unfortunately won't work on the GPU right now; scalar reductions are tricky. Something along these lines may work, though:

```
julia> using TensorOperations

julia> function funza_ten(A, B)
           v = A[:,1,1,1,1]
           @tensor out = v[i]*v[l]*v[k]*v[j]*v[m]*B[m,l,k,j,i]
       end
funza_ten (generic function with 1 method)

julia> @btime funza_ten($A, $B)
  247.417 μs (123 allocations: 8.36 KiB)
51.781378337036784
```
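Incidentally, since the five `A` factors form a rank-1 prefactor, the contraction can also be done one index at a time with plain matrix-vector products. A sketch under the same setup (`funza_seq` is a hypothetical name, not benchmarked here):

```julia
d = 10
A = randn(d, d, d, d, d)
B = randn(d, d, d, d, d)

# Contract B against the common vector v = A[:,1,1,1,1], one index at a time.
# Each step is a single matrix-vector product over the leading dimension.
function funza_seq(A, B)
    v = A[:, 1, 1, 1, 1]
    d = length(v)
    acc = reshape(B, d, :)' * v        # contract m: gives a length-d^4 vector
    for _ in 1:4                       # contract l, k, j, i in turn
        acc = reshape(acc, d, :)' * v
    end
    return only(acc)
end

funza_seq(A, B)
```

The first step (a d × d⁴ matrix-vector product) dominates; each subsequent step shrinks the problem by a factor of d.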


ok!

How can I save the result of the computation by using the Tullio package?
That is, if I use the latter in the following way:

```
@btime @tullio result := A[i,1,1,1,1]*A[l,1,1,1,1]*A[k,1,1,1,1]*A[j,1,1,1,1]*A[m,1,1,1,1]*B[m,l,k,j,i]
```

then the variable `result` is not defined…

I don't use Tullio, but in general `@btime x = ...` will leave `x` undefined, so you'll probably want to remove that.
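Concretely, a minimal sketch of the fix with Tullio (assuming `A` and `B` are defined as above): run the `@tullio` expression on its own so the `:=` assignment actually sticks, and keep any `@btime` call separate.

```julia
using Tullio

d = 10
A = randn(d, d, d, d, d)
B = randn(d, d, d, d, d)

# `:=` creates the variable `result`, which holds the scalar contraction:
@tullio result := A[i,1,1,1,1] * A[l,1,1,1,1] * A[k,1,1,1,1] *
                  A[j,1,1,1,1] * A[m,1,1,1,1] * B[m,l,k,j,i]

result  # now defined
```

If you still want timings, benchmark in a separate `@btime` call (with `$`-interpolated variables, as shown earlier in the thread) and do the real assignment outside it.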


Yeah, you definitely should remove `@btime`. It takes much longer, because it runs the code many times, and afterwards it throws away the result.

Probably not what you want.

`@btime` is just for benchmarking, not to be included in your actual code.
