Good afternoon everyone!
I have the following piece of code, which essentially contracts some multidimensional arrays along specific dimensions (this is a nonsensical toy example whose only purpose is to illustrate the situation clearly):
d = 10
A = randn(d, d, d, d, d)
B = randn(d, d, d, d, d)

function funza()
    acc = 0.0
    for i = 1:d, j = 1:d, k = 1:d, l = 1:d, m = 1:d
        acc += A[i,1,1,1,1] * A[l,1,1,1,1] * A[k,1,1,1,1] *
               A[j,1,1,1,1] * A[m,1,1,1,1] * B[m,l,k,j,i]
    end
    return acc
end
I get the following execution time:
@time funza()
0.209131 seconds (2.12 M allocations: 61.713 MiB, 1.55% gc time)
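As a side note, I suspect a good part of those allocations comes simply from A, B and d being non-const globals, so the loop is type-unstable. Something along these lines should already give a fairer measurement of the bare loop (untested sketch; funza_args is just a placeholder name):

function funza_args(A, B)
    # same loop as above, but the arrays are passed as arguments so their
    # types are known inside the function; d is read off from B
    acc = 0.0
    d = size(B, 1)
    for i = 1:d, j = 1:d, k = 1:d, l = 1:d, m = 1:d
        acc += A[i,1,1,1,1] * A[l,1,1,1,1] * A[k,1,1,1,1] *
               A[j,1,1,1,1] * A[m,1,1,1,1] * B[m,l,k,j,i]
    end
    return acc
end

@time funza_args(A, B)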
Even so, I'm 99% sure there are much faster ways to perform the above contraction than this naive loop.
I thought about using TensorOperations.jl, but (as far as I can see) it doesn't let me fix some indices to explicit values (in the toy example above, all the fixed indices are 1), and I obviously can't compute the full tensor, which would have d^20 entries.
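The only workaround I can imagine with TensorOperations.jl is to fix the explicit indices by slicing first, and only then contract the resulting lower-dimensional arrays. A rough, untested sketch of what I mean (I'm assuming here that @tensor accepts a plain scalar on the left-hand side for a fully contracted network):

using TensorOperations

a = A[:, 1, 1, 1, 1]   # the only slice of A that actually enters the sum
@tensor acc = a[i] * a[j] * a[k] * a[l] * a[m] * B[m, l, k, j, i]

But I don't know whether this is the intended way to handle fixed indices, or how well it scales when only some of the indices are fixed.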
I also thought about using CUDA.jl, but I'm not sure there are ready-made methods for this type of operation… am I wrong?
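For what it's worth, the only GPU route I can come up with is to contract one index at a time with plain matrix-vector products (which CUDA.jl handles through CUBLAS), rather than a single ready-made call. A rough, untested sketch, which relies on B being column-major with m the fastest index and on all five weight vectors being the same v:

using CUDA, LinearAlgebra

v  = CuArray(A[:, 1, 1, 1, 1])   # the only slice of A that enters the sum
Bg = CuArray(B)

t = reshape(Bg, d, :)' * v   # sum over m -> vector of length d^4
t = reshape(t,  d, :)' * v   # sum over l -> length d^3
t = reshape(t,  d, :)' * v   # sum over k -> length d^2
t = reshape(t,  d, :)' * v   # sum over j -> length d
acc = dot(v, t)              # sum over i -> scalar

The same reshaping trick would work on the CPU as well, so maybe the GPU isn't even needed at this size.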
In any case, how can I improve the performance of this kind of contraction?
Thank you!