Can this contraction run faster without memory allocation?

Hello everyone!

Is it possible to make this code run faster without allocating more memory than necessary?

```julia
julia> function funza(Tensor, indices)
           result = 0.0
           for i = 1:5
               T1 = Tensor[indices[1], indices[2], indices[3], indices[4], indices[5]]
               # if (T1 == 0) continue; end
               for j = 1:5
                   T2 = Tensor[indices[6], indices[7], indices[8], indices[9], indices[10]]
                   # if (T2 == 0) continue; end
                   for k = 1:5
                       T3 = Tensor[indices[11], indices[12], indices[13], indices[14], indices[15]]
                       # if (T3 == 0) continue; end
                       for l = 1:5
                           T4 = Tensor[indices[16], indices[17], indices[18], indices[19], indices[20]]
                           # if (T4 == 0) continue; end
                           for m = 1:5
                               T5 = Tensor[m, l, k, j, i]
                               # if (T5 == 0) continue; end
                               result += T5 * T4 * T3 * T2 * T1
                           end # m
                       end # l
                   end # k
               end # j
           end # i
           # returning result (irrelevant for this topic)
       end
funza (generic function with 3 methods)

julia> function main(N_iterations)
           indices = rand(1:5, 20)
           for n = 1:N_iterations
               # indices are supposed to change randomly between iterations
               # (irrelevant for this topic)
               funza(Tensor, indices)  # `Tensor` is defined elsewhere (not shown)
               # do something with the returning value of funza
               # (omitted since irrelevant for this topic)
           end
       end
main (generic function with 1 method)

julia> @time main(10)
  0.000018 seconds (1 allocation: 240 bytes)

julia> @time main(10^6)
  1.442957 seconds (1 allocation: 240 bytes)
```
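The single 240-byte allocation reported by `@time` is most likely the `indices = rand(1:5, 20)` vector that `main` creates once, not anything inside `funza`. A minimal way to check, assuming `Tensor` is a 5×5×5×5×5 `Array{Float64,5}` (its definition isn't shown) and that `funza` ends with `return result`, is to measure the loop nest alone with `@allocated` from inside a function. Doing it inside a function matters: at global scope, a call through a non-constant global (as `main` does with `Tensor`, if it isn't declared `const`) is dynamically dispatched and can allocate on its own, for example to box a returned `Float64`. `funza_ref` and `allocs_funza` are hypothetical names used only for this check.

```julia
# `funza_ref` mirrors the posted loop nest, with a `return result` added.
function funza_ref(Tensor, indices)
    result = 0.0
    for i = 1:5
        T1 = Tensor[indices[1], indices[2], indices[3], indices[4], indices[5]]
        for j = 1:5
            T2 = Tensor[indices[6], indices[7], indices[8], indices[9], indices[10]]
            for k = 1:5
                T3 = Tensor[indices[11], indices[12], indices[13], indices[14], indices[15]]
                for l = 1:5
                    T4 = Tensor[indices[16], indices[17], indices[18], indices[19], indices[20]]
                    for m = 1:5
                        T5 = Tensor[m, l, k, j, i]
                        result += T5 * T4 * T3 * T2 * T1
                    end
                end
            end
        end
    end
    return result
end

# Bytes allocated by one call, measured inside a function so the argument
# types are concrete. The first call triggers compilation and is not measured.
function allocs_funza(Tensor, indices)
    funza_ref(Tensor, indices)
    return @allocated funza_ref(Tensor, indices)
end

Tensor = rand(5, 5, 5, 5, 5)   # hypothetical stand-in for the poster's array
indices = rand(1:5, 20)
println(allocs_funza(Tensor, indices))   # prints 0
```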

As noted in the code comments, for the sake of clarity I have omitted some sections that are not relevant to this topic (so, of course, the code as shown doesn't make much sense on its own).

I have seen that there are many packages for this type of contraction (Tullio, Einsum, etc.), but it is essential that no additional memory is allocated when the indices are not hard-coded constants but vary at runtime (and are not all summed over!), as in the code above.
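A hand-written loop nest already avoids heap allocation with runtime indices, since nothing in it creates arrays. A minimal sketch of a possibly faster variant, assuming `Tensor` is a 5-dimensional real array at least 5 wide in every dimension and `indices` holds at least 20 valid indices: each loop level multiplies its own factor into a running partial product (`p1` to `p4`), so the innermost loop only sums `Tensor[m, l, k, j, i]` and there is one multiply per `l` iteration instead of four per `m` iteration. The zero checks mirror the commented-out `continue` lines, and `@inbounds` is applied only to the innermost loop, whose indices the up-front size check guarantees are valid. `funza_hoisted` is a hypothetical name; because the multiplications are regrouped, results can differ from the original in the last few bits. The structure would carry over unchanged if each factor depended on its own loop variable.

```julia
# Hypothetical variant of `funza`: partial products hoisted per loop level,
# bounds checks skipped only where the indices are known to be in range.
function funza_hoisted(Tensor::AbstractArray{<:Real,5}, indices::AbstractVector{<:Integer})
    all(>=(5), size(Tensor)) || throw(ArgumentError("Tensor must be at least 5 in every dimension"))
    length(indices) >= 20 || throw(ArgumentError("need at least 20 indices"))
    result = 0.0
    for i = 1:5
        # Checked indexing: these lookups use caller-supplied indices.
        p1 = Tensor[indices[1], indices[2], indices[3], indices[4], indices[5]]
        p1 == 0 && continue   # a zero factor makes the whole subtree vanish
        for j = 1:5
            p2 = p1 * Tensor[indices[6], indices[7], indices[8], indices[9], indices[10]]
            p2 == 0 && continue
            for k = 1:5
                p3 = p2 * Tensor[indices[11], indices[12], indices[13], indices[14], indices[15]]
                p3 == 0 && continue
                for l = 1:5
                    p4 = p3 * Tensor[indices[16], indices[17], indices[18], indices[19], indices[20]]
                    p4 == 0 && continue
                    # m, l, k, j, i are all in 1:5, valid after the size check above.
                    s = 0.0
                    @inbounds for m = 1:5
                        s += Tensor[m, l, k, j, i]
                    end
                    result += p4 * s
                end
            end
        end
    end
    return result
end

Tensor = rand(5, 5, 5, 5, 5)   # hypothetical stand-in for the poster's array
indices = rand(1:5, 20)
funza_hoisted(Tensor, indices)
```

Timing this against the original with BenchmarkTools.jl's `@btime` would show whether the regrouping pays off on a given machine.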

So far I haven’t found a package with these features, but I’m confident there is a way to speed this up (perhaps on a GPU?).
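In the code exactly as posted, `T1` to `T4` depend only on `indices`, never on `i`, `j`, `k`, `l` or `m`, so they factor out of all five sums and the result is `T1*T2*T3*T4` times the sum of `Tensor` over the 5×5×5×5×5 block. This probably doesn't hold for the full (omitted) code, but wherever a factor is loop-invariant the same idea applies: pull it out and precompute the remaining partial sums. A minimal sketch, assuming a 5-dimensional real `Tensor` at least 5 wide in every dimension; `funza_factored` is a hypothetical name.

```julia
# Valid only for the loop nest exactly as posted: T1..T4 use only `indices`,
# so they factor out of all five sums.
function funza_factored(Tensor::AbstractArray{<:Real,5}, indices::AbstractVector{<:Integer})
    T1 = Tensor[indices[1], indices[2], indices[3], indices[4], indices[5]]
    T2 = Tensor[indices[6], indices[7], indices[8], indices[9], indices[10]]
    T3 = Tensor[indices[11], indices[12], indices[13], indices[14], indices[15]]
    T4 = Tensor[indices[16], indices[17], indices[18], indices[19], indices[20]]
    # Sum of Tensor[m, l, k, j, i] over all five loop variables in 1:5;
    # the view avoids copying the block.
    s = sum(view(Tensor, 1:5, 1:5, 1:5, 1:5, 1:5))
    return T1 * T2 * T3 * T4 * s
end

Tensor = rand(5, 5, 5, 5, 5)   # hypothetical stand-in for the poster's array
indices = rand(1:5, 20)
funza_factored(Tensor, indices)
```

This replaces 3125 five-way products per call with one pass of additions over the block plus four multiplications, and if `Tensor` does not change between iterations, `s` could be computed once outside `main`'s loop and reused.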

Thanks a lot in advance, everyone!