# Is there any way to optimize array additions and multiplications with transposes?

Yes. That I tried on the #17th reply, which appears in `test3(A) = 0.1 .* A .+ 0.2 .* PermutedDimsArray(A, (2,1,3,4))` in 14th reply. I would like to try the performance from `@tullio` and `@tensor`, but did not figure out an elegant way to do so. (`@tullio` crashed)

Oh right, sorry, that’s very similar.

Not sure exactly what the error is, but note that it’s not from Tullio, it’s from LoopVectorization. Tullio alone seems to run, but calling `+(x...)` with > 700 arguments is asking for problems – many things handling tuples stop being efficient at 32.

The whole thing seems a bit X-Y problem though. Why do you want to add all the permutations of a 6D-array? If you must, it’s linear, so you can just save the coefficients to disk and do it as one matrix-vector multiplication; at size `n=3` that’s 10^5 times faster.

2 Likes

I see. After removed `LoopVectorization`, `Tullio` takes forever…

I thought about defining a bigger array, take 4-dimensional case for example, `A[i,:,:,:,:] `. For each `i`, it corresponds to the `i`th permutation. Thus, the problem can be converted into
`A[i,:,:,:,:]* P[i] = B[:,:,:,:]` , where `P[i]` is the array of factors 0.1, 0.2, … corresponds to each permutation, as kind of matrix-vector multiplication. (sum over `i`)

I tried Fortran and Python. The thing is,the matrix-vector multiplication is faster, but building the bigger array takes longer time. In total, the time is similar to the direct summation over permutation. In `Julia`, I met some issues. Unable to do so. Maybe there is some trick to make it faster. I really want to see 10^5 faster

Here is my Python code

``````import numpy as np
import time
import itertools as it
import opt_einsum as oe

ref_list = [0, 1, 2, 3]
p = it.permutations(ref_list)
transpose_list = tuple(p)

n_loop = 2
na = nb = nc = nd = 30

A = np.random.random((na,nb,nc,nd))

factor_list = [(i+1)*0.1 for i in range(24)]
time_total = 0.0

for n in range(n_loop):
sum_A = np.zeros((na,nb,nc,nd))
start_0 = time.time()
for m, t in enumerate(transpose_list):
if abs(factor_list[m]) < 1.e-3:
continue
np.add(sum_A, factor_list[m]  * np.transpose(A, transpose_list[m] ), out = sum_A)

finish_0 = time.time()
time_total += finish_0 - start_0

n_factor = 24
total_array = np.zeros((n_factor,na,nb,nc,nd))
factor_array = np.asarray(factor_list)

time_total = 0.0
for n in range(n_loop):
start_0 = time.time()
for m in range(n_factor):
total_array[m,:,:,:,:] = np.transpose(A, transpose_list[m] )

factor_array = np.asarray(factor_list)

oe.contract('nijkl,n->ijkl', total_array, factor_array)
finish_0 = time.time()
time_total += finish_0 - start_0

print('time for einsum',time_total/n_loop)

``````

I got

``````time for permu addition 0.10860311985015869
time for einsum 0.14762508869171143
``````

Here is my Julia code

``````using BenchmarkTools, Tullio, TensorOperations, Combinatorics, LoopVectorization

function perms(a)
B = collect(permutations(a))
B
end

function einsum(n)
@tullio C[i,j,k,l] := factor_p[n]  * A[n, i, j, k, l]
end

thresh = 0.0001
P = perms([1,2,3,4])
factor_p = [0.1:0.1:2.4;]
n = 30
A = rand(n, n, n, n)

n_factor = 24
factor_p = [0.1:0.1:2.4;]

total_A = zeros(n_factor, n, n, n, n)
for i = 1:n_factor
@tullio total_A[i,j,k,l,m] = PermutedDimsArray(A, P[i])[j,k,l,m]
end

@btime einsum(n_factor)
``````

I got

``````ERROR: LoadError: "expected a 5-array A"
Stacktrace:
 macro expansion
@ ~/.julia/packages/Tullio/wAFFh/src/macro.jl:983 [inlined]
...
``````