About the additional memory allocation that appears in the OMEinsum package

julia> using OMEinsum

julia> a=rand(1024,1024);

julia> b=rand(1024,1024,2);

julia> using BenchmarkTools

julia> @btime ein"xy,yzw->xzw"(a,b);
  23.790 ms (77 allocations: 16.00 MiB)

julia> @btime ein"yx,yzw->xzw"(a,b);
  28.217 ms (101 allocations: 24.01 MiB)

As the example above shows, when I try to do this kind of index summation: ein"yx,yzw->xzw"(a,b), there’s always an extra memory allocation. This is equivalent to doing an index swap of the A matrix first using permutedims(), but I can always use permutedims!() to avoid additional memory allocation.

It seems that OMEinsum does not save this memory. When I actually use it, I get a lot of index summation operations, and I do a permutedims! every time I call ein operation can be tiring. My question is, does OMEinsum have any parameter settings that will save memory in this case?

Ideally this would reshape b and call a*b_mat or a'*b_mat, both of which are handled by BLAS without further copies. But it seems not to figure this out, and inserts permutedims(a). TensorOperations manages not to:

julia> using TensorOperations

julia> ein"xy,yzw->xzw"(a,b) ≈ @tensor c[x,z,w] := a[x,y] * b[y,z,w]

julia> @btime @tensor c[x,z,w] := a[x,y] * b[y,z,w];
  min 17.247 ms, mean 18.096 ms (3 allocations, 16.00 MiB)

julia> @btime @tensor c[x,z,w] := a[y,x] * b[y,z,w];
  min 16.403 ms, mean 17.084 ms (3 allocations, 16.00 MiB)

# transpose by hand -- probably misses BLAS?
julia> ein"yx,yzw->xzw"(a,b) ≈ @btime ein"xy,yzw->xzw"(a',b)
  min 681.191 ms, mean 681.278 ms (82 allocations, 16.03 MiB)

Okay,thank you very much!