Hi, I benchmarked the following code:

```
using LinearAlgebra
using BenchmarkTools
using TensorOperations
# Initialization
A = rand(100, 100)
B = rand(100, 100)
B_3d = rand(100, 100, 100)
n = 100
C_new = zeros(100, 100, 100)

# Version 1: a single @tensor contraction over the whole 3-D array
@btime @tensor C_new[a,c,d] := A[a,b] * B_3d[b,c,d]

# Version 2: one BLAS gemm call per slice d
@btime begin
    for d = 1:n
        C_new[:,:,d] = LinearAlgebra.BLAS.gemm('N', 'N', A, B_3d[:,:,d])
    end
end

# Version 3: @tensor inside a loop over d
@btime begin
    for d = 1:n
        @tensor C_new[a,c,d] := A[a,b] * B_3d[b,c,d]
    end
end
```

I got the following timings, in the same order as the three benchmarks:

```
6.801 ms (3 allocations: 7.63 MiB)
10.129 ms (701 allocations: 15.27 MiB)
1.030 s (401 allocations: 762.95 MiB)
```

Why is the third version so much slower than the first two? In that version I tried to use @tensor as a matrix multiplication, A[a,b] * B_3d[b,c,d], for fixed d.
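One likely explanation, offered as a sketch rather than a definitive answer: inside `@tensor`, `d` is an index label and not the loop variable, so each of the 100 iterations in the third version performs the full 3-D contraction and allocates a new array. This fits the reported numbers, since 762.95 MiB is about 100 times the 7.63 MiB of the first version. A per-slice version for fixed `d` could look like the following, where `B_d` and `C_d` are hypothetical helper names that do not appear in the code above:

```julia
using TensorOperations

A = rand(100, 100)
B_3d = rand(100, 100, 100)
n = size(B_3d, 3)
C_new = zeros(100, 100, n)

for d = 1:n
    # Take the d-th slice explicitly, so that @tensor only sees a matrix.
    B_d = B_3d[:, :, d]
    # Plain matrix product written as a contraction over b.
    @tensor C_d[a, c] := A[a, b] * B_d[b, c]
    # Store the result in the d-th slice of the output.
    C_new[:, :, d] = C_d
end
```

This still copies each slice and allocates a new `C_d` per iteration, so it is meant to show the intended per-slice semantics, not to beat the single contraction in the first version.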