I tried to run the contraction `A[a,b]*B[b,c,d] = C[a,c,d]`, where the repeated index `b` is summed over, using TensorKit. Judging by the following code, TensorKit seems to be even slower than nested loops. Does TensorKit use BLAS, or is there a setting I am missing? Thank you very much. Here is my code:

```
#import Pkg; Pkg.add("TensorKit")
#import Pkg; Pkg.add("TensorOperations")
using TensorKit
#using TensorOperations

# A[a,b]*B[b,c,d] = C[a,c,d]
na = nb = nc = nd = 12
A = rand(na, nb)
B = rand(nb, nc, nd)
C = zeros(na, nc, nd)
#println(A)
s = 0.0

# Version 1: explicit nested loops, timed at global scope
@time begin
    for d in 1:nd
        for c in 1:nc
            for a in 1:na
                for b in 1:nb
                    global C
                    C[a,c,d] += A[a,b] * B[b,c,d]
                end
            end
        end
    end
end
print(C[1,2,3])

# Version 2: the same contraction with the @tensor macro
C = zeros(na, nc, nd)
@time begin
    @tensor C[a,c,d] := A[a,b] * B[b,c,d]
end
print(C[1,2,3])
```

I got `0.040242 seconds` for the nested loops and `12.562485 seconds` for TensorKit.
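
For comparison, the same contraction can be written as a single matrix multiplication by flattening the `(c, d)` indices of `B`, which for `Float64` arrays should go through BLAS. Below is a minimal sketch of that baseline using only Base Julia. The function names `contract_loops` and `contract_matmul` are made up for illustration and are not part of TensorKit. Both versions are wrapped in functions and called once before timing, since the first `@time` of a call also counts compilation.

```julia
# Baseline sketch: C[a,c,d] = sum over b of A[a,b] * B[b,c,d].
# `contract_loops` and `contract_matmul` are illustrative names, not TensorKit API.

function contract_loops(A, B)
    na, nb = size(A)
    _, nc, nd = size(B)
    C = zeros(na, nc, nd)
    # The last range listed is the innermost loop, so `a` varies fastest.
    for d in 1:nd, c in 1:nc, b in 1:nb, a in 1:na
        C[a, c, d] += A[a, b] * B[b, c, d]
    end
    return C
end

function contract_matmul(A, B)
    na, nb = size(A)
    nb2, nc, nd = size(B)
    @assert nb == nb2 "contracted dimensions must match"
    # Flatten (c, d) into one column index, multiply, then restore the shape.
    return reshape(A * reshape(B, nb, nc * nd), na, nc, nd)
end

A = rand(12, 12)
B = rand(12, 12, 12)

# Warm-up calls so the timings below do not include compilation.
contract_loops(A, B)
contract_matmul(A, B)

@time C_loops = contract_loops(A, B)
@time C_matmul = contract_matmul(A, B)
println(C_loops ≈ C_matmul)
```

If the `@tensor` line is timed the same way (inside a function, on a second call), the three numbers should be directly comparable.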