using TensorKit
using BenchmarkTools
na = nb = nc = nd = 12
A = rand(na, nb)
B = rand(nb, nc, nd)
C = zeros(na, nc, nd)
function tensorkit_contraction(A,B,C)
    # := allocates a new array for the result; the C passed in is not overwritten
    @tensor C[a,c,d] := A[a,b] * B[b,c,d]
end
@btime tensorkit_contraction(A,B,C)

I saved this as 1.jl and ran julia 1.jl -t 1 and julia 1.jl -t 2 (Julia 1.8.5).
I got 3.900 μs (1 allocation: 13.62 KiB) and 3.837 μs (1 allocation: 13.62 KiB), respectively.
The timings look about the same. Is there a simple way to parallelize TensorKit?

I think your arguments are in the wrong order: anything placed after the script name is passed to the script (in ARGS) rather than to julia. It should be julia -t 1 1.jl instead. You can check by printing out Threads.nthreads().
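A minimal sketch of that check, assuming it is saved under the hypothetical name check_threads.jl. It only prints the thread count and whatever arguments reached the script, so running it with the option before and after the file name shows where -t ended up:

```julia
# Run both ways and compare the output:
#   julia -t 2 check_threads.jl
#   julia check_threads.jl -t 2
# (check_threads.jl is a hypothetical file name used for illustration.)

# Number of Julia threads this session was started with
println("Julia threads: ", Threads.nthreads())

# Anything written after the script name shows up here instead
println("Script arguments: ", ARGS)
```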

Also, since these are linear algebra operations, the library will likely rely on BLAS threads for much of the parallelism, and those are separate from Julia threads. The number of BLAS threads can be set with

using LinearAlgebra;
BLAS.set_num_threads(1)

for example.

It’s hard to know which type of parallelism a library will use without looking at the docs/source. But you can experiment to find out by setting Julia threads and BLAS threads separately.
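One way to run such an experiment, as a rough sketch rather than a proper benchmark: the helper time_matmul and the matrix size are made up for illustration, and it times a plain matrix product instead of the TensorKit contraction. Julia threads are fixed at startup with -t, while BLAS threads can be changed at runtime, so the two can be varied independently. For arrays as small as the 12-element dimensions in the question, threading overhead may well hide any difference.

```julia
using LinearAlgebra

# Julia threads are set at startup (julia -t N); BLAS threads are a separate setting.
println("Julia threads: ", Threads.nthreads())
println("BLAS threads:  ", BLAS.get_num_threads())

# Hypothetical helper: best-of-`reps` time for an n-by-n matrix product
# with the given number of BLAS threads.
function time_matmul(n_blas; n = 1000, reps = 5)
    BLAS.set_num_threads(n_blas)
    X = rand(n, n)
    Y = rand(n, n)
    X * Y  # warm-up call so compilation is not timed
    return minimum([@elapsed(X * Y) for _ in 1:reps])
end

for n_blas in (1, 2)
    println("BLAS threads = ", n_blas, ": ", time_matmul(n_blas), " s")
end
```

Running the same script with julia -t 1 and julia -t 2 then shows whether the Julia thread count matters on top of the BLAS setting.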

using TensorKit
using Tullio
using BenchmarkTools
na = nb = nc = nd = 12
A = rand(na, nb)
B = rand(nb, nc, nd)
C = zeros(na, nc, nd)
function tensorkit_contraction(A,B,C)
    # := creates a new array for the result; the C passed in is not overwritten
    @tullio C[a,c,d] := A[a,b] * B[b,c,d]
end
@btime tensorkit_contraction(A,B,C)

With this Tullio version I did not get any speedup either, whether run as julia -t 1 1.jl or julia -t 2 1.jl.