How to speed up this Kronecker Multiplication?

The usual approach is to just use SparseArrays.jl. I don’t think any further compiler optimization (e.g. StaticArrays.jl or type-level information) is going to help once you consider more than 7 sites.

julia> using SparseArrays, LinearAlgebra, BenchmarkTools
julia> σx = sparse([0 1; 1.0 0])
2×2 SparseMatrixCSC{Float64, Int64} with 2 stored entries:
  ⋅   1.0
 1.0   ⋅ 

julia> σy = sparse([0 -im; im 0.0])
2×2 SparseMatrixCSC{ComplexF64, Int64} with 2 stored entries:
     ⋅      0.0-1.0im
 0.0+1.0im      ⋅    

julia> σz = sparse([1 0; 0 -1.0])
2×2 SparseMatrixCSC{Float64, Int64} with 2 stored entries:
 1.0    ⋅ 
  ⋅   -1.0

julia> const σvec = [σx, σy, σz] # note the const! It is needed for performance
3-element Vector{SparseMatrixCSC{Tv, Int64} where Tv}:
 sparse([2, 1], [1, 2], [1.0, 1.0], 2, 2)
 sparse([2, 1], [1, 2], ComplexF64[0.0 + 1.0im, 0.0 - 1.0im], 2, 2)
 sparse([1, 2], [1, 2], [1.0, -1.0], 2, 2)
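Note that the Vector above has the abstract element type `SparseMatrixCSC{Tv, Int64} where Tv`, because `σy` is complex while `σx` and `σz` are real. If you want a concrete eltype for better type inference, one option is to promote everything to `ComplexF64` up front — a sketch (the name `σvecC` is just for illustration):

```julia
using SparseArrays

# Promote all three Pauli matrices to ComplexF64 so the Vector
# has a single concrete element type (friendlier to inference).
const σvecC = SparseMatrixCSC{ComplexF64,Int}[
    sparse(ComplexF64[0 1; 1 0]),    # σx
    sparse(ComplexF64[0 -im; im 0]), # σy
    sparse(ComplexF64[1 0; 0 -1]),   # σz
]
```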

julia> function σTensor(A)
           n = length(A) # no need to pass the length as a separate parameter
           ret = σvec[A[1]]
           for i in 2:n
               ret = kron(ret, σvec[A[i]])
           end
           return ret
       end
σTensor (generic function with 1 method)
julia> @btime σTensor([1,2,3])
  539.524 ns (13 allocations: 1.08 KiB)
8×8 SparseMatrixCSC{ComplexF64, Int64} with 8 stored entries:
     ⋅           ⋅          ⋅      …       ⋅      0.0-1.0im       ⋅    
     ⋅           ⋅          ⋅              ⋅          ⋅      -0.0+1.0im
     ⋅           ⋅          ⋅              ⋅          ⋅           ⋅    
     ⋅           ⋅          ⋅         -0.0-1.0im      ⋅           ⋅    
     ⋅           ⋅      0.0-1.0im          ⋅          ⋅           ⋅    
     ⋅           ⋅          ⋅      …       ⋅          ⋅           ⋅    
 0.0+1.0im       ⋅          ⋅              ⋅          ⋅           ⋅    
     ⋅      -0.0-1.0im      ⋅              ⋅          ⋅           ⋅    
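
By the way, the loop is just a fold over `kron`, so you can also write it as a one-liner with `Base.reduce` — a sketch (redefined here with the hypothetical names `σvec2`/`σTensor2` so the snippet is self-contained):

```julia
using SparseArrays

# Same sparse Pauli matrices as above.
const σvec2 = [sparse([0 1; 1.0 0]), sparse([0 -im; im 0.0]), sparse([1 0; 0 -1.0])]

# Same tensor product as the explicit loop, expressed as a reduction.
σTensor2(A) = reduce(kron, (σvec2[i] for i in A))
```

This produces the same 8×8 sparse result for `σTensor2([1,2,3])`; whether it benchmarks identically to the loop is worth checking with `@btime`.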
