The issues of frequent recompilation and low computational efficiency in ITensor.jl

I’m a beginner in Julia and I’m using ITensor.jl for computations in my project. However, I’ve noticed that the program runs with very low efficiency, and both CPU and memory usage are minimal. I’m not sure how to optimize it to address this issue.

using ITensors, LinearAlgebra, DelimitedFiles, Plots
using Random, Combinatorics, Distributions

function sample_pairs(rng::AbstractRNG, N::Int, m::Int, alpha::Float64)
    all_pairs = collect(combinations(1:N, 2))
    dists = [min(j-i, N-(j-i)) for (i,j) in all_pairs]
    weights = alpha == 0.0 ? ones(length(dists)) : [r^(-alpha) for r in dists]
    probs = weights ./ sum(weights)
    cat = Categorical(probs)
    idxs = rand(rng, cat, m)
    return [all_pairs[k] for k in idxs]
end

ITensors.space(::SiteType"Replica") = 6
uq = readdlm("weingarten_6_6_6_6.csv", ',')
function ITensors.op(::OpName"Transfer", ::SiteType"Replica", s1::Index, s2::Index)
    itensor(Float64, reduce(vcat, uq), s2', s1', s2, s1)
end

ITensors.state(::StateName"FUp",   ::SiteType"Replica") = [1.,1.,1.,0.,0.,1.]
ITensors.state(::StateName"FDown", ::SiteType"Replica") = [1.,0.,0.,1.,1.,1.]

N = 16
NA = 4
tmax = 15

alphas = [0.0,]
thetas = [i * π/10 for i in 2:5]

num_samples = 50

cutoff = 1e-8
maxdim = 100

rng = MersenneTwister(10086)

sites = siteinds("Replica", N)
results = []

for alpha in alphas
    for theta in thetas
        c, s = cos(theta/2), sin(theta/2)
        ITensors.state(::StateName"Theta", ::SiteType"Replica") = [c^4; fill(c^2*s^2, 4); s^4]
        avg_P = zeros(tmax+1)
        avg_Ppr = zeros(tmax+1)
        for sample_id in 1:num_samples
            psi = productMPS(Float64, sites, "Theta")
            P_t = zeros(tmax+1)
            Ppr_t = zeros(tmax+1)
            for t in 0:tmax
                bnd = productMPS(Float64, sites, n -> (n <= NA ? "FDown" : "FUp"))
                P = inner(bnd, psi)
                val = 0.0 + 0.0im
                for k in 0:NA
                    phi = 2pi*k/(NA+1)
                    ITensors.state(::StateName"FDownk", ::SiteType"Replica") = [1.0, 0.0, 0.0, exp(1im*phi), exp(-1im*phi), 1.0]
                    bndk = productMPS(ComplexF64, sites, n -> (n <= NA ? "FDownk" : "FUp"))
                    val += inner(bndk, psi)
                end
                Ppr = abs(val)/(NA+1)
                P_t[t+1] = P
                Ppr_t[t+1] = Ppr
                pairs = sample_pairs(rng, N, N, alpha)
                opslist = [("Transfer", i, j) for (i, j) in pairs]
                mpo = ops(opslist, sites)
                psi = apply(mpo, psi; cutoff=cutoff, maxdim=maxdim)
            end
            avg_P .+= P_t
            avg_Ppr .+= Ppr_t
        end
        avg_P ./= num_samples
        avg_Ppr ./= num_samples
        delta_S2 = -log.(avg_Ppr ./ avg_P)
        for t in 0:tmax
            push!(results, (alpha, theta, t, delta_S2[t+1]))
        end
    end
end

open("results.csv", "w") do io
    println(io, "alpha,theta,t,delta_S2")
    for (alpha, theta, t, v) in results
        println(io, string(alpha, ",", theta, ",", t, ",", v))

end
end

Welcome to the community!

If you post a code example, please make sure to include the code in triple backticks, like ``` .

Otherwise the readability is very bad.

For more details, see: Please read: make it easier to help you

Hi ufechner7, many thx for you for the warm welcome and the tip!

Thanks for fixing the formatting of your example.

One more point: In your example you are reading a .csv file.

Can you share that file?

The first thing I noticed is the use of globals, that can decrease efficiency, but I don’t know whether it is the only reason.

You could try to use a statistical profiler to understand where is that you are spending most of the time

If you could also explain what is that you are trying to do, and how you are trying to do that (a commented code would be fine) we could help better

Finally, you may want to try to write in the Itensor discourse group

Hi, ufechner7, thanks for the reply, and here is the .csv file link: Richard/weingarten_6_6_6_6.csv at main · Veneme/Richard · GitHub.

1 Like

Hi Vince, This code is for simulating entanglement asymmetry (EA) dynamics in a 1D random quantum circuit of 16 six-level “replica” sites under a long-range random quantum circuit: at each of 15 time steps it samples N two-site gates with distance-dependent weight r^{-\alpha}, applies them to a product MPS initialized by an angle theta, then measures the overlap P(t), and a phase-averaged overlap P'(t) with a reference state to compute the second renyi entropy. Repeating this 50 times per theta, and averaging yields the growth dynamics of EA.

Small remark: You can use LaTeX in your messages, like r^{-\alpha} . Just put your LaTeX string between $ signs.

Thank you! ufechner7, and I have fixed it.

1 Like

Greetings, @Hanze_Li , I would like to also welcome you to Julia and Julia Discourse. I tried to run your code as provided, but I found that it would not run until I added the following two lines at the start:

using ITensors: ITensors, @SiteType_str, @OpName_str, @StateName_str, siteinds
using ITensorMPS

I also removed Plots from the using list since it isn’t used by your supplied code. I then launched the code and waited, waited, waited… I finally killed it and adjusted some of the parameters to get a reasonable run time:

tmax = 3
thetas = [i * π/10 for i in 3:3]
num_samples = 1

With these changes the run time was about 23s on my machine. I then ran your file in the profiler and found that 91% of the run time occurred in the function product in the package iTensorMPS.jl. Within that function, 40% of the run time was occupied by line 2173, 39% by line 2160, 8% by line 2170, and 3% by line 2158.

Like a previous commenter, I noticed a few “problems” with your code that are apparent to experienced Julia programmers, but the effect of these on total compute time is negligible, compared to the time you’re spending in the package functions. I know nothing about quantum circuits nor the specialized packages dealing with them like ITensors. So I can’t really help any further. Perhaps others here have some useful advice, or perhaps you could raise an issue at the package repository, and ask if there is anything you can do to speed up your computations.

2 Likes

It looks like you are applying gates to random pairs of sites of an MPS, that’s going to generate a lot of entanglement and therefore the computational cost and memory will grow very quickly in general. Perhaps MPS isn’t the right ansatz or technique for this problem if you need to reach longer circuit depths or you can’t exploit some kind of locality or structure in the problem.

2 Likes

Hi, Peter. Thank you so much for your thorough testing and detailed feedback! You’re absolutely right: despite Julia’s unique strengths and my less-than-perfect coding style, that isn’t the core issue. I ran the same benchmarks and likewise found that the vast majority of the runtime is spent in the product function of ITensorMPS.jl. More fundamentally, for my physical problem, where long-range interactions drive entanglement to grow too rapidly, the MPS ansatz may simply not be the best choice. Nonetheless, I really appreciate your enthusiastic help and guidance.

1 Like

Yeah, to expand on what @mtfishman said, MPS and DMRG are explicitly designed to take advantage of the low entanglement present in 1D systems with short range interactions, because in such subsystems the entanglement between any pairwise partition of the system should scale with the size of the boundary, which in 1D is \mathcal{O}(1).

This small entanglement is exactly what allows you to express very large systems in terms of MPS with very low computatioal effort, but if you have long range interactions that all goes out the window and you’ll end up with a system that needs approximately the whole entangled Hilbert space, so you’ll need to store something approaching d^l numbers, where d is the local single-site Hilbert space dimension, and l is the number of sites. That scaling gets catastrophic pretty quickly.

1 Like

Hi Matthew, Thank you very much for your professional guidance. You’re absolutely right that applying an MPS ansatz to a long-range random circuit carries significant risk, entanglement grows too quickly and blows up the bond dimension. In fact, we have used ED up to system size N=20, but remain concerned about finite-size effects in those results. That’s why we’re now exploring ITensors for somewhat larger systems (N>20), hoping to uncover scaling trends beyond the ED regime. I fully appreciate that for very deep circuits the MPS limitations will become severe, but our current focus is on early-time dynamics at moderate depths. Indeed, in our benchmarks the ITensorMPS.jl method and ED agree very well for
N\leq20. Given this, do you think it’s still reasonable to trust the ITensor/MPS approach for shallow circuits and slightly larger system sizes?

Yes, MPS should be a good approach for larger system sizes and shallow circuits. However, note that when you apply nonlocal gates, in ITensorMPS.apply we insert swap gates to make the gates local so they can be applied to the MPS with optimal (2-norm) compression (that’s a pretty standard technique, there are other techniques like turning the nonlocal gates into MPOs which may help with prefactors but the scaling will still be bad). So, when estimating the circuit depth you have to take those swap gates into account. For example, if you apply a gate that is j sites apart, the circuit depth including swap gates to apply that gate is roughly 2j, since it requires j swap gates to make the gate local and then j swap gates to swap the sites back to the original locations. So the number of actual gates you can apply before the bond dimension gets too large may be pretty small.

Regarding whether or not you can trust the results, you should check the entanglement spectrum and truncation error. For example, if you look at the singular value spectrum of the MPS and truncation error of the SVD from each gate application, you can get an estimate for the error in applying the gates given the maximum bond dimension and truncation tolerance you set.

1 Like

Hi, Mason. Thank you for the expand all; you are right. My question might become: at what size or depth will the MPS ansatz fail in a quantum circuit? In my tests, this program still works at a small size (although it will inevitably fail for large N or large t). We want to access the largest possible size before it expresses failure.

If you have long range entanglement, I doubt it’d ever beat ED. You’re basically doing a ED with extra steps and less efficiency.

You should test it though, maybe there’s a regime where it wins

Many thanks for your professional comments, Matthew. I’ll use these as a reference for my upcoming debugging.

I appreciate for your helpful comments, Mason. Indeed, for typical long-range physics problems, the MPS approach can be extremely subtle. I completely understand your concerns.