The issues of frequent recompilation and low computational efficiency in ITensor.jl

Hanze_Li · July 13, 2025, 6:41am

I’m a beginner in Julia and I’m using ITensor.jl for computations in my project. However, I’ve noticed that the program runs with very low efficiency, and both CPU and memory usage are minimal. I’m not sure how to optimize it to address this issue.

using ITensors, LinearAlgebra, DelimitedFiles, Plots
using Random, Combinatorics, Distributions

function sample_pairs(rng::AbstractRNG, N::Int, m::Int, alpha::Float64)
    all_pairs = collect(combinations(1:N, 2))
    dists = [min(j-i, N-(j-i)) for (i,j) in all_pairs]
    weights = alpha == 0.0 ? ones(length(dists)) : [r^(-alpha) for r in dists]
    probs = weights ./ sum(weights)
    cat = Categorical(probs)
    idxs = rand(rng, cat, m)
    return [all_pairs[k] for k in idxs]
end

ITensors.space(::SiteType"Replica") = 6
uq = readdlm("weingarten_6_6_6_6.csv", ',')
function ITensors.op(::OpName"Transfer", ::SiteType"Replica", s1::Index, s2::Index)
    itensor(Float64, reduce(vcat, uq), s2', s1', s2, s1)
end

ITensors.state(::StateName"FUp",   ::SiteType"Replica") = [1.,1.,1.,0.,0.,1.]
ITensors.state(::StateName"FDown", ::SiteType"Replica") = [1.,0.,0.,1.,1.,1.]

N = 16
NA = 4
tmax = 15

alphas = [0.0,]
thetas = [i * π/10 for i in 2:5]

num_samples = 50

cutoff = 1e-8
maxdim = 100

rng = MersenneTwister(10086)

sites = siteinds("Replica", N)
results = []

for alpha in alphas
    for theta in thetas
        c, s = cos(theta/2), sin(theta/2)
        ITensors.state(::StateName"Theta", ::SiteType"Replica") = [c^4; fill(c^2*s^2, 4); s^4]
        avg_P = zeros(tmax+1)
        avg_Ppr = zeros(tmax+1)
        for sample_id in 1:num_samples
            psi = productMPS(Float64, sites, "Theta")
            P_t = zeros(tmax+1)
            Ppr_t = zeros(tmax+1)
            for t in 0:tmax
                bnd = productMPS(Float64, sites, n -> (n <= NA ? "FDown" : "FUp"))
                P = inner(bnd, psi)
                val = 0.0 + 0.0im
                for k in 0:NA
                    phi = 2pi*k/(NA+1)
                    ITensors.state(::StateName"FDownk", ::SiteType"Replica") = [1.0, 0.0, 0.0, exp(1im*phi), exp(-1im*phi), 1.0]
                    bndk = productMPS(ComplexF64, sites, n -> (n <= NA ? "FDownk" : "FUp"))
                    val += inner(bndk, psi)
                end
                Ppr = abs(val)/(NA+1)
                P_t[t+1] = P
                Ppr_t[t+1] = Ppr
                pairs = sample_pairs(rng, N, N, alpha)
                opslist = [("Transfer", i, j) for (i, j) in pairs]
                mpo = ops(opslist, sites)
                psi = apply(mpo, psi; cutoff=cutoff, maxdim=maxdim)
            end
            avg_P .+= P_t
            avg_Ppr .+= Ppr_t
        end
        avg_P ./= num_samples
        avg_Ppr ./= num_samples
        delta_S2 = -log.(avg_Ppr ./ avg_P)
        for t in 0:tmax
            push!(results, (alpha, theta, t, delta_S2[t+1]))
        end
    end
end

open("results.csv", "w") do io
    println(io, "alpha,theta,t,delta_S2")
    for (alpha, theta, t, v) in results
        println(io, string(alpha, ",", theta, ",", t, ",", v))

end
end

ufechner7 · July 13, 2025, 6:48am

Welcome to the community!

If you post a code example, please make sure to include the code in triple backticks, like ``` .

Otherwise the readability is very bad.

For more details, see: Please read: make it easier to help you

Hanze_Li · July 13, 2025, 6:54am

Hi ufechner7, many thx for you for the warm welcome and the tip!

ufechner7 · July 13, 2025, 7:04am

Thanks for fixing the formatting of your example.

One more point: In your example you are reading a .csv file.

Can you share that file?

VinceNeede · July 13, 2025, 7:10am

The first thing I noticed is the use of globals, that can decrease efficiency, but I don’t know whether it is the only reason.

You could try to use a statistical profiler to understand where is that you are spending most of the time

If you could also explain what is that you are trying to do, and how you are trying to do that (a commented code would be fine) we could help better

Finally, you may want to try to write in the Itensor discourse group

Hanze_Li · July 13, 2025, 7:33am

Hi, ufechner7, thanks for the reply, and here is the .csv file link: Richard/weingarten_6_6_6_6.csv at main · Veneme/Richard · GitHub.

Hanze_Li · July 13, 2025, 7:42am

Hi Vince, This code is for simulating entanglement asymmetry (EA) dynamics in a 1D random quantum circuit of 16 six-level “replica” sites under a long-range random quantum circuit: at each of 15 time steps it samples N two-site gates with distance-dependent weight r^{-\alpha}, applies them to a product MPS initialized by an angle theta, then measures the overlap P(t), and a phase-averaged overlap P'(t) with a reference state to compute the second renyi entropy. Repeating this 50 times per theta, and averaging yields the growth dynamics of EA.

ufechner7 · July 13, 2025, 8:11am

Small remark: You can use LaTeX in your messages, like r^{-\alpha} . Just put your LaTeX string between $ signs.

Hanze_Li · July 13, 2025, 8:18am

Thank you! ufechner7, and I have fixed it.

PeterSimon · July 14, 2025, 2:56am

Greetings, @Hanze_Li , I would like to also welcome you to Julia and Julia Discourse. I tried to run your code as provided, but I found that it would not run until I added the following two lines at the start:

using ITensors: ITensors, @SiteType_str, @OpName_str, @StateName_str, siteinds
using ITensorMPS

I also removed Plots from the using list since it isn’t used by your supplied code. I then launched the code and waited, waited, waited… I finally killed it and adjusted some of the parameters to get a reasonable run time:

tmax = 3
thetas = [i * π/10 for i in 3:3]
num_samples = 1

With these changes the run time was about 23s on my machine. I then ran your file in the profiler and found that 91% of the run time occurred in the function product in the package iTensorMPS.jl. Within that function, 40% of the run time was occupied by line 2173, 39% by line 2160, 8% by line 2170, and 3% by line 2158.

Like a previous commenter, I noticed a few “problems” with your code that are apparent to experienced Julia programmers, but the effect of these on total compute time is negligible, compared to the time you’re spending in the package functions. I know nothing about quantum circuits nor the specialized packages dealing with them like ITensors. So I can’t really help any further. Perhaps others here have some useful advice, or perhaps you could raise an issue at the package repository, and ask if there is anything you can do to speed up your computations.

mtfishman · July 14, 2025, 12:22pm

It looks like you are applying gates to random pairs of sites of an MPS, that’s going to generate a lot of entanglement and therefore the computational cost and memory will grow very quickly in general. Perhaps MPS isn’t the right ansatz or technique for this problem if you need to reach longer circuit depths or you can’t exploit some kind of locality or structure in the problem.

Hanze_Li · July 14, 2025, 2:02pm

Hi, Peter. Thank you so much for your thorough testing and detailed feedback! You’re absolutely right: despite Julia’s unique strengths and my less-than-perfect coding style, that isn’t the core issue. I ran the same benchmarks and likewise found that the vast majority of the runtime is spent in the product function of ITensorMPS.jl. More fundamentally, for my physical problem, where long-range interactions drive entanglement to grow too rapidly, the MPS ansatz may simply not be the best choice. Nonetheless, I really appreciate your enthusiastic help and guidance.

Mason · July 14, 2025, 2:28pm

Yeah, to expand on what @mtfishman said, MPS and DMRG are explicitly designed to take advantage of the low entanglement present in 1D systems with short range interactions, because in such subsystems the entanglement between any pairwise partition of the system should scale with the size of the boundary, which in 1D is \mathcal{O}(1).

This small entanglement is exactly what allows you to express very large systems in terms of MPS with very low computatioal effort, but if you have long range interactions that all goes out the window and you’ll end up with a system that needs approximately the whole entangled Hilbert space, so you’ll need to store something approaching d^l numbers, where d is the local single-site Hilbert space dimension, and l is the number of sites. That scaling gets catastrophic pretty quickly.

Hanze_Li · July 14, 2025, 2:29pm

Hi Matthew, Thank you very much for your professional guidance. You’re absolutely right that applying an MPS ansatz to a long-range random circuit carries significant risk, entanglement grows too quickly and blows up the bond dimension. In fact, we have used ED up to system size N=20, but remain concerned about finite-size effects in those results. That’s why we’re now exploring ITensors for somewhat larger systems (N>20), hoping to uncover scaling trends beyond the ED regime. I fully appreciate that for very deep circuits the MPS limitations will become severe, but our current focus is on early-time dynamics at moderate depths. Indeed, in our benchmarks the ITensorMPS.jl method and ED agree very well for
N\leq20. Given this, do you think it’s still reasonable to trust the ITensor/MPS approach for shallow circuits and slightly larger system sizes?

mtfishman · July 14, 2025, 3:01pm

Yes, MPS should be a good approach for larger system sizes and shallow circuits. However, note that when you apply nonlocal gates, in ITensorMPS.apply we insert swap gates to make the gates local so they can be applied to the MPS with optimal (2-norm) compression (that’s a pretty standard technique, there are other techniques like turning the nonlocal gates into MPOs which may help with prefactors but the scaling will still be bad). So, when estimating the circuit depth you have to take those swap gates into account. For example, if you apply a gate that is j sites apart, the circuit depth including swap gates to apply that gate is roughly 2j, since it requires j swap gates to make the gate local and then j swap gates to swap the sites back to the original locations. So the number of actual gates you can apply before the bond dimension gets too large may be pretty small.

Regarding whether or not you can trust the results, you should check the entanglement spectrum and truncation error. For example, if you look at the singular value spectrum of the MPS and truncation error of the SVD from each gate application, you can get an estimate for the error in applying the gates given the maximum bond dimension and truncation tolerance you set.

Hanze_Li · July 14, 2025, 3:05pm

Hi, Mason. Thank you for the expand all; you are right. My question might become: at what size or depth will the MPS ansatz fail in a quantum circuit? In my tests, this program still works at a small size (although it will inevitably fail for large N or large t). We want to access the largest possible size before it expresses failure.

Mason · July 14, 2025, 3:25pm

If you have long range entanglement, I doubt it’d ever beat ED. You’re basically doing a ED with extra steps and less efficiency.

You should test it though, maybe there’s a regime where it wins

Hanze_Li · July 14, 2025, 3:41pm

Many thanks for your professional comments, Matthew. I’ll use these as a reference for my upcoming debugging.

Hanze_Li · July 14, 2025, 3:46pm

I appreciate for your helpful comments, Mason. Indeed, for typical long-range physics problems, the MPS approach can be extremely subtle. I completely understand your concerns.

Topic		Replies	Views
Scientific Software Developer at the Flatiron Institute - Tensor Networks, ITensor Jobs itensor	0	1730	November 15, 2021
How to compute inner product of MPS with another MPS translated by 1 right sites New to Julia question	2	80	May 6, 2025
Tensor contractions very inefficient in julia Performance tensors	13	2548	December 18, 2020
How to convert contractions from TensorOperations to ITensor? Performance itensor , tensor-contraction , tensoroperations	1	129	June 17, 2024
How do I implement the fastest tensor contractions Performance question , tensors , tensor-contraction , tensoroperations	11	865	April 18, 2024

The issues of frequent recompilation and low computational efficiency in ITensor.jl

Related topics