Optimizing nested loops in Julia

Hi everyone, I have an optimization problem involving nested loops. I am using the QuantumOptics.jl package (qojulia.org) for a physics calculation. First of all, let me share my code and explain it.

using QuantumOptics, Einsum  # Einsum.jl provides @einsum

function Hubbard_Interaction(P, Pt, cut_mb_basis, cut_off, U)

    # P1 and P1t are just matrices, don't focus on them :)
    P1 = P.data
    P1t = Pt.data

    # Pretty fast calculation with einsum. No problem here
    @einsum coefficient[k,l,m,n] := P1[k,i] * P1[l,i] * P1t[i,m] * P1t[i,n]

    # Sparse operator that we accumulate the result into
    Vint_mb_cut = SparseOperator(cut_mb_basis)
    
    # The problem starts here :/    
    for k in 1:cut_off
        for l in 1:cut_off
            for m in 1:cut_off
                for n in 1:cut_off

                    # These four operators are just matrices but note that they depend on the loop indices!
                    a1t = create(cut_mb_basis, k)
                    a2t = create(cut_mb_basis, l)
                    a2  = destroy(cut_mb_basis, m)      
                    a1  = destroy(cut_mb_basis, n)

                    # Matrix multiplication is pretty fast, no problem here
                    Vint_mb_cut += U/2*coefficient[k,l,m,n]*a1t*a2t*a2*a1
                end
            end
        end
    end
    
    return Vint_mb_cut
end

This function calculates the interaction between two particles; the physics details don't matter here. The main point is the nested loop in my code. When cut_off is a big number, the code runs quite slowly. The big question is: how can I optimize this code for big cut_off values (the loop indices all just run from 1), for example cut_off = 50?

It seems you are rebuilding the creation and annihilation matrices unnecessarily in the innermost loop. For example, a1t = create(cut_mb_basis, k) depends only on the outermost loop variable and could be moved to

for k in 1:cut_off
  a1t = create(cut_mb_basis, k)
  [...]

and so on for the other three operators, which should save about 75% of the create/destroy calls.
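Spelled out, the fully hoisted loop nest would look something like this (a sketch, reusing the names from the function above):

for k in 1:cut_off
    a1t = create(cut_mb_basis, k)           # depends only on k
    for l in 1:cut_off
        a2t = create(cut_mb_basis, l)       # depends only on l
        for m in 1:cut_off
            a2 = destroy(cut_mb_basis, m)   # depends only on m
            for n in 1:cut_off
                a1 = destroy(cut_mb_basis, n)
                Vint_mb_cut += U/2 * coefficient[k,l,m,n] * a1t * a2t * a2 * a1
            end
        end
    end
end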

But actually it might be smarter to create all operators from the get-go. I suppose they are sparse matrices? Shouldn’t take up too much space. I’ve never worked with QuantumOptics, so that’s a guess.


All of these matrices are sparse arrays, which of course helps performance. I tried your suggestion, but the results are not convincing: with cut_off = 75 the code still runs slowly.

Can you post your updated code, maybe? The one above allocates 4 matrices on every inner-loop iteration, which looks really costly (though I am not sure what create and destroy do).

Did you try pre-allocating all operators outside of the loop, e.g.

At = [create(cut_mb_basis, k) for k in 1:cut_off]
A  = [destroy(cut_mb_basis, k) for k in 1:cut_off]

and then indexing into those arrays in the loop?
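The inner loop would then do cheap array lookups instead of rebuilding operators, something like this (a sketch, assuming the At/A arrays above plus coefficient, U and Vint_mb_cut from the original function):

for k in 1:cut_off, l in 1:cut_off, m in 1:cut_off, n in 1:cut_off
    # every operator is now a lookup, not a fresh construction
    Vint_mb_cut += U/2 * coefficient[k,l,m,n] * At[k] * At[l] * A[m] * A[n]
end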


Your innermost loop executes cut_off^4 times, so anything you can do to speed it up will be critical. @skleinbo’s suggestion of pre-computing your operators looks like a good first step.

This is still going to allocate in your inner loop. If the matrices are small, you’re going to pay a big overhead price for these allocations, which you can avoid if you pre-allocate scratch space and use in-place operations like mul!.
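A rough sketch of that pattern (everything here is an assumption: the At/A arrays from the previous post, ComplexF64 data, and dense matrices, which won't be viable at very large dimensions):

using LinearAlgebra

# Dense copies of the pre-built operators, made once.
Atd = [Matrix(op.data) for op in At]
Ad  = [Matrix(op.data) for op in A]

dim = size(Atd[1], 1)
T1 = zeros(ComplexF64, dim, dim)     # scratch space, allocated once
T2 = zeros(ComplexF64, dim, dim)
Vdata = zeros(ComplexF64, dim, dim)  # accumulates the interaction matrix

for k in 1:cut_off, l in 1:cut_off, m in 1:cut_off, n in 1:cut_off
    mul!(T1, Atd[k], Atd[l])                      # T1 = a1t * a2t, in place
    mul!(T2, T1, Ad[m])                           # T2 = T1 * a2
    mul!(T1, T2, Ad[n])                           # T1 = T2 * a1
    axpy!(U/2 * coefficient[k,l,m,n], T1, Vdata)  # Vdata += coeff * T1
end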

How big are your matrices? If they are a small fixed size (say 10x10 or smaller), you might use StaticArrays.jl. (This is also in the performance tips.)


The first thing to think about is whether you can take advantage of any symmetries here. Assuming your operators are all bosonic, both coefficient and your operator product should be symmetric under the exchanges k \leftrightarrow l and m \leftrightarrow n. Taking advantage of that (for example with SymmetricTensor from Tensors.jl, github.com/Ferrite-FEM/Tensors.jl), you'd only need to calculate (n(n+1))^2/4 instead of n^4 matrix products, roughly a 4x speedup!

You probably also have commutation relations for a_i and a^\dagger_i, so you might even be able to get that down to {n+3 \choose 4} + n(n+1)/4 matrix products, which would be roughly a 24x speedup.
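As a rough sketch of the first symmetry (again a guess, assuming bosonic operators so that both coefficient and the operator product are invariant under k \leftrightarrow l and m \leftrightarrow n; At/A are the pre-built operator arrays from earlier):

for k in 1:cut_off, l in k:cut_off, m in 1:cut_off, n in m:cut_off
    # off-diagonal index pairs stand in for their swapped partners
    sym = (k == l ? 1 : 2) * (m == n ? 1 : 2)
    Vint_mb_cut += sym * U/2 * coefficient[k,l,m,n] * At[k] * At[l] * A[m] * A[n]
end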


Thank you! It's a very effective way to construct these arrays quickly.

I generally work with very big matrices/tensors of shape 9000x9000.