Multi-thread for nested loop with internal conditions, tricky

Hello everyone, I am computing some interactions between systems from the results of other simulations, sort of post-processing.

As noted in the code, for system 1 I have 1000 simulations (variations), system 2 has 400.
In short, the code import files of folders “simulation1” and “simulation2”, do some math, then updates the struct QuantumDynamics{Float64}() .

As you can see in the code, I perform a permutation of all files in photon_I with those in photon_II. This is taking ages, therefore I know that the best option is to multi-thread the inner loop, at least. For context, the analysis of files in photon_I takes approx 80 seconds.

However, I can not get to work the multi-threading. The problem lies on the conditions traj_phot1 == 1 and traj_phot1 == 1 && traj_phot2 == 1 on the first step of the loop, because they activate first to update the mutable-struct update_quantumTraj!(system_2_dynamics, cropp_phot_II). When the update occurs, the matrix system_2_dynamics.interaction gets dimensions (prior to the update it is an empty matrix).

At some point, I tried to use AcceleratedKernels, but it gives an error, it can not sum matrix a (200,600) with b (0,0). Basically the same issue in the previous paragraph.

I would appreciate any help or advice that helps me to implement the multithreading of the loops, but specially the inner loop photon_II. Preferable AcceleratedKernels or Base.Threads :melting_face: :melting_face:

Thanks in advance

system_1_dynamics = QuantumDynamics{Float64}() # Struct 
system_2_dynamics = QuantumDynamics{Float64}()

# folder with results of several simulations. approx 1000
photon_I = get_files("simulation1")
# folder with results of several simulations. approx 400
photon_II = get_files("simulation2")

crop_step = 30
binar_interac = true   # whether count ph-ph interaction of 2 systems

for traj_phot1 in eachindex(photon_I)

# Struct  QuantumDynamics
tmp_phot_I = import_simulation_data(joinpath(@__DIR__, photon_I[traj_phot1]))

# Struct  QuantumDynamics
# crop_trajectory! on the way
cropp_phot_I = crop_trajectory(tmp_phot_I, crop_step)

        if traj_phot1 == 1 
            compute_interaction(cropp_phot_I)
        else
            compute_interaction(cropp_phot_I)
            cropp_phot_I.interaction .+= system_1_dynamics.interaction
        end

        update_quantumTraj!(system_1_dynamics, cropp_phot_I)

        if binar_interac
            #  @threads for traj_phot2 in eachindex(photon_II)
            #  thread_id = threadid()
            @inbounds @simd for traj_phot2 in eachindex(photon_II)
                # tmp_matr = similar(tmp_6wmSE.window)
                # AK.foreachindex(photon_II) do traj_phot2
                @views tmp_phot_II  = import_simulation_data(cropp_phot_I.intensity[2, :], joinpath(@__DIR__, photon_I[traj_phot1]))

                cropp_phot_II = crop_trajectory(tmp_phot_II, crop_step)
                compute_intra_intens(cropp_phot_II)

                if traj_phot1 == 1  && traj_phot2 == 1 
                    compute_inter_interaction(cropp_phot_I, cropp_phot_II)
                else
                    compute_inter_interaction(cropp_phot_I, cropp_phot_II)
                    cropp_phot_II.interaction .+= system_2_dynamics.interaction
                end
                update_quantumTraj!(system_2_dynamics, cropp_phot_II)
                println("inter interaction: ", traj_phot2)
            end

        else
            #
        end
normlize!(system_1_dynamics.interaction)
normlize!(system_2_dynamics.interaction)
end
1 Like

A couple of remarks:

  • your code is hard to read because the identation is off. Please fix that.
  • we cannot run your code at all nor do we even know what package you use/what the structs do. This means we can basically just guess something. Oftentimes it is necessary to change datastructures/the overall approach for best performance. So without the full context, suggestions are much more limited.
  • I assume that the goal is not threading for the sake of itself but rather you want this script to run faster. In that case before trying threading at all you should implement the performance tips. Namely:
    • put your code into functions
    • separate the I/O stuff as best as possible from the computation stuff (because I/O is inherently type unstable), i.e. try to load as much data as possible before doing the computation in a separate function
    • try profiling to figure out which function is actually the bottleneck.
    • setup a benchmark with @btime to get a reference time so you can gauge the effectiveness of changes
  • another general comment: I think most functions here mutate their inputs (and not all of them follow the convention that their name ends in !, e.g. import_simulation_data, compute_intra_intens). Threading with mutating code is usually very tricky because you need to ensure that you don’t overwrite stuff that is still in use. Thus a general pattern is that you trade memory for speed with threading because you need to make more copies. Do you have memory to spare or are you already close to full here?