FLoops composable way of multithreading nested loops?

PatrickMcFarlane · September 27, 2021, 7:39pm

I’m writing an algorithm for a discrete-time dynamic programming problem in economics. The algorithm involves nested looping over 4 state variables: 2 exogenous, 2 endogenous. The solution to the problem is a 4-dimensional array, with each dimension corresponding to one of the state variables.

I would like to multithread these nested loops if I can. From reading here, FLoops.jl seems to be an ideal way of doing this. However, there is a complication because the outer two loops need to wait until the inner two loops have completed before they can move on to their next iteration.

Below I’ve set out a toy example of what I need to do:

for (iz, iβ) in Iterators.product(eachindex(zvals), eachindex(βvals))

    for (id, ia) in Iterators.product(eachindex(dgrid), eachindex(agrid))
        tmp[id, ia] = somecalcs(id, ia, iz, iβ)  # compute the solution
    end

    # interpolate solutions onto rectilinear mesh (needs the complete tmp)
    @views soln[:, :, iz, iβ] = curvinterp(tmp, amesh, dmesh)

    # swap in constrained solutions
    @views soln[:, :, iz, iβ] = swapconstraints!(soln[:, :, iz, iβ], iz, iβ)

end
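For reference, a self-contained serial version of this toy loop might look like the sketch below. The grid values and the bodies of somecalcs, curvinterp, and swapconstraints! are placeholders invented purely for illustration (the real ones are not shown here). The only point is the ordering: tmp is filled completely before the two slice-level steps run for a given (iz, iβ).

```julia
# Hypothetical stand-ins; the real grids and functions are not part of this post.
const zvals = [0.9, 1.1]
const βvals = [0.95, 0.97, 0.99]
const dgrid = range(0, 1; length = 4)
const agrid = range(0, 2; length = 5)

somecalcs(id, ia, iz, iβ) = zvals[iz] * βvals[iβ] * (dgrid[id] + agrid[ia])
curvinterp(tmp, amesh, dmesh) = copy(tmp)              # placeholder: no real interpolation
swapconstraints!(s, iz, iβ) = (s .= max.(s, 0.5); s)   # placeholder constraint (floor at 0.5)

function solve_serial()
    soln = zeros(length(dgrid), length(agrid), length(zvals), length(βvals))
    tmp = zeros(length(dgrid), length(agrid))
    for (iz, iβ) in Iterators.product(eachindex(zvals), eachindex(βvals))
        for (id, ia) in Iterators.product(eachindex(dgrid), eachindex(agrid))
            tmp[id, ia] = somecalcs(id, ia, iz, iβ)
        end
        # tmp is complete at this point, so the slice-level steps can run
        soln[:, :, iz, iβ] = curvinterp(tmp, agrid, dgrid)
        @views swapconstraints!(soln[:, :, iz, iβ], iz, iβ)  # mutates the slice in place
    end
    return soln
end

soln = solve_serial()
```

A serial baseline like this is also handy for checking that any threaded variant produces the same soln.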

Should I use FLoops.jl (e.g., @floop ThreadedEx()) on both the outer and inner loops?

I know that I can get an answer to this question by experimentation using @btime.

But I’m wondering if there’s a more theoretical answer rooted in how FLoops.jl and its different executor options work. Maybe I should use a different executor for the outer loop?

Greatly appreciate any advice anyone has.

-Patrick

tkf · September 28, 2021, 1:08am

If length(zvals) * length(βvals) is larger than the number of CPU cores you have and each iteration has a similar workload, parallelizing only the outer for (iz,iβ) in ... loop should be fine. Otherwise, maybe you’d need

@floop ThreadedEx(basesize = 1) for iβ in eachindex(βvals), iz in eachindex(zvals)
    @floop ThreadedEx() for ia in eachindex(agrid), id in eachindex(dgrid)
...

This uses basesize = 1 on the outer loop so that each iteration gets its own task (i.e., the loop is maximally parallelized). The inner loop can also use ThreadedEx(basesize = max(1, Threads.nthreads() ÷ (length(zvals) * length(βvals)))) to reduce the number of tasks a bit.
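Filled in, the nested version might look roughly like the sketch below. It assumes FLoops.jl is installed and Julia is started with several threads; the grids and the bodies of somecalcs, curvinterp, and swapconstraints! are hypothetical stand-ins, not the real functions from the question. Allocating tmp inside the outer loop body is a choice made in this sketch: a single tmp shared across outer iterations would be written by several tasks at once.

```julia
using FLoops  # provides @floop and ThreadedEx

# Hypothetical stand-ins; the real grids and functions are not shown in the thread.
const zvals = [0.9, 1.1]
const βvals = [0.95, 0.97, 0.99]
const dgrid = range(0, 1; length = 4)
const agrid = range(0, 2; length = 5)

somecalcs(id, ia, iz, iβ) = zvals[iz] * βvals[iβ] * (dgrid[id] + agrid[ia])
curvinterp(tmp, amesh, dmesh) = copy(tmp)              # placeholder: no real interpolation
swapconstraints!(s, iz, iβ) = (s .= max.(s, 0.5); s)   # placeholder constraint (floor at 0.5)

function solve_threaded()
    soln = zeros(length(dgrid), length(agrid), length(zvals), length(βvals))
    @floop ThreadedEx(basesize = 1) for iβ in eachindex(βvals), iz in eachindex(zvals)
        # One scratch array per outer iteration, so outer tasks never share tmp.
        tmp = zeros(length(dgrid), length(agrid))
        @floop ThreadedEx() for ia in eachindex(agrid), id in eachindex(dgrid)
            tmp[id, ia] = somecalcs(id, ia, iz, iβ)
        end
        # The inner @floop has returned, so tmp is complete for this (iz, iβ).
        soln[:, :, iz, iβ] = curvinterp(tmp, agrid, dgrid)
        @views swapconstraints!(soln[:, :, iz, iβ], iz, iβ)  # mutates the slice in place
    end
    return soln
end

soln = solve_threaded()
```

Each outer iteration writes only to its own soln[:, :, iz, iβ] slice, so the outer tasks do not write to the same memory. The inner @floop returns only after all of its iterations have run, which is what lets the interpolation and constraint steps follow it directly.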

PatrickMcFarlane · September 28, 2021, 12:23pm

Great! Thank you so much.
