I’m writing an algorithm for a discrete-time dynamic programming problem in economics. The algorithm involves nested loops over 4 state variables: 2 exogenous and 2 endogenous. The solution to the problem is a 4-dimensional array, with each dimension corresponding to one of the state variables.
I would like to multithread these nested loops if I can. From reading here, FLoops.jl seems to be an ideal way of doing this. However, there is a complication: each iteration of the outer two loops needs to wait until the inner two loops have completed before it can move on to its next iteration.
Below I’ve set out a toy example of what I need to do:
for (iz, iβ) in Iterators.product(eachindex(zvals), eachindex(βvals))
    for (id, ia) in Iterators.product(eachindex(dgrid), eachindex(agrid))
        tmp[id, ia] = somecalcs(id, ia, iz, iβ) # compute the solution at each (d, a) grid point
    end
    @views soln[:, :, iz, iβ] = curvinterp(tmp, amesh, dmesh) # interpolate solutions onto rectilinear mesh
    @views soln[:, :, iz, iβ] = swapconstraints!(soln[:, :, iz, iβ], iz, iβ) # swap in constrained solutions
end
Should I use FLoops.jl (e.g., @floop ThreadedEx()) on both the outer and inner loops? I know that I can get an answer to this question by experimenting with @btime.
But I’m wondering whether there’s a more theoretical answer rooted in how FLoops.jl and the different executor options work. Maybe I should use a different executor for the outer loop?
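For concreteness, here is a rough sketch of the outer-loop-only version I have in mind. The @init line is my guess at how to give each task its own scratch buffer, and I’m assuming tmp holds Float64s; somecalcs, curvinterp, and swapconstraints! are the same placeholders as in the toy example above.

using FLoops

@floop ThreadedEx() for (iz, iβ) in Iterators.product(eachindex(zvals), eachindex(βvals))
    @init tmp = Matrix{Float64}(undef, length(dgrid), length(agrid)) # per-task scratch buffer
    for (id, ia) in Iterators.product(eachindex(dgrid), eachindex(agrid))
        tmp[id, ia] = somecalcs(id, ia, iz, iβ) # compute the solution at each (d, a) grid point
    end
    @views soln[:, :, iz, iβ] = curvinterp(tmp, amesh, dmesh) # interpolate onto rectilinear mesh
    @views soln[:, :, iz, iβ] = swapconstraints!(soln[:, :, iz, iβ], iz, iβ) # swap in constrained solutions
end

Since each (iz, iβ) pair writes to a disjoint slice of soln, I don’t think the tasks would step on each other; the inner loops would just run serially inside each task.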
Greatly appreciate any advice anyone has.
-Patrick