I didn’t find any recent thread on this topic, and from the documentation it seems that development in this area is very active, so I was wondering whether there is any updated information.

I run a simulation model that takes several parameters, and I would like to loop over different values of two of them in parallel on my laptop (a MacBook M1, in case that matters). In simplified form, my code looks something like this:

    convergence_time = zeros(10,10)
    for a in 1:10
        for b in 1:10
            res = simulate_model(a,b)
            convergence_time[a,b] = res.t_conv
        end
    end

Now I tried to parallelize this by using Threads.@threads on both for loops:

    convergence_time = zeros(10,10)
    Threads.@threads for a in 1:10
        Threads.@threads for b in 1:10
            res = simulate_model(a,b)
            convergence_time[a,b] = res.t_conv
        end
    end

This did speed up the calculation, but I am wondering whether it is currently the best way to go about it.

The details will very much depend on what simulate_model(a,b) does. If you’re solving an ODE, consider an ensemble problem in DifferentialEquations.jl using EnsembleThreads() (though if each simulation is sufficiently short, running them serially might work out faster). Otherwise, you can flatten your nested loops into a single loop over all (a, b) pairs, which may permit the compiler to simplify things more readily, or at least saves the overhead of spawning threads again in each of the 10 inner loops. In serial code this is written for a in 1:10, b in 1:10; Threads.@threads does not accept that multi-iterator form, so with threading you would loop over a single collection such as CartesianIndices((10, 10)). Finally, for M1-based machines, the Apple Silicon native Julia build (1.8 or later) should be faster and less buggy, while also giving you a dynamic thread scheduler by default, which may make better use of your compute resources than the static scheduler in 1.7.
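A minimal sketch of the flattened-loop idea, assuming a hypothetical stand-in for simulate_model that returns a named tuple with a t_conv field (the real model is not shown in this thread). Because Threads.@threads expects a single iterator, the sketch loops over CartesianIndices so that all 100 combinations are distributed by one threaded loop:

```julia
using Base.Threads

# Hypothetical stand-in for the real model; only the `t_conv` field is assumed.
simulate_model(a, b) = (t_conv = Float64(a * b),)

convergence_time = zeros(10, 10)

# One threaded loop over all (a, b) pairs instead of two nested @threads loops.
@threads for I in CartesianIndices(convergence_time)
    a, b = Tuple(I)
    res = simulate_model(a, b)
    # Each iteration writes to its own element, so no locking is needed.
    convergence_time[I] = res.t_conv
end
```

With 100 independent work items in a single loop, the scheduler has more freedom to balance the load than with 10 outer iterations, which may matter if individual simulations take very different amounts of time.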

For current Julia versions, nesting Threads.@threads like this creates too much overhead. In any case, it should be sufficient to parallelize only the outer for loop unless you have more than 10 cores.
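As a rough sketch of threading only the outer loop (simulate_model here is again a hypothetical placeholder, since the real model is not shown in the thread):

```julia
using Base.Threads

# Hypothetical placeholder for the real model; only `t_conv` is assumed.
simulate_model(a, b) = (t_conv = Float64(a + b),)

convergence_time = zeros(10, 10)

# Only the outer loop is threaded; each task runs its inner loop serially.
@threads for a in 1:10
    for b in 1:10
        res = simulate_model(a, b)
        convergence_time[a, b] = res.t_conv
    end
end

println("Ran with ", nthreads(), " thread(s)")
```

This only helps if Julia was started with more than one thread, for example with julia --threads=auto; Threads.nthreads() reports how many are available.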

Threads.@threads now defaults to a new :dynamic schedule option which is similar to the previous behavior except that iterations will be scheduled dynamically to available worker threads rather than pinned to each thread. This behavior is more composable with (possibly nested) @spawn and @threads loops (#43919, #44136).
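To make the two schedules concrete, here is a small sketch assuming Julia 1.8 or later and a made-up workload function work (not part of the original question) whose cost grows with its argument:

```julia
using Base.Threads

# Hypothetical workload with uneven cost per iteration.
work(i) = sum(sqrt, 1:(i * 10_000))

# Default schedule on Julia 1.8 and later (same as writing `@threads :dynamic`).
dynamic_results = zeros(8)
@threads for i in 1:8
    dynamic_results[i] = work(i)
end

# Explicit static schedule: the range is split into even chunks,
# and each chunk is tied to one thread for the whole loop.
static_results = zeros(8)
@threads :static for i in 1:8
    static_results[i] = work(i)
end
```

Both loops compute the same values; the difference is only in how iterations are assigned to threads. Note that the :static schedule cannot be used inside another @threads loop, which is one reason the dynamic default composes better with nested parallelism.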