Fastest way to run a for loop in parallel

rcesarpacheco · December 27, 2020, 9:31pm

I am a new user of Julia and I am trying to run a for loop in parallel using Threads.@threads, but I don’t know if this is the fastest way to do it.
The for loop is:

    res = zeros(length(posicao),2)
    Threads.@threads for i = 1:length(posicao)
        res[i,:] = resolve_prob_individuo(precos,posicao[i],r_emprestimo)
    end
    return sum(res,dims=1)

The function resolve_prob_individuo returns a vector of floats of length 2. I am only interested in the sum of these vectors. Each call to resolve_prob_individuo takes ~1.4 seconds. The length of the vector posicao is 36. This for loop takes ~12 seconds. I am running this on a Ryzen 3600 with 12 threads. Is there any way to improve this?
I will run this for loop thousands of times, so any small improvement could make a huge difference.
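
For reference, the numbers above imply some rough bounds. The following is only back-of-the-envelope arithmetic, assuming every call takes the same 1.4 s and the work splits evenly; `ideal_time` is a name made up for this sketch. The Ryzen 3600 has 6 physical cores (12 hardware threads), so the 6-worker figure may be the more realistic target for compute-bound work.

```julia
n_calls = 36      # length(posicao)
t_call  = 1.4     # seconds per call to resolve_prob_individuo, as measured above

# All calls executed one after another.
serial_time = n_calls * t_call

# Perfectly balanced parallel run: number of "rounds" of calls × time per call.
ideal_time(nworkers) = cld(n_calls, nworkers) * t_call

println("serial:           ", serial_time, " s")
println("12 workers:       ", ideal_time(12), " s")
println("6 workers:        ", ideal_time(6), " s")
println("observed speedup: ", serial_time / 12.0)   # 12.0 s is the measured loop time
```

That is roughly 50 s serial, about 4.2 s with 12 fully independent workers and about 8.4 s with 6, against the ~12 s observed.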

pixel27 · December 27, 2020, 9:50pm

It might not help much but you could try:

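    using Base.Threads: Atomic, atomic_add!, @threads

    total1 = Atomic{Float64}(0.0)   # running sum of the first component
    total2 = Atomic{Float64}(0.0)   # running sum of the second component
    @threads for i = 1:length(posicao)
        res = resolve_prob_individuo(precos,posicao[i],r_emprestimo)
        atomic_add!(total1, res[1])
        atomic_add!(total2, res[2])
    end
    return [total1[] total2[]]      # same 1×2 shape as sum(res,dims=1)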

It reduces the memory usage somewhat and moves the addition into the threads. There may be some contention on the atomic operations, but with a single CPU (multiple cores) you probably won’t feel it.

Just for giggles you might also try:

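    using Base.Threads: Atomic, atomic_add!

    total1 = Atomic{Float64}(0.0)   # running sum of the first component
    total2 = Atomic{Float64}(0.0)   # running sum of the second component
    @sync begin
        for i = 1:length(posicao)
            local j = i
            Threads.@spawn begin
                res = resolve_prob_individuo(precos,posicao[j],r_emprestimo)
                atomic_add!(total1, res[1])
                atomic_add!(total2, res[2])
            end
        end
    end
    return [total1[] total2[]]      # same 1×2 shape as sum(res,dims=1)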

(The intermediate variable ‘j’ is probably not needed: a Julia for loop creates a new binding of ‘i’ on every iteration, so using ‘i’ directly in the @spawn should be fine.)

If the time to execute resolve_prob_individuo varies wildly, then this might show better performance, since spawned tasks are scheduled dynamically onto whichever thread is free.
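
As a variant of the same idea, the spawned tasks can return their results and be collected with `fetch`, which avoids atomics altogether. This is only a minimal sketch: `work` is a hypothetical stand-in for resolve_prob_individuo, and `spawn_sum` is a name invented here.

```julia
# Hypothetical stand-in for resolve_prob_individuo: returns two numbers.
work(x) = (x, 2x)

function spawn_sum(xs)
    # One task per element; the scheduler places each on an available thread.
    tasks = [Threads.@spawn work(x) for x in xs]
    s1 = 0.0
    s2 = 0.0
    for t in tasks
        a, b = fetch(t)   # waits for the task and returns its result
        s1 += a
        s2 += b
    end
    return (s1, s2)
end

println(spawn_sum(1.0:36.0))
```

The reduction itself runs serially on the calling task, which should be negligible when each call takes on the order of a second.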

Lastly, you might consider changing resolve_prob_individuo to return a Tuple instead of a 2-element array. I believe that with a Tuple there would be no memory allocations for the result, instead of 2 allocations.
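
To illustrate the Tuple suggestion, here is a small sketch comparing the two return styles. The functions `result_as_vector`, `result_as_tuple` and `sum_results` are hypothetical stand-ins, not the original code; the allocation counts printed will depend on the Julia version.

```julia
# Hypothetical stand-ins for resolve_prob_individuo.
result_as_vector(x) = [x, 2x]    # builds a heap-allocated Vector{Float64}
result_as_tuple(x)  = (x, 2x)    # returns an immutable Tuple{Float64,Float64}

function sum_results(f, xs)
    s1 = 0.0
    s2 = 0.0
    for x in xs
        a, b = f(x)              # destructuring works for vectors and tuples alike
        s1 += a
        s2 += b
    end
    return (s1, s2)
end

xs = collect(1.0:36.0)

# Call each version once first so compilation is not counted.
sum_results(result_as_vector, xs)
sum_results(result_as_tuple, xs)

println("vector version: ", @allocated(sum_results(result_as_vector, xs)), " bytes")
println("tuple version:  ", @allocated(sum_results(result_as_tuple, xs)), " bytes")
```

With calls that take over a second each, the saving is likely to be small in absolute terms, but it costs nothing to try.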

Karajan · December 27, 2020, 10:01pm

You can look at the load of your processor. If it struggles somewhat I found that using ThreadsX.map! sometimes helps.

lmiq · December 27, 2020, 10:14pm

If you have to run the loop thousands of times, you should consider parallelizing not the loop itself but those multiple runs. Of course, that only works if the runs do not depend on one another sequentially.
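
A minimal sketch of that idea, assuming the runs really are independent: `one_run` is a hypothetical placeholder for one complete serial pass over the 36 elements, and `all_runs` is a name invented for this example.

```julia
# Hypothetical placeholder for one full serial evaluation of the inner loop.
one_run(k) = sum(sqrt(k + i) for i in 1:36)

function all_runs(nruns)
    out = Vector{Float64}(undef, nruns)
    # Parallelize over the runs; each iteration writes only its own slot,
    # so no locking or atomics are needed.
    Threads.@threads for k in 1:nruns
        out[k] = one_run(k)
    end
    return out
end

results = all_runs(1000)
println(length(results), " runs completed on ", Threads.nthreads(), " thread(s)")
```

With thousands of runs and only a handful of threads, the work tends to divide much more evenly than 36 items do.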

Specifically about that loop: you will get better scaling if you split the loop in two, an outer loop over the threads and an inner loop over the operations assigned to each thread. Something like:

    nthreads = Threads.nthreads()
    n = length(posicao)                    # number of calculations
    result = zeros(nthreads)               # one partial sum per thread
    n_per_thread = cld(n, nthreads)        # chunk size, rounded up
    Threads.@threads for it in 1:nthreads
        ifirst = (it-1)*n_per_thread + 1
        ilast = min(ifirst + n_per_thread - 1, n)
        for i in ifirst:ilast
            result[it] += ...              # contribution of calculation i
        end
    end
    sum(result)

Of course you would have to tune the details.
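
One way the details might be filled in is sketched below. It assumes each calculation returns two numbers; `work` is a hypothetical stand-in for resolve_prob_individuo and `chunked_sum` is a name invented here. Each chunk accumulates into local variables and writes to the shared array only once, which avoids repeated writes to neighbouring array slots from different threads.

```julia
# Hypothetical stand-in for resolve_prob_individuo: returns two numbers.
work(x) = (x, 2x)

function chunked_sum(xs)
    n = length(xs)
    nchunks = Threads.nthreads()
    partial = zeros(nchunks, 2)          # one row of partial sums per chunk
    chunk = cld(n, nchunks)              # chunk size, rounded up
    Threads.@threads for it in 1:nchunks
        lo = (it - 1) * chunk + 1
        hi = min(it * chunk, n)          # the last chunks may be short or empty
        s1 = 0.0
        s2 = 0.0
        for i in lo:hi
            a, b = work(xs[i])
            s1 += a
            s2 += b
        end
        partial[it, 1] = s1              # single write per chunk
        partial[it, 2] = s2
    end
    return sum(partial, dims=1)          # 1×2 matrix, like sum(res,dims=1)
end

println(chunked_sum(collect(1.0:36.0)))
```

Static chunks like these suit calls of similar duration; if the durations vary a lot, smaller chunks or spawned tasks may balance better.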

(If you have spare time, you might want to take a look at this class, where I discuss the parallelization of the calculation of the potential energy between particles. It is quite basic, but the principles are the same. In Portuguese: Paralelização do cálculo da interação entre partículas - YouTube)

Satvik · December 27, 2020, 10:58pm

I’m a beginner to Julia, but I’ve parallelized some of my code recently, and here are some things I’ve found helpful.

First, try logging your threads using ThreadPools.jl (tro3/ThreadPools.jl on GitHub; docs at https://tro3.github.io/ThreadPools.jl). This will give you a good picture of whether your threads are taking roughly the same amount of time, and how many are actually running. It’ll also help tell you whether most of the time is spent actually running the threads.

Second, try running top on the command line when running your job. This will give you similar information but help separate overhead from actual processing.
