Hi all,
As a new Julia programmer, I’m fascinated by its syntax, the availability of code examples, its community, and, most importantly, its performance!
Now that I’m getting my first experience with thread parallelism, I’ve found that it’s harder to find clear examples and information on this topic.
As a newcomer, I was very interested in using the @threads macro to quickly turn a sequential loop into a parallel loop “without many changes to the original code”, but the more I read about this subject, the more I realize that it is not the “silver bullet” I initially thought it was.
I have implemented 11 algorithms to solve 14 math problems in 5 different dimensions (numbers of variables), with each combination executed 51 times, using the 4 nested loops shown below:
```julia
using Base.Threads   # provides @threads
using Dates          # provides now()

# Main loop
dt = @elapsed begin
    @threads for p in problems                      # loop 1: problems
        @threads for v in num_vars                  # loop 2: dimensions
            _p = Problem(p.f, p.min, p.max, v)      # copy problem struct
            @threads for algo in algorithms         # loop 3: algorithms
                population = pop_mult * _p.num_variables
                println("Alg.: ", string(algo), " -> ", string(_p),
                        " | Pop. Size: ", population,
                        " | Num. Iter.: ", iter_mult * v, " (", num_runs, " runs)",
                        " | Start: ", now())
                @threads for _ in 1:num_runs        # loop 4: independent runs
                    res = algo(_p, population, iter_mult * v, 1.0e-12)
                    push!(results, res)             # append to the shared `results` collection
                end
            end
        end
    end
end
```
Since there are 770 different combinations of algorithms, problems, and dimensions (11 × 14 × 5), I decided to parallelize the execution by adding the @threads macro to all loops.
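The size of the full workload follows directly from these counts. A quick sketch of that arithmetic, enumerating every (problem, dimension, algorithm) combination as one flat collection; the integer ranges are placeholders standing in for the real `problems`, `num_vars`, and `algorithms` collections:

```julia
# Counts taken from the post; the ranges are placeholders for the real collections.
n_problems, n_dims, n_algorithms, num_runs = 14, 5, 11, 51

# Every (problem, dimension, algorithm) triple as one flat vector of tuples.
combos = vec(collect(Iterators.product(1:n_problems, 1:n_dims, 1:n_algorithms)))

n_combos = length(combos)          # 14 * 5 * 11 = 770
n_calls  = n_combos * num_runs     # 770 * 51 = 39270 algorithm calls in total
println(n_combos, " combinations, ", n_calls, " runs")
```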
I found that this was the fastest combination by benchmarking a smaller subset (3 algorithms, 3 problems, and 3 smaller dimensions), which gave the following results:
NOTE: In the notation [a-b-c-d], each position corresponds to one of the four "for" loops in the code above, from outermost to innermost; 1 means the loop uses the @threads macro, while 0 means it is a plain sequential loop.
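To make the notation concrete, here is a minimal, self-contained sketch of the [1-0-0-0] layout, where only the outermost loop uses @threads. The inputs are hypothetical placeholders (`sum` and `prod` stand in for the real algorithms, plain numbers stand in for `Problem` structs), and each threaded iteration collects its results locally and writes them to its own slot of a preallocated vector:

```julia
using Base.Threads: @threads

# Hypothetical stand-ins for the real inputs.
problems   = [1.0, 2.0, 3.0]   # placeholder "problems"
num_vars   = [2, 4, 8]         # placeholder dimensions
algorithms = [sum, prod]       # placeholder "algorithms"
num_runs   = 4

# [1-0-0-0]: loop 1 is threaded; loops 2-4 run sequentially inside each iteration.
results = Vector{Vector{Float64}}(undef, length(problems))
@threads for i in eachindex(problems)
    local_res = Float64[]                  # owned by this iteration only
    for v in num_vars
        for algo in algorithms
            for _ in 1:num_runs
                push!(local_res, problems[i] * algo(1.0:v))
            end
        end
    end
    results[i] = local_res                 # each iteration writes a distinct slot
end

println(length.(results))                  # [24, 24, 24]
```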
I then ran the program on a dual-socket AMD EPYC system (64 cores / 128 threads per CPU, 256 threads total) with 256 GB of RAM running Linux, but after 5 days of execution the process was terminated with a “Killed” message.
I have now started a new run limited to 64 threads, both to reduce memory usage and because I found that leaving some cores unused during multithreaded work gives me some additional performance (maybe due to threading overhead?).
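For reference, the Julia thread count is fixed at startup (for example `julia --threads 64 script.jl`, or through the `JULIA_NUM_THREADS` environment variable). A small sketch for confirming, from inside the script, how many threads are actually in use and how much memory the machine reports; these are standard `Base` functions, and the printed values depend on the machine:

```julia
using Base.Threads: nthreads

# Threads available to this Julia session (set at startup with --threads / -t).
println("Julia threads:      ", nthreads())
# Logical CPUs reported by the OS (the machine described above has 256 hardware threads).
println("Logical CPUs:       ", Sys.CPU_THREADS)

# Helper: convert a byte count to GiB, rounded to one decimal place.
gib(bytes) = round(bytes / 2^30; digits = 1)
println("Total memory (GiB): ", gib(Sys.total_memory()))
println("Free memory (GiB):  ", gib(Sys.free_memory()))
```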
My questions are:
- Could the @threads loops be the cause of the execution being killed? (For additional context, I successfully ran a smaller version of the same code on a 6-core system; it took almost 4 days, and I believe the @threads combination there was [1-0-0-0].)
- What other, more efficient ways are there to parallelize this type of workload?
- Are there any recommended code examples showing how to implement parallel code correctly in Julia?
- What special considerations (if any) should I keep in mind when running multithreaded code on multi-CPU (multi-socket) systems?
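For concreteness, a hypothetical sketch of one alternative structure (not benchmarked on the machine above): flatten all (problem, dimension, algorithm, run) combinations into one collection, run a single `@threads` loop over it, and write each result into a preallocated slot instead of calling `push!` on one shared vector from several threads. `run_algorithm` is a placeholder for a call such as `algo(_p, population, iter_mult * v, 1.0e-12)`, and the inputs are toy stand-ins:

```julia
using Base.Threads: @threads

# Hypothetical stand-ins for the real inputs.
problems   = [1.0, 2.0, 3.0]
num_vars   = [2, 4]
algorithms = [sum, maximum]
num_runs   = 5

# Placeholder for the real call, e.g. algo(_p, population, iter_mult * v, 1.0e-12).
run_algorithm(algo, p, v, r) = p * algo(1.0:v) + r

# One flat list of jobs: every (problem, dimension, algorithm, run) combination.
jobs = vec(collect(Iterators.product(problems, num_vars, algorithms, 1:num_runs)))

# Preallocate one slot per job so no two iterations write to the same location.
results = Vector{Float64}(undef, length(jobs))

# A single threaded loop; @threads splits the flat index range across the available threads.
@threads for k in eachindex(jobs)
    p, v, algo, r = jobs[k]
    results[k] = run_algorithm(algo, p, v, r)
end

println(length(results))   # 60
```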
Thank you in advance!