What is the equivalent of this OpenMP Directive in Julia?

Amro · May 31, 2022, 8:43pm

What is the equivalent to the below command in Julia, in which I can run a specific code (or function) on a specific CPU core?

!$OMP PARALLEL
    # Code
!$OMP END PARALLEL

stevengj · May 31, 2022, 9:31pm

That OpenMP directive doesn’t run anything on a “specific CPU core”. It tells OpenMP to execute the given code in parallel using a team of threads (and wait until all threads have completed), but how these are scheduled onto CPU cores is up to the runtime and operating system.

If you just have a block of code in Julia that you want to run N times on all N threads, I guess the closest equivalent would be something like:

tasks = [Threads.@spawn begin
    # code
end for _ = 1:Threads.nthreads()]
foreach(wait, tasks)

For example, running with julia -t 4 (4 threads):

julia> tasks = [Threads.@spawn begin
           sleep(10); println("hello from thread ", Threads.threadid())
       end for _ = 1:Threads.nthreads()]
4-element Vector{Task}:
 Task (runnable) @0x000000010a0aa400
 Task (runnable) @0x000000010a0aa570
 Task (runnable) @0x000000010a0aa6e0
 Task (runnable) @0x000000010a0aa850

julia> foreach(wait, tasks)
hello from thread 2
hello from thread 3
hello from thread 4
hello from thread 1

torrance · June 1, 2022, 5:20am

Or much more simply:

Threads.@threads for i in 1:N
    # Do work
end

The Threads.@threads macro will distribute the loop into N / Threads.nthreads() chunks and assign this work to your active threads. The threads need to be created first, e.g. by starting julia with --threads [N|auto]

As @stevengj says, though, this won’t give you the ability to assign to a specific core, although I believe you can set thread affinity by launching julia with the environment variable JULIA_EXCLUSIVE=1 which might get you some way there (not sure exactly what you’re trying to do there though).

carstenbauer · June 1, 2022, 7:02am

If you want to pin Julia threads to specific cores, you might want to check out GitHub - carstenbauer/ThreadPinning.jl: Pinning Julia threads to cores (disclaimer: I’m the author).

If you then want to run code on a specific thread (and core) you can use its @tspawnat macro.

Amro · June 1, 2022, 7:05pm

Thank you very much! Your explanations are really helpful.

1- I know that sleep() is to block the current task for a specified number of seconds. But I didnt know/follow the exact usage in this case (i.e., before println), even I have deleted it and got the below. Does it make a barrier? Maybe some comments on the flow of execution of the code would be very helpful.

julia> tasks = [Threads.@spawn begin
                  println("hello from thread ", Threads.threadid())
                         end for _ = 1:Threads.nthreads()]
hello from thread 2
hello from thread 1
hello from thread 3
hello from thread 4
6-element Vector{Task}:hello from thread 6

 hello from thread 5
Task (done) @0x000000009cef7840
 Task (done) @0x000000009cef7a30
 Task (done) @0x000000009cef7c20
 Task (done) @0x000000009cef7e10
 Task (done) @0x000000009c9cc010
 Task (done) @0x000000009c9cc200

2- Since tasks = [...] can give the same results, what is the meaning of using foreach(wait, tasks)?

Amro · June 1, 2022, 7:13pm

Thank you very much!

Can I put it at the beginning of my julia code as below before running it the REPL?

ENV["JULIA_EXCLUSIVE"] = 1

Amro · June 1, 2022, 7:17pm

Thanks for your note! Will it be available for Windows soon?

stevengj · June 1, 2022, 7:38pm

I only put it there to make each task run for a little while, so that the foreach(wait, tasks) had a chance to wait for them to complete. (Otherwise they run so quickly that the tasks are already done by the time I run wait.)

Julia uses a dynamic scheduler, not a static schedule. You spawn as many tasks as you want (often many more than you have threads) in order to expose the potential parallelism of your program, and then the runtime scheduler will assign these tasks to threads.

In this particular case, it seems your tasks are running so quickly that the runtime scheduler may have time to migrate them to all the threads.

(One big advantage of a dynamic scheduler is that it can load-balance irregular tasks, whereas a static schedule only works well if you know ahead of time how to equally divide the computational effort. Another big advantage is that parallelism is composable — your code can spawn multiple tasks, and a library routine can also spawn multiple tasks that you don’t know about, and the runtime scheduler will load-balance all of them together.)

It looks like you never ran anything on thread 1, so S[1] is left at whatever you initialized S to?

Amro · June 1, 2022, 9:14pm

Oops, I forgot to mention the initialization. For this reason, I putting it again (on 6 threads):

S = zeros(nthreads());
N = zeros(nthreads());
@threads for i in 1:nthreads()
    println("hello from thread ", threadid())
    S[i] = threadid();
    N[threadid()] = threadid();
end
hello from thread 2
hello from thread 5
hello from thread 6
hello from thread 4
hello from thread 5
hello from thread 3
julia> S
6-element Vector{Float64}:
 3.0
 5.0
 2.0
 6.0
 3.0
 4.0

julia> N
6-element Vector{Float64}:
 0.0
 2.0
 3.0
 4.0
 5.0

From the printed hello..., I see that threadid=5 has addressed two iterations, so number 5 should appear twice in S not umber 3, right?
Here it is clear that N[1]=0 because threadid=1 has not been assigned any iteration.

stevengj · June 1, 2022, 9:32pm

I don’t see how you can tell which iteration occurred for which value of i, since you didn’t print that (and the println statements will not necessarily execute in order of i, since they are parallel) … maybe change it to println("hello $i from thread ", threadid()).

(Technically, it may be possible for the loop body to migrate from one thread to another while it executes, so you have a bit of a race condition here, but I guess that’s not too likely for such a small/fast loop body.)

stevengj · June 1, 2022, 9:33pm

Why do you care about the affinity to particular CPUs? My feeling is that this is mostly useful for benchmarking (since it makes the performance more consistent).

carstenbauer · June 2, 2022, 8:40am

No that won’t work, you need to set the environment variable before starting the Julia process. E.g. JULIA_EXCLUSIVE=1 julia -t4

But I agree with Steven here: very likely, you shouldn’t care about the thread-core mapping.

Amro · June 2, 2022, 2:13pm

Maybe my understanding for the execution is not correct. Therefore, I made the change based on your comment as below. Please correct me if my explanations for the below results are not true.

From the println statements, thread 2 has executed two iterations when i equals 2&3. So, number 2 has to be saved in S[2]&S[3], which is true for S[3] but not for S[2] (=3).
Following the same logic, S[4] (should =5 not 6), S[5] (should =6 not 5), and S[6] (should =3 not 4).

S = zeros(nthreads());
@threads for i in 1:nthreads()
    println("hello from $i thread ", threadid())
    S[i] = threadid();
end
hello from 1 thread 4
hello from 3 thread 2
hello from 2 thread 2
hello from 5 thread 6
hello from 4 thread 5
hello from 6 thread 3

julia> S
6-element Vector{Float64}:
 4.0
 3.0
 2.0
 6.0
 5.0
 4.0

I think this is what happened above that there is a migration between the println and the assignment statements, right?

Amro · June 2, 2022, 2:24pm

I am thinking to spawn my main code over the 1:nthreads()-1 thread and reserve the last thread to handle the rest code. So, I can reduce the overlap?

Amro · June 7, 2022, 7:13pm

Any feedbacks here please?

Topic		Replies	Views
What is julia doing with your threads? General Usage	23	1094	February 21, 2024
Thread affinitization: pinning Julia threads to cores General Usage multithreading	10	3695	January 27, 2022
Is multi-threading in Julia more like openMP or MPI? General Usage	2	2955	August 6, 2021
Looking for code to solve (surely common) 'embarrassing parallelism' multithreading use-case Performance parallel , multithreading	16	1547	January 20, 2021
Parallel execution of a number of lines General Usage parallel , multithreading , threads	2	366	November 4, 2022

What is the equivalent of this OpenMP Directive in Julia?

Related topics