@threads - machine stops using allocated cores during run

Dear all,

I have a function nested inside another function that I can run in parallel, and I am using the ‘@threads’ macro for that. Calling that outside function takes several hours and, at the beginning, works as expected: all intended threads are used on my machine.

Sometimes, however, at a seemingly random iteration, I can see that not all allocated threads are used anymore; instead, only one thread is used until the outside function is finished. A simplified MWE:

function inside_fun(n)
    return randn(n)
end
function outside_fun(timesteps::Int64, n::Int64)
    for iter in 1:timesteps
        Base.Threads.@threads for i in 1:n   # distribute the inner iterations across threads
            inside_fun(10)
        end
    end
    return nothing
end
outside_fun(10, 5)
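
For reference, a variant of this MWE that records which thread handles work in each outer iteration would make the drop to a single thread directly visible. This is only a sketch; outside_fun_logged and the used counter are made up for illustration:

function outside_fun_logged(timesteps::Int64, n::Int64)
    for iter in 1:timesteps
        used = zeros(Int, Base.Threads.nthreads())   # per-thread iteration counter
        Base.Threads.@threads for i in 1:n
            used[Base.Threads.threadid()] += 1       # each thread only touches its own slot
            inside_fun(10)
        end
        println("iteration $iter used $(count(>(0), used)) of $(Base.Threads.nthreads()) threads")
    end
    return nothing
end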

I have tried to google this problem, but unfortunately without success.

(1) What is the reason that at some point Julia does not use all allocated threads anymore with the @threads macro?
(2) Also, is there a way to fix this and “force” Julia to continue using all allocated threads on my machine?

You could simply be witnessing the last (n-1) calls finishing up. When I call your outside_fun(10_000_000, 5) (the only way to make it run long enough to measure with htop), the CPU stays pegged at 400% until the last second, then drops to 300% and then to ~0%. Alternatively, it could be the GC trying to clean up? It’s hard to tell without more information. If you run it under a benchmarking tool, does it report a lot of time spent in GC?
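For example, something along these lines (just a sketch, with arbitrary call counts) would show the GC fraction directly:

# Rough check of how much of the runtime goes to garbage collection.
stats = @timed outside_fun(10_000, 5)
println("total time: ", stats.time, " s")
println("GC time:    ", stats.gctime, " s (",
        round(100 * stats.gctime / stats.time; digits = 1), " %)")

BenchmarkTools.@benchmark would also report a GC percentage alongside the timings.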

Thank you for your answer!

You could simply be witnessing the last (n-1) calls finishing up.

No, it often happens with more than a third of the outside iterations still left. The function above was just an example, because the actual functions involve several packages and I don’t think anyone would want to look through all of that.

Alternatively, it could be the GC trying to clean up?

This could definitely be the case. It might also be that memory fills up at some point before the GC kicks in. Could this cause the behaviour I described above?

If your memory were full, you would usually see either a crash or a dramatic slowdown as the system starts using swap (your disk as memory). If the behaviour persists indefinitely while the iterations still make progress, it seems unlikely to be GC (but perhaps someone with more domain expertise can offer an insight).
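
If you want to rule that out, you could log memory pressure once per outer iteration; log_memory below is a hypothetical helper, not something your code already has:

# Sketch: call this once per outer iteration to watch for shrinking free RAM.
function log_memory(iter)
    free_gb = Sys.free_memory() / 2^30
    live_gb = Base.gc_live_bytes() / 2^30
    println("iter $iter: free RAM ≈ $(round(free_gb; digits = 2)) GiB, ",
            "GC-tracked heap ≈ $(round(live_gb; digits = 2)) GiB")
end

If free RAM stays roughly constant while the slowdown happens, swapping is probably not the culprit.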

What seems more likely is that one of the “several packages” is hitting a bottleneck, but again, without more information I cannot say with confidence. What type of problem are you solving? ODEs? Matrix factorizations? Which packages are you using?

I ran your script with 10 replaced by 1000000, and watched what julia was doing using perf, with the command

sudo perf top -p $(pidof julia)

The output

  62.68%  libjulia-internal.so.1.7  [.] get_next_task
   2.89%  libjulia-internal.so.1.7  [.] jl_task_get_next
   2.58%  libjulia-internal.so.1.7  [.] jl_process_events
   1.87%  [kernel]                  [k] delay_halt_mwaitx

tells me that most of your code is spending its time in the overhead of switching between the different threads’ tasks. This is with 16 threads.

Maybe your MWE doesn’t represent your actual code, or maybe the threading overhead really is degrading your performance that much; it’s hard to tell. I would recommend running perf top, htop, and iotop to see if your system can tell you more.
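
If the scheduling overhead itself turns out to be the problem, one option is to put @threads on the outer loop so that each task does far more work per scheduling event. This is only a sketch and assumes the outer iterations are actually independent, which may not hold in your real code:

function outside_fun_chunked(timesteps::Int64, n::Int64)
    # One long-lived task per chunk of outer iterations,
    # instead of a fresh batch of tiny tasks every timestep.
    Base.Threads.@threads for iter in 1:timesteps
        for i in 1:n
            inside_fun(10)
        end
    end
    return nothing
end

With timesteps in the millions and n = 5, this replaces millions of short-lived threaded regions with a handful of long-running chunks, so time spent in get_next_task should drop sharply.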