Multithreading using more CPUs than expected

lzxnl · July 14, 2023, 3:27am

I opened a Jupyter Notebook kernel with 25 threads (Threads.nthreads() prints 25 at startup). I then ran some code that manages to use 8900% CPU, as reported by top in the terminal, i.e. 89 threads’ worth of CPU, even though I’m only asking for 25 threads. Do people have any suggestions? For reference, I just want to run a sequence of for loops over a 5-dimensional array, and I’ve been doing something like

ThreadPools.@qthreads for i in 1:6
ThreadPools.@qthreads for j in 1:4
ThreadPools.@qthreads for k in 1:5

etc. How is the system using 8900% CPU despite only having 25 threads called in the beginning?

Oscar_Smith · July 14, 2023, 3:52am

BLAS threads (i.e. for matrix multiplication) are separate from Julia threads.

lzxnl · July 14, 2023, 4:02am

I heard you can stop BLAS from using more threads by calling julia -p 1, for instance. If I run my code without the parallelised loops, and on julia -p 1 (supposedly shutting down the BLAS extra threads), with one process and one thread, it still uses 6400% CPU. What could be going on?

jishnub · July 14, 2023, 4:40am

You need to use LinearAlgebra.BLAS.set_num_threads(1) to set the number of BLAS threads.

carstenbauer · July 14, 2023, 5:01am

That’s wrong: julia -p 1 sets the number of worker processes, not the number of BLAS threads. Use the OPENBLAS_NUM_THREADS=1 environment variable or the interactive option that @jishnub suggested.
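A minimal sketch of the interactive option, assuming the default OpenBLAS backend that ships with Julia. It prints the Julia thread count and the BLAS thread count separately, then limits BLAS to one thread. The environment-variable route is shown only as a comment, and the script name there is hypothetical.

```julia
using LinearAlgebra

# Environment-variable alternative, set before Julia starts
# (script name is hypothetical):
#   OPENBLAS_NUM_THREADS=1 julia -t 25 myscript.jl

# Julia threads are fixed at startup; BLAS keeps its own, separate pool.
println("Julia threads:       ", Threads.nthreads())
println("BLAS threads before: ", BLAS.get_num_threads())

# Limit BLAS to a single thread for the rest of this session.
BLAS.set_num_threads(1)
println("BLAS threads after:  ", BLAS.get_num_threads())
```

With BLAS left at its default, each Julia thread that calls into a matrix routine can fan out onto the BLAS pool as well, which is one plausible way to end up far above the requested 25 threads.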

gdalle · July 14, 2023, 6:37am

For more details, check out this recent addition to the docs: Performance tips - multithreading and linear algebra

lzxnl · July 14, 2023, 12:41pm

Thanks! This worked! Wow, I never knew BLAS would be calling up to 64 threads on its own.

lzxnl · July 15, 2023, 4:09am

Somehow, the problem came back, even without BLAS. Regardless of whether I run the program in the terminal or in Jupyter Notebook, the CPU usage keeps blowing up, even if I only ask for 25 cores. Does anyone know what could be causing this? I am only using matrix calculations in my code, with LinearAlgebra.BLAS.set_num_threads(1) set.

The kernel appears to die immediately after the @threads call, which is really strange. If I run it on Jupyter Notebook, it says ‘the kernel appears to have died’. If I run it on the terminal, it says ‘Killed’.

jishnub · July 15, 2023, 6:07am

Could you try using the environment variable? Does that also lead to excessive usage? Could you also print out BLAS.get_num_threads() before the threaded loop?

lzxnl · July 20, 2023, 3:23am

I eventually found the problem. Turns out that I needed to call garbage collection every loop iteration, after setting the intermediate buffer vectors to nothing.

jishnub · July 20, 2023, 4:10am

This shouldn’t be necessary in general. Could you post a minimal example that leads to this? It sounds like a Julia issue.

danielwe · July 20, 2023, 5:10am

If this makes a major difference you’re likely allocating a lot within your multithreaded tasks. Be aware that multithreading scales poorly in such cases due to single-threaded GC (at least until Julia 1.10 drops). To avoid this, you should try to allocate intermediate buffers once per task rather than once per iteration. This blog post shows one way to do it (dropping @threads in favor of upfront chunking and @spawn): PSA: Thread-local state is no longer recommended.
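A rough sketch of the "one buffer per task" idea, not the exact code from the linked post. The function names, the buffer length, and the per-iteration work are all hypothetical placeholders; the point is only that the buffer is allocated once inside each spawned task and reused across that task's chunk of iterations.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical per-iteration work: overwrite `buf` in place, then reduce it.
function work!(buf::Vector{Float64}, i::Int)
    for n in eachindex(buf)
        buf[n] = i * n
    end
    return sum(buf)
end

# Split 1:niter into roughly `ntasks` chunks and spawn one task per chunk.
# Assumes niter >= 1.
function run_chunked(niter::Int; buflen::Int = 1_000, ntasks::Int = nthreads())
    chunksize = cld(niter, ntasks)
    chunks = Iterators.partition(1:niter, chunksize)
    tasks = map(chunks) do chunk
        @spawn begin
            buf = Vector{Float64}(undef, buflen)  # allocated once per task
            acc = 0.0
            for i in chunk
                acc += work!(buf, i)              # buffer reused every iteration
            end
            acc
        end
    end
    return sum(fetch, tasks)
end

total = run_chunked(10; buflen = 4)
println(total)
```

Because the buffer belongs to the task rather than to a thread id, the sketch does not depend on which thread a task happens to run on.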

On a different note, be aware that ThreadPools seems to not quite have kept up with the changes required by the dynamic schedule that @threads defaults to since Julia 1.8. But this new schedule also reduces the need for ThreadPools, so I’d suggest working without it, using just @threads and/or @spawn, and only bringing back ThreadPools once you’re sure your code works, if you’re still curious about its functionality.

(Specifically, while ThreadPools.@tspawnat was fixed, I’ve noticed other places in the code where @threads is used assuming the semantics of @threads :static. I’ve been meaning to file an issue, but haven’t gotten around to it yet.)
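As a rough illustration of working with plain @threads, the nested loops from the original question could be flattened into a single threaded loop over all index combinations instead of nesting one threaded loop inside another. The array shape matches the 6, 4 and 5 ranges from the question; the function name and the value written to each element are hypothetical stand-ins for the real computation.

```julia
using Base.Threads: @threads

# Fill a 3-d array with one threaded loop over a flattened index range.
function fill_flat!(A::Array{Float64,3})
    idx = CartesianIndices(A)
    @threads for n in 1:length(idx)
        i, j, k = Tuple(idx[n])
        A[i, j, k] = i + 10j + 100k   # hypothetical per-element work
    end
    return A
end

A = fill_flat!(zeros(6, 4, 5))
println(size(A))
```

Each iteration writes to a distinct element, so no locking is needed in this sketch; real code that accumulates into shared state would need more care.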
