How to prevent BLAS from thrashing with Julia?

oatlzzvztd · May 26, 2017, 7:31pm

Hello.

Let’s say I’m trying to run the following function many times on multiple cores:

@everywhere function test()
  X = randn(800, 800)
  Y = randn(800, 800)
  Base.LinAlg.BLAS.axpy!(2.0, X, Y)
end

(The real function is vastly more complicated but also dominated by a BLAS call).

If I start up Julia with some number of worker processes and run

julia> pmap(x -> test(), 1:length(workers()))

it appears to me from the CPU usage that the pmap workers are contending with the threads BLAS is using to run axpy!.

Even if I start up Julia with a single worker process, my eight-threaded Intel Core i7 appears to show 4 threads being used. This is also true after running

julia> BLAS.set_num_threads(1)

How do I spawn worker processes that won’t be competing with BLAS for resources?

JaredCrean2 · May 26, 2017, 11:24pm

Try setting the environment variable OPENBLAS_NUM_THREADS=1 before launching Julia.

I’m surprised BLAS.set_num_threads(1) didn’t fix it. Could you post a minimal example of this case?
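
A minimal sketch of applying the thread limit on every process, assuming Julia 1.x, where BLAS lives in the LinearAlgebra standard library and pmap in Distributed (the thread itself uses the older v0.5 paths). Each worker is a separate process with its own BLAS thread pool, so calling BLAS.set_num_threads(1) on the master alone does not change the workers. The worker count and the name test_axpy are illustrative, not from the original post.

```julia
using Distributed

# Two local workers for illustration; the count is arbitrary.
addprocs(2)

# Load LinearAlgebra on the master and on every worker.
@everywhere using LinearAlgebra

# Limit BLAS to one thread in each process, not only on the master.
@everywhere BLAS.set_num_threads(1)

# Stand-in for the BLAS-dominated function from the question.
@everywhere function test_axpy()
    X = randn(800, 800)
    Y = randn(800, 800)
    BLAS.axpy!(2.0, X, Y)   # Y = 2.0 * X + Y, computed in place
    return sum(Y)           # return a scalar to keep data transfer small
end

results = pmap(_ -> test_axpy(), 1:nworkers())
```

With one BLAS thread per worker, the number of busy cores should roughly match the number of workers, which makes it easier to choose a worker count that fits the physical core count.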

oatlzzvztd · May 27, 2017, 1:20pm

So here’s a REPL session followed by a screenshot of CPU activity while the REPL was churning on the last statement. Julia v0.5.2, started with one process (which agrees with Activity Monitor). There were other processes running, ofc, but I don’t think that’s what I’m seeing in the image.

I’m not sure how to check what version of BLAS Julia is using, so that’s part of my problem, too.

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.2 (2017-05-06 16:34 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin13.4.0

julia> BLAS.set_num_threads(1)

julia> A = rand(8000, 8000);

julia> b = rand(8000);

julia> c = zeros(8000);

julia> for i = 1:100; c += A \ b; end;

Paradoxically (imo), setting OPENBLAS_NUM_THREADS=1, reopening Julia, and using the same REPL commands makes BLAS use the same four threads, but more fully.

If someone could explain what is going on here, I’d greatly appreciate it. I’m very confused.

JaredCrean2 · May 27, 2017, 11:05pm

I ran your example on Linux (unfortunately I don’t have mac to test on), and without specifying the number of threads blas should use, top reported a steady 400% CPU usage (my machine has 4 logical CPUs). With BLAS.set_num_threads(1), it was steady at 100%. Could you try using top to measure CPU usage and report the numerical value? I’m wondering if this is a problem with the measurement and not with the actual CPU usage.
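
As a cross-check that does not depend on a CPU monitor, one option is to time the same solve under different BLAS thread settings. This is only a sketch, assuming Julia 1.x; the matrix size is reduced from the example above so it runs quickly, and time_solve is a hypothetical helper. If the setting takes effect, the two wall-clock times would usually differ noticeably on a multi-core machine.

```julia
using LinearAlgebra

A = rand(2000, 2000)
b = rand(2000)

# Hypothetical helper: set the BLAS thread count, then time one solve.
function time_solve(A, b, nthreads)
    BLAS.set_num_threads(nthreads)
    return @elapsed(A \ b)
end

time_solve(A, b, 1)  # warm-up call so compilation time is excluded below

t1 = time_solve(A, b, 1)
tn = time_solve(A, b, min(4, Sys.CPU_THREADS))

println("1 BLAS thread:       ", t1, " s")
println("up to 4 BLAS threads: ", tn, " s")
```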

Ralph_Smith · May 27, 2017, 11:19pm

versioninfo() should show which BLAS library is used. (Your banner says “Official release” so it’s presumably OpenBLAS.) It should also show your processor model, and unless you have tweaked your system I expect it will confirm that you have 4 physical cores with hyperthreading.

You don’t say how you set OPENBLAS_NUM_THREADS, but it looks like it didn’t take. You can check that by displaying ENV["OPENBLAS_NUM_THREADS"] in Julia. If it’s not set, OpenBLAS defaults to the number of physical cores on macOS, which would explain your second chart. (This differs from Linux.) Note that CPU usage of 50% may mean full utilization: the extra virtual cores don’t have separate floating point units.
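
The checks described above could be run together as follows. This is a sketch assuming a recent Julia 1.x: BLAS.get_num_threads requires Julia 1.6 or later and BLAS.get_config requires 1.7 or later, and neither exists in the v0.5.2 session shown in this thread. Using get avoids the KeyError that ENV["OPENBLAS_NUM_THREADS"] throws when the variable is not set.

```julia
using LinearAlgebra

# Was the environment variable visible to this Julia process?
env_value = get(ENV, "OPENBLAS_NUM_THREADS", nothing)
println("OPENBLAS_NUM_THREADS = ", env_value === nothing ? "(not set)" : env_value)

# How many threads BLAS is currently configured to use (Julia 1.6+).
println("BLAS threads: ", BLAS.get_num_threads())

# Number of logical CPUs, which includes hyperthreads.
println("Logical CPUs: ", Sys.CPU_THREADS)

# Which BLAS library is loaded (Julia 1.7+).
println("BLAS backend: ", BLAS.get_config())
```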

Your first chart [with BLAS.set_num_threads(1)] seems to show the scheduler migrating Julia tasks between physical processors. I think this depends on the macOS version and platform (some systems strive to balance the load on physical processors).

oatlzzvztd · May 30, 2017, 10:17am

Ah, I’d forgotten that most operating systems try to do this. Thanks, I think that solves my problem. Sounds like BLAS.set_num_threads is working as intended.
