Huge CPU load when using GLM

Hello all

I’m using GLM to calculate coefficients for a logistic regression. I don’t recall this being an issue when I first developed this script, but recently I’ve noticed that it generates a huge load on my server: CPU usage goes up to 6000% in some cases, as monitored through “top”. I’ve narrowed it down to one function, as demonstrated in this MWE:

using DataFrames, GLM, Random  # randstring comes from the Random standard library

df = DataFrame(:col1 => String[], :col2 => Bool[])

for i = 1:1000
	push!(df, [randstring(1), Bool(rand(0:1))])
end

results = glm(@formula(col2 ~ col1), df, Binomial())

I presume that GLM is multithreading this calculation internally, but is there an option to limit the number of threads GLM uses, so that it doesn’t saturate the server’s resources?

You should be able to set the maximum number of threads by starting Julia with the --threads option, for example:

julia --threads 4

or

julia -t 4

If you pass auto instead of a number (julia --threads auto), Julia picks the thread count based on the number of local CPU threads.

Alternatively, set the JULIA_NUM_THREADS environment variable in your shell (and also JULIA_EXCLUSIVE if you’re running on bare metal rather than a VM).

I have a Ryzen 5 with 6 cores and hyperthreading. I set that environment variable to 5 so Julia can have 5 cores and my desktop stays responsive.

I already have threads set to 1 in the session that ran this, and it doesn’t seem to make a difference. I still see CPU usage climb to a huge number (I just reran this test and it went up to 3000%).

How many cores/hyperthreads does your box have? Perhaps GLM is calling a BLAS routine that does its own threading.
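One quick way to check that hypothesis, as a sketch: Julia’s own thread count (what --threads and JULIA_NUM_THREADS control) and the size of the BLAS thread pool are configured separately and reported by different functions, so a session can show 1 Julia thread while BLAS still runs many. Threads.nthreads() is in Base; BLAS.get_num_threads() comes from the LinearAlgebra standard library.

```julia
using LinearAlgebra  # standard library; provides the BLAS submodule

# Threads used by Julia's task scheduler (set via --threads / JULIA_NUM_THREADS)
julia_threads = Threads.nthreads()

# Threads used inside the BLAS library: a separate pool, configured independently
blas_threads = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)
```

If the first number is 1 and the second is large, the CPU load is coming from BLAS rather than from Julia-level threading.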

This server has 64 cores (2 sockets with 32 cores each) and 2 threads per core (128 hardware threads in total).

So it doesn’t seem like there’s an actual problem? (In the sense that it’s not using many more threads than there are cores.)

How do you conclude that? It’s generating a huge load on the server, so that’s still a problem.

Well, it’s using about half the available cores… If I had 2 cores and one of them was being used by Julia, I wouldn’t be worried. If I had 10 cores and 5 of them were in use by Julia… I wouldn’t be worried…

If I have 64 cores and 32 are being used by Julia… should I be worried? I mean, maybe you want to limit it further… sure… but it’s not really a “huge” load for such a server.

In any case, if it’s BLAS that’s the issue, you can use BLAS.set_num_threads() to limit its thread count.
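A minimal sketch of doing that once at the top of a script, before any model fitting. The value 1 is only an example; pick whatever share of the machine you are allowed to use. For the default OpenBLAS backend, setting the OPENBLAS_NUM_THREADS environment variable before starting Julia should have a similar effect.

```julia
using LinearAlgebra  # BLAS is a submodule of this standard library

# Cap the BLAS thread pool for the rest of this Julia session.
# 1 is an example value; choose a limit appropriate for a shared server.
BLAS.set_num_threads(1)

println("BLAS threads now: ", BLAS.get_num_threads())
```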

This is a shared server, and this one process is taking up half of its resources.

Well, if it’s doing it for a long time… I guess that’s an issue; if it’s doing it for a second, maybe not. Try setting the number of BLAS threads and see if that helps.

Where do I set this variable?

I tried this, but it errors out:

julia> BLAS.set_num_threads(1)
ERROR: UndefVarError: BLAS not defined

I tried to add the BLAS package, but there is no BLAS package to add.

(@v1.8) pkg> add BLAS
    Updating registry at `/prj/yeprd/server/julia/PKG/1.6/registries/General.toml`
ERROR: The following package names could not be resolved:
 * BLAS (not found in project, manifest or registry)

BLAS is a submodule of the LinearAlgebra standard library, so there is nothing to install; just load LinearAlgebra:

julia> using LinearAlgebra

julia> BLAS
LinearAlgebra.BLAS

julia> BLAS.get_num_threads()
4
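If you only want to cap BLAS around the model fit rather than for the whole session, a small wrapper can set the limit and then restore the previous value. This is a sketch; with_blas_threads is a hypothetical helper, not part of GLM.jl or LinearAlgebra.

```julia
using LinearAlgebra

# Hypothetical helper (not part of GLM.jl or LinearAlgebra): run `f()` with the
# BLAS thread pool capped at `n`, then restore the previous setting even if
# `f` throws.
function with_blas_threads(f, n::Integer)
    previous = BLAS.get_num_threads()
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(previous)
    end
end

# Example: report the thread count seen inside the capped region
inside = with_blas_threads(1) do
    BLAS.get_num_threads()
end
println("BLAS threads inside the block: ", inside)
println("BLAS threads afterwards:       ", BLAS.get_num_threads())
```

For the MWE, that would mean putting the glm(@formula(col2 ~ col1), df, Binomial()) call inside a with_blas_threads(1) do … end block. Note that the BLAS thread count is process-wide state, so this does not isolate the setting if other tasks are calling BLAS at the same time.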

GLM.jl doesn’t do any multithreading itself – any and all threading comes from the BLAS.

If you’re worried about consuming resources, you can also generate your data much more efficiently:

using DataFrames, Random
DataFrame(:col1 => [randstring(1) for _ in 1:1000], :col2 => rand(Bool, 1000))

Benchmarking with BenchmarkTools.jl shows that this is much faster:

julia> function f1()
       df = DataFrame(:col1 => String[], :col2 => Bool[])

       for i = 1:1000
               push!(df, [randstring(1),Bool(rand(0:1))]) 
       end 
       return df
       end
f1 (generic function with 1 method)

julia> f2() = DataFrame(:col1 => [randstring(1) for _ in 1:1000], :col2 => rand(Bool, 1000))
f2 (generic function with 1 method)

julia> @benchmark f1()
BenchmarkTools.Trial: 8983 samples with 1 evaluation.
 Range (min … max):  496.292 μs …   3.969 ms  ┊ GC (min … max): 0.00% … 77.65%
 Time  (median):     536.952 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   555.404 μs ± 203.561 μs  ┊ GC (mean ± σ):  2.82% ±  6.39%

             ██▄  ▁▇▆▃▁                                          
  ▂▁▁▂▃▃▃▃▃▃█████▆██████▇▇▇█▇▇▆▅▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▄
  496 μs           Histogram: frequency by time          633 μs <

 Memory estimate: 275.72 KiB, allocs estimate: 7479.

julia> @benchmark f2()
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  21.254 μs …  2.057 ms  ┊ GC (min … max):  0.00% … 96.97%
 Time  (median):     23.270 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   29.122 μs ± 79.844 μs  ┊ GC (mean ± σ):  12.73% ±  4.60%

  ▄▇█▆▄▃▂▂▂▁▁▂▂▂       ▁▁▁▁                                   ▂
  ███████████████▇▇▇▆▇███████▆▇▆▇▇▇▆▆▅▅▃▅▃▃▃▅▅▃▃▁▁▃▄▁▄▅▃▄▃▄▅▅ █
  21.3 μs      Histogram: log(frequency) by time      67.8 μs <

 Memory estimate: 105.62 KiB, allocs estimate: 2031.

Yes, limiting the BLAS threads reduces the CPU load to 100%. Strangely, the total processing time barely changes. Why would that be?

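Sometimes threading a calculation, especially a fast one, has so much overhead that it doesn’t help; it can even hurt. For small problems like this one, the time spent starting and synchronizing threads can outweigh the work each thread actually does.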
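To see this on your own machine, a rough micro-benchmark sketch: it times many small A' * A products (the kind of BLAS work a small regression fit does) with 1 BLAS thread and with one thread per logical CPU. The matrix shape is an assumption chosen to roughly match the MWE’s model matrix (1000 rows; randstring(1) draws from 62 characters, so up to about 62 columns after dummy coding). Results depend heavily on the hardware and BLAS build, so expect anything from a modest speedup to a slowdown for the multithreaded run.

```julia
using LinearAlgebra

# Hypothetical micro-benchmark: time `reps` small A' * A products with the
# given BLAS thread count, restoring the previous thread count afterwards.
# The default shape is an assumption, roughly matching the MWE's model matrix.
function time_small_products(nthreads::Integer; rows=1000, cols=62, reps=500)
    previous = BLAS.get_num_threads()
    BLAS.set_num_threads(nthreads)
    try
        A = rand(rows, cols)
        A' * A                       # warm-up run so compilation is not timed
        return @elapsed for _ in 1:reps
            A' * A
        end
    finally
        BLAS.set_num_threads(previous)
    end
end

t_one  = time_small_products(1)
t_many = time_small_products(Sys.CPU_THREADS)
println("1 BLAS thread:  ", round(t_one; digits=3), " s")
println(Sys.CPU_THREADS, " BLAS threads requested: ", round(t_many; digits=3), " s")
```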