I’m currently trying to parallelize my code to run on a cluster, using Threads.@threads. I noticed in htop that the CPU usage indicates the running threads are “kernel threads” (red), meaning (insofar as I understand it) that the thread is managed by the system scheduler. I have never seen that happen using Python’s multiprocessing, for example.
Red in the CPU usage bar in htop means that the time is being spent in the kernel, not that the time is being spent in “kernel threads”. This can mean that you’re spending a lot of time waiting on IO, spending a lot of time allocating memory, or doing similar work that requires the kernel to get involved.
Seeing as your processes are using 25GiB of memory each, I’d wager your code allocates a lot.
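A quick way to check is to @time a single iteration of your inner work and look at the allocation it reports. Something along these lines, where heavy_step is just a hypothetical stand-in for whatever your loop actually does:

# heavy_step is a hypothetical stand-in for one iteration of the real workload
heavy_step(A) = sum(A * A)

A = rand(500, 500)
heavy_step(A)            # run once so compilation isn't included in the measurement
@time heavy_step(A)      # prints elapsed time, GC time, and total bytes allocated
@allocated heavy_step(A) # returns just the bytes allocated by this call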
Also, I don’t think it’s 25 GiB each. Most of those processes are threads and share the same memory; the actual size is 9 GiB, which is still a lot, but not unexpected since I’m dealing with large arrays.
There might be a problem with unwanted allocation, though. I’m looking into it.
It seems that the eigen function from LinearAlgebra causes this behavior; in fact, it is easy to reproduce:
using LinearAlgebra
A = rand(800,800)
Threads.@threads for i in 1:30
    eigen(A)
end
Just running this, my htop looks exactly like in the original post. Leaving out Threads.@threads makes no difference, except that not all threads are at 100% simultaneously.
Any ideas why this is, and whether this is at all a bad thing? I can’t tell.
BLAS doesn’t use Julia’s threads. If you are using Julia’s multithreading, you likely want BLAS.set_num_threads(1), which will make all your BLAS operations single-threaded.
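For example, with the reproduction snippet from above, that would look something like this (whether one BLAS thread per Julia thread is the right trade-off depends on how many cores you have available):

using LinearAlgebra

BLAS.set_num_threads(1)   # keep every BLAS call single-threaded
# on recent Julia versions, BLAS.get_num_threads() lets you check the current setting

A = rand(800, 800)
Threads.@threads for i in 1:30
    eigen(A)              # each Julia thread now runs its own single-threaded eigensolve
end

That way your Julia threads don’t compete with the BLAS thread pool for the same cores. The alternative is to drop Threads.@threads entirely and let BLAS use all cores, in which case the eigen calls run one after another but each call is internally parallel.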