Are the multiple core results for multiple Julia threads or multiple BLAS threads?
There's gonna be a change in 1.9. julia/NEWS.md at v1.9.0-alpha1 · JuliaLang/julia · GitHub
Somewhere I believe I read it was going to default to 1, but I may be mistaken.
It won't default to 1; see the last point in the NEWS.md that you've linked. It will default to half the number of CPU threads on most architectures (because of hyperthreading).
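To see what a given Julia installation actually does, the thread counts can be inspected directly. This is only a minimal sketch; the numbers it prints depend on the Julia version, the BLAS backend, and any environment variables already set.

```julia
using LinearAlgebra

# Number of Julia threads (controlled by `-t` / JULIA_NUM_THREADS).
println("Julia threads: ", Threads.nthreads())

# Number of threads the BLAS backend is currently configured to use.
println("BLAS threads:  ", BLAS.get_num_threads())

# Number of logical CPU threads Julia believes the machine has.
println("CPU threads:   ", Sys.CPU_THREADS)
```

Comparing the second and third numbers shows whether the "half the CPU threads" default applies on the machine at hand.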
As for Stefan's question, I don't think there is a single "right" choice here. The relation between Julia threads and BLAS threads is complicated. So, I'd argue that one should generally think about Julia and BLAS threads separately and shouldn't link them to each other. If a user starts Julia as `julia -t 1`, the user clearly indicates that she wants a single Julia thread. Does she also indicate that she wants all dependent libraries to run single-threaded? Maybe, but not certainly. I'd say that one very often writes serial code but is more than happy to get some "free parallelism" through BLAS. Of course, we could make `julia` default to parallel BLAS and `julia -t 1` to single-threaded BLAS. But that seems kind of strange as well. Again, I don't think there is a "right" choice. If you want Julia and all libraries to run single-threaded, there is only one safe way: set all the relevant environment variables.
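As a rough sketch of the "set all the relevant environment variables" approach, the snippet below launches a child Julia process with the usual thread-related variables pinned to 1. The helper name `run_single_threaded` is hypothetical, and the list of variables is an assumption: which ones matter depends on the libraries in use (OpenBLAS, MKL, OpenMP-based code, and so on).

```julia
# Hypothetical helper: run a Julia snippet in a child process in which the
# common thread-related environment variables are all set to 1.
function run_single_threaded(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    cmd = addenv(cmd,
        "JULIA_NUM_THREADS"    => "1",  # Julia threads
        "OPENBLAS_NUM_THREADS" => "1",  # OpenBLAS (Julia's default BLAS)
        "MKL_NUM_THREADS"      => "1",  # only relevant if MKL is used
        "OMP_NUM_THREADS"      => "1")  # OpenMP-based libraries
    return read(cmd, String)
end

# Ask the child process how many Julia and BLAS threads it ended up with.
report = run_single_threaded("""
    using LinearAlgebra
    println(Threads.nthreads(), " ", BLAS.get_num_threads())
    """)
print(report)
```

The same variables can of course be exported in the shell before starting `julia`; doing it from Julia here just keeps the example self-contained.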
They're for multiple Julia and BLAS threads.
By the way, if you are interested in Gram-Schmidt algorithms, I strongly recommend taking a look at the randomized Gram-Schmidt (RGS) algorithm proposed by Balabanov and Grigori (2022). It can yield as much as a 4x speedup compared to CGS2. I can share Julia and C codes if you're interested. Note that RGS can be carried over to derive randomized Arnoldi and GMRES algorithms with significant speedups.
Balabanov, Oleg, and Laura Grigori. "Randomized Gram–Schmidt Process with Application to GMRES." SIAM Journal on Scientific Computing 44.3 (2022): A1450-A1474.
I believe you can use tools from outside of Julia. I was thinking of cgroups on Linux, but when I looked into it, it seemed quite involved.
To get you started, what I found so far:
Linux has a limit on the number of threads. The threads-max kernel parameter can be set to ensure that the number of threads per process is always less than or equal to that limit.
https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#CPUQuota=
AllowedCPUs=, StartupAllowedCPUs=
Restrict processes to be executed on specific CPUs.
You can also set all kinds of limits (such as max stack space or file handles) with ulimit.
ulimit -T 1
Didn't work for me despite being documented to do what you want… Note, it's a bash built-in, and it is also documented there, not just in my ulimit --help:
-T the maximum number of threads
while e.g. this worked:
-t the maximum amount of cpu time in seconds
ulimit -t 1
$ julia
Segmentation fault (core dumped)
Of course, you don't want your program limited in that way; you want the system to lie to your process that it is running on a single-threaded/single-CPU system.
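One partial way to get that effect, at least as far as Julia itself is concerned, is the `JULIA_CPU_THREADS` environment variable, which overrides the value Julia reports as `Sys.CPU_THREADS`. The sketch below only demonstrates that override in a child process; it is an assumption that this is enough for a given use case, since external libraries that query the operating system directly will still see all CPUs.

```julia
# Launch a child Julia process that is told it has a single CPU thread.
code = "println(Sys.CPU_THREADS)"
cmd  = addenv(`$(Base.julia_cmd()) --startup-file=no -e $code`,
              "JULIA_CPU_THREADS" => "1")

# Read back what the child process believes about the machine.
reported = parse(Int, strip(read(cmd, String)))
println("Child process sees ", reported, " CPU thread(s)")
```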
Yes, I meant to say that I believe I read somewhere that it will default to one, but according to the current 1.9 release notes that's clearly not going to happen, and I may simply have misremembered.
I completely agree with the rest. I have found that when running multithreaded code where each thread is doing significant linear algebra, using only one BLAS thread per Julia thread seems to be optimal. But users should set that manually because their programs may be structured such that linear algebra is done in the single-threaded portions, and multi-threading is used for non-linear-algebra purposes.
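A minimal sketch of that pattern, assuming the work is a batch of independent matrix products: BLAS is restricted to one thread while the Julia threads run, and the previous setting is restored afterwards. The function name `threaded_products` is made up for illustration.

```julia
using LinearAlgebra

# Multiply pairs of matrices in parallel over Julia threads, with BLAS
# limited to a single thread for the duration of the loop.
function threaded_products(As, Bs)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(1)          # avoid oversubscribing the cores
    try
        Cs = Vector{Matrix{Float64}}(undef, length(As))
        Threads.@threads for i in eachindex(As)
            Cs[i] = As[i] * Bs[i]    # each product runs serially inside BLAS
        end
        return Cs
    finally
        BLAS.set_num_threads(old)    # restore the caller's BLAS setting
    end
end

As = [rand(8, 8) for _ in 1:4]
Bs = [rand(8, 8) for _ in 1:4]
Cs = threaded_products(As, Bs)
```

Restoring the old value in `finally` means that single-threaded portions of the program elsewhere can still benefit from multithreaded BLAS.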
Is this a thread where you can set this response as an answer? There's a ton of problem-specific stuff in all of the prior responses, but this solution addresses a subtle and general problem that a lot of folks might run into, so it would be great if it were more discoverable.
If you can set it as the answer, a link to it will show up in the first post.