Parallel processing using FLoops

Oh I see, thank you!

I tried with @floop and BLAS.set_num_threads(1), but the top command shows %CPU ≈ 100. Without @floop and with BLAS.set_num_threads(96) (the computer has 48 cores), I get %CPU ≈ 3189. I am not sure, but I think that when using @floop I need to specify the number of threads, which I don't know how to do.

Did you start Julia with more than one thread? In my experience, BLAS will use many threads no matter how Julia was started, but for FLoops to be able to use more than one thread, you need to start Julia itself with more than one thread:

julia -t <num_threads>

If you have 48 cores, try setting the number of threads to something like 40, see what that does, and go from there.
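To confirm that the thread setting actually took effect, you can ask a freshly started Julia how many threads it has. This is only a quick sketch: the value 4 is an arbitrary example, and the fallback branch is just there so the snippet does not fail on a machine without Julia on the PATH.

```shell
# Ask a freshly started Julia how many threads it actually has.
# The thread count 4 is an arbitrary example value.
if command -v julia >/dev/null 2>&1; then
    N=$(julia -t 4 -e 'println(Threads.nthreads())')
else
    N="unknown (julia not found on PATH)"
fi
echo "Julia threads: $N"
```

If this reports 1 for your real job, Julia was started single-threaded, which would match the %CPU ≈ 100 seen in top.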

Actually, I am running the code on a remote server. I need to submit the .jl file through a bash script. Is it possible to use julia -t <num_threads> inside my .jl file?

Not that I know of. But if you need to submit it via a bash script, you should be able to set the number of threads through an environment variable. Something like

export JULIA_NUM_THREADS=<num_threads>

might work if you put it somewhere at the beginning of your bash script. This was taken from here:

https://docs.julialang.org/en/v1/manual/multi-threading/

Edit:

I also imagine that you probably call your Julia script from the bash script in the form of

julia <script>.jl

If this is the case, then you should also be able to just add the -t flag to that call:

julia -t <num_threads> <script>.jl
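Putting the two options together, a submission script could look roughly like the sketch below. The script name my_script.jl and the value 40 are placeholders I made up, not anything from your setup, and the guard around the julia call is only there so the sketch runs harmlessly where Julia or the script is missing. Either the environment variable or the -t flag alone should be enough.

```shell
#!/usr/bin/env bash
# Hypothetical submission script: file name and thread count are placeholders.
NUM_THREADS=40            # assumption: somewhat fewer than the 48 cores
SCRIPT="my_script.jl"     # hypothetical script name

# Option 1: environment variable, read by Julia when it starts.
export JULIA_NUM_THREADS="$NUM_THREADS"

# Option 2: pass the thread count explicitly on the command line.
if command -v julia >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    julia -t "$NUM_THREADS" "$SCRIPT"
else
    echo "would run: julia -t $NUM_THREADS $SCRIPT"
fi
```

Keeping the number in one variable at the top makes it easy to try 40, then a few more or fewer, without editing the .jl file.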

Both ways work! Although I was wondering about the suggestion to set the threads to something like 40:

If I have 48 cores with 2 threads per core, shouldn't I be able to set it to around 96?
Actually, I tried that, but it gave me a segmentation fault.

Generally, for code that is compute-bound like this (due to the heavy use of BLAS), each core is already maxed out by a single thread, so the second hardware thread each core offers provides no benefit and may instead hurt performance by "overbooking" your CPU cores. As such, it is usually better to run only 1 thread per core (and sometimes to leave a few cores free for other tasks, particularly when you have so many).
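If you want the script to pick the number for you, something like the following sketch derives a thread count from the physical core count rather than the logical (hyperthreaded) one. It assumes a Linux machine with nproc and lscpu available, and the reserve of 4 cores is an arbitrary value chosen for illustration.

```shell
# Logical CPUs visible to this process (counts hyperthreads separately).
LOGICAL=$(nproc)

# Each unique (core, socket) pair in lscpu's parseable output is one physical core.
PHYSICAL=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l)

# Fall back to the logical count if lscpu is unavailable or reports more
# cores than this process may use (e.g. inside a restricted job or container).
if [ "${PHYSICAL:-0}" -lt 1 ] || [ "$PHYSICAL" -gt "$LOGICAL" ]; then
    PHYSICAL=$LOGICAL
fi

RESERVE=4   # hypothetical number of cores left free for other tasks
NUM_THREADS=$(( PHYSICAL > RESERVE ? PHYSICAL - RESERVE : 1 ))

echo "logical=$LOGICAL physical=$PHYSICAL -> julia -t $NUM_THREADS"
```

The resulting NUM_THREADS can then be passed to julia -t or exported as JULIA_NUM_THREADS.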

I have no idea about the segmentation fault.


Alright! Thank you for the clarification.