Running Julia on 1 thread on a 4-core Windows 10 PC, the line
rand(10_000,10_000)^2
uses 100% of the CPUs.
From this, I assume that unless you are very advanced, there’s no benefit to writing multi-threaded code as the compiler will do a better job of optimizing code across all the cores.
Julia automatically multithreads only matmul and factorizations (in the future we might multithread broadcasting as well, but currently we do not). If you rely heavily on linear algebra, you may want to install MKL, which can make much of your linear algebra faster (it can’t be included by default for license reasons).
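To illustrate the point that elementwise work is not threaded for you, here is a minimal sketch of doing it by hand with `Threads.@threads`. The function name `threaded_sqrt!` is made up for this example, and the loop only runs in parallel if Julia was started with more than one thread (e.g. `julia -t 4`).

```julia
# Hypothetical helper: elementwise sqrt, with the loop split across Julia threads.
# With `julia -t 1` this still works, it just runs on a single thread.
function threaded_sqrt!(out::Vector{Float64}, x::Vector{Float64})
    Threads.@threads for i in eachindex(out, x)
        out[i] = sqrt(x[i])   # each iteration writes to its own slot, so no data race
    end
    return out
end

x = rand(1_000_000)
out = similar(x)
threaded_sqrt!(out, x)

println("Julia threads available: ", Threads.nthreads())
```

The broadcast version `sqrt.(x)` gives the same result but currently runs on one thread, so an explicit loop like this is where writing your own multithreaded code can still pay off.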
To clarify and avoid a potential misunderstanding here:
1) Your matrix multiplication is running multithreaded.
2) It’s not that the compiler is super smart here (in the sense of auto-parallelizing the matmul); someone else simply wrote the multithreaded code for you already.
3) Specifically, the matmul code that runs isn’t written in Julia but is provided by OpenBLAS (or MKL, if you load it), an external dependency that Julia uses for most linear algebra functionality.
4) As a consequence of 3), one must distinguish between Julia threads and OpenBLAS threads. Even if you run Julia with a single thread, i.e. julia -t 1, OpenBLAS, and thus your matmul, will run on multiple OpenBLAS threads. (The OPENBLAS_NUM_THREADS environment variable is your friend for controlling the latter.)
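The two thread counts can be inspected separately from within a session. The sketch below assumes Julia 1.6 or newer (where `BLAS.get_num_threads` is available) and uses an arbitrary 2000×2000 matrix; the timings it prints will vary by machine, so no expected numbers are shown.

```julia
using LinearAlgebra

julia_threads = Threads.nthreads()        # set by `julia -t N` or JULIA_NUM_THREADS
blas_threads  = BLAS.get_num_threads()    # set by OPENBLAS_NUM_THREADS or at runtime

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)

A = rand(2_000, 2_000)
A * A                                     # warm-up so compilation is not timed

BLAS.set_num_threads(1)                   # force single-threaded matmul
t_single = @elapsed A * A

BLAS.set_num_threads(blas_threads)        # restore the original setting
t_multi = @elapsed A * A

println("matmul with 1 BLAS thread:   ", t_single, " s")
println("matmul with ", blas_threads, " BLAS thread(s): ", t_multi, " s")
```

On a multi-core machine the second timing is usually the smaller one, even with `julia -t 1`, which is the effect described in the original question.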