Oh, I didn’t know that. Is there an easy way to check?
There’s a -p option when you start Julia, is that what you mean? I didn’t think I needed this, since I set the environment variable. Am I misunderstanding this?
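For anyone else unsure which knob is which, here is a minimal sketch that prints the three separate counts involved. It assumes Julia 1.6 or later (for BLAS.get_num_threads) and uses only standard-library calls; the variable names are mine. As far as I know, -p controls worker processes, which is a different thing from Julia threads or BLAS threads, so an environment variable that sets one of those would not change the others.

```julia
using LinearAlgebra   # provides the BLAS submodule
using Distributed     # provides nworkers()

# Julia threads: set with -t / --threads or the JULIA_NUM_THREADS environment variable.
julia_threads = Threads.nthreads()

# Worker processes: added with -p / --procs or addprocs().
# nworkers() returns 1 when no extra workers exist (the main process counts).
worker_procs = nworkers()

# BLAS threads: the pool used inside matrix multiplies; changed with BLAS.set_num_threads(n).
blas_threads = BLAS.get_num_threads()

# Logical CPU threads visible to Julia (not physical cores).
logical_cpus = Sys.CPU_THREADS

println("Julia threads:    ", julia_threads)
println("Worker processes: ", worker_procs)
println("BLAS threads:     ", blas_threads)
println("Logical CPUs:     ", logical_cpus)
```

Running this once in a plain session and once with -p should show only the worker count changing.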
The docs say,
If the underlying BLAS is using multiple threads, higher flop rates are realized. The number of BLAS threads can be set with BLAS.set_num_threads(n).
If the keyword argument parallel is set to true, peakflops is run in parallel on all the worker processors. The flop rate of the entire parallel computer is returned. When running in parallel, only 1 BLAS thread is used. The argument n still refers to the size of the problem that is solved on each processor.
It doesn’t make sense to me why parallel=true would force single-threaded BLAS, but ok.
That was without:
julia> BLAS.set_num_threads(16)
julia> LinearAlgebra.peakflops(16000)
3.5702446000519916e11
julia> BLAS.set_num_threads(32)
julia> LinearAlgebra.peakflops(16000)
3.293593157654745e11
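To compare more than two settings without retyping, something like the following could work. This is only a sketch: sweep_blas_threads is a hypothetical helper name I made up, the thread counts and problem size in the usage line are placeholders (much smaller than the 16000 used above, so it finishes quickly), and it assumes Julia 1.6 or later for BLAS.get_num_threads.

```julia
using LinearAlgebra

# Hypothetical helper: run peakflops once per BLAS thread count in `counts`
# and return a Dict mapping thread count => measured flop rate.
function sweep_blas_threads(counts; n = 2000)
    original = BLAS.get_num_threads()   # remember the current setting
    results = Dict{Int,Float64}()
    try
        for t in counts
            BLAS.set_num_threads(t)
            results[t] = LinearAlgebra.peakflops(n)
        end
    finally
        BLAS.set_num_threads(original)  # restore the setting even if a run fails
    end
    return results
end

# Placeholder values; substitute the thread counts and problem size you care about.
results = sweep_blas_threads([1, 2, 4]; n = 1000)
for t in sort(collect(keys(results)))
    println(t, " BLAS threads: ", results[t] / 1e9, " GFLOPS")
end
```

Small problem sizes give noisy numbers, so for a real comparison a larger n (like the 16000 above) and a few repeats are probably worth the wait.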
Me too!
Correct on the first point. On the second, is that a general rule? Didn’t realize that. I’ve heard of people getting the best results with n-1 threads so one could still watch the mouse, etc. But I forget whether n was physical or logical in this case.
Just saw the additional details you both gave, that makes it much clearer.
Good point, I haven’t even looked into overclocking.