Oh, I didn’t know that. Is there an easy way to check?
There’s a -p option when you start Julia, is that what you mean? I didn’t think I needed this, since I set the environment variable. Am I misunderstanding this?
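For reference, here is a minimal sketch of how these settings could be inspected from inside a session. It assumes the question is about thread and worker counts, and that Julia is 1.6 or later (where `BLAS.get_num_threads` is available). The variable names are only illustrative. Note that `-p` controls Distributed worker processes, which is separate from both Julia threads and BLAS threads.

```julia
using LinearAlgebra
using Distributed

# Julia-level threads (controlled by JULIA_NUM_THREADS or the -t flag).
julia_threads = Threads.nthreads()

# Distributed worker processes (controlled by -p or addprocs).
# With no extra workers started, nworkers() reports 1.
worker_procs = nworkers()

# Threads used by the underlying BLAS library; this is independent of the two above.
blas_threads = BLAS.get_num_threads()

println("Julia threads:    ", julia_threads)
println("Worker processes: ", worker_procs)
println("BLAS threads:     ", blas_threads)
```

If the three numbers differ from what was expected, that usually shows which of the settings the environment variable actually affected.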
The docs say,
If the underlying BLAS is using multiple threads, higher flop rates are realized. The number of BLAS threads can be set with BLAS.set_num_threads(n). If the keyword argument parallel is set to true, peakflops is run in parallel on all the worker processors. The flop rate of the entire parallel computer is returned. When running in parallel, only 1 BLAS thread is used. The argument n still refers to the size of the problem that is solved on each processor.
It doesn’t make sense to me why parallel=true would force single-threaded BLAS, but OK.
That was without:
 julia> BLAS.set_num_threads(16)
julia> LinearAlgebra.peakflops(16000)
3.5702446000519916e11
julia> BLAS.set_num_threads(32)
julia> LinearAlgebra.peakflops(16000)
3.293593157654745e11
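To compare more thread counts than the two above, the same measurement could be wrapped in a small loop. This is only a sketch: `sweep_blas_threads` is a hypothetical helper name, the default problem size is an arbitrary choice, and single runs of `peakflops` are noisy, so small differences between results should not be over-interpreted.

```julia
using LinearAlgebra

# Hypothetical helper: measure peakflops for several BLAS thread counts
# and restore the original setting afterwards.
function sweep_blas_threads(counts; n = 4000)
    original = BLAS.get_num_threads()
    results = Dict{Int,Float64}()
    try
        for t in counts
            BLAS.set_num_threads(t)
            # peakflops times a dense n-by-n matrix multiply and returns flops per second.
            results[t] = LinearAlgebra.peakflops(n)
        end
    finally
        # Put the thread count back even if a run fails.
        BLAS.set_num_threads(original)
    end
    return results
end
```

A call such as `sweep_blas_threads([8, 16, 32]; n = 16000)` would repeat the comparison shown above, with one extra data point.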
Me too!
Correct on the first point. On the second, is that a general rule? Didn’t realize that. I’ve heard of people getting the best results with n-1 threads so one could still watch the mouse, etc. But I forget whether n was physical or logical in this case.
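As a rough illustration of the physical versus logical distinction: `Sys.CPU_THREADS` reports logical CPUs, and Base Julia has no built-in count of physical cores. The sketch below therefore guesses the physical count by assuming 2-way SMT (hyperthreading), which will be wrong on machines without SMT or with a different layout. All variable names are illustrative.

```julia
# Logical CPUs as seen by Julia (includes SMT/hyperthreads).
logical = Sys.CPU_THREADS

# ASSUMPTION: 2-way SMT, so physical cores are about half the logical count.
# This is a guess, not a measurement.
physical_guess = max(1, logical ÷ 2)

# The "n-1" rule of thumb applied to the guessed physical count,
# leaving one core free for the rest of the system.
n_minus_one = max(1, physical_guess - 1)

println("logical CPUs:           ", logical)
println("guessed physical cores: ", physical_guess)
println("n-1 candidate:          ", n_minus_one)
```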
Just saw the additional details you both gave, that makes it much clearer.
Good point, I haven’t even looked into overclocking.
