Back to the Banana Pi BPI-F3:
julia> using LinearAlgebra
julia> BLAS.set_num_threads(1)
julia> BLAS.get_num_threads() # check the setting for good measure
1
julia> @time LinearAlgebra.peakflops(4096; ntrials=3)
192.494952 seconds (33 allocations: 768.024 MiB, 0.18% gc time)
2.15950419971436e9
It’s 3.3x slower with a single thread. ~The CPU has 8 cores though~ OpenBLAS was using 4 threads above, so there’s indeed a bit of degradation compared to the ideal 4x scaling (this system also has 2 cores that are always kept busy by the root user, so scaling up to all 8 cores is basically never possible).
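To put a number on that degradation, here is a minimal sketch of the arithmetic, using only the figures quoted above. The variable names are made up for illustration, and the 3.3x ratio is the rounded value from the text, so the results are approximate.

```julia
# Figures quoted in the text above (the ratio is rounded, so results are approximate)
single_thread_flops = 2.15950419971436e9  # peakflops result with 1 BLAS thread
speedup = 3.3                             # the 4-thread run was ~3.3x faster
blas_threads = 4                          # OpenBLAS threads used in that run

# Parallel efficiency: achieved speedup divided by the ideal (linear) speedup
efficiency = speedup / blas_threads

# Rough 4-thread throughput implied by the rounded ratio, not a measured value
implied_multi_flops = speedup * single_thread_flops

println("parallel efficiency ≈ ", round(efficiency * 100; digits=1), "%")
println("implied 4-thread rate ≈ ", round(implied_multi_flops / 1e9; digits=2), " GFLOPS")
```

With these inputs the efficiency comes out at a bit over 80%, which matches the "bit of degradation" reading: the 4 threads deliver most, but not all, of the ideal 4x.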