Can anyone first confirm the 2.2% slowdown I see on 1.11 rc3:
$ time julia +1.11 -O3 -t1 --gcthreads=4 pidigits.jl 10000 >/dev/null
real 0m1,205s
user 0m2,165s
sys 0m0,072s
$ time julia -O3 -t1 --gcthreads=4 pidigits.jl 10000 >/dev/null
real 0m1,178s
user 0m1,718s
sys 0m0,458s
and explain why user time is higher on 1.11? (User plus sys seems roughly unchanged between the two, but for a single-threaded program shouldn't real be closer to user?) I think I can explain it below.
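One way to put a number on that question is a rough sketch that uses only the figures from the two `time` runs above. The locale's decimal commas are written as dots. It divides CPU time (user + sys) by real time, which gives the average number of cores kept busy during the run. The helper name cpu_ratio is purely illustrative.

```shell
#!/bin/sh
# Average number of busy cores = (user + sys) / real.
# A value well above 1.0 for a single-threaded program means other
# threads (e.g. GC threads) were doing work on other cores.
cpu_ratio() {
  # $1 = real, $2 = user, $3 = sys (all in seconds)
  LC_ALL=C awk -v r="$1" -v u="$2" -v s="$3" \
    'BEGIN { printf "%.2f\n", (u + s) / r }'
}

echo "julia +1.11: $(cpu_ratio 1.205 2.165 0.072)"
echo "julia:       $(cpu_ratio 1.178 1.718 0.458)"
```

Both runs come out at about 1.86 and 1.85 busy cores. The total CPU work is therefore nearly the same; mainly its split between user and sys differs. That fits work happening off the main thread, though these numbers alone cannot say which threads did it.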
I had noticed this at the Benchmarks Game:
Julia #3: secs 2.06, mem 169,012, gz 506, busy 2.55, cpu load 99% 6% 13% 6%
There, “busy” (2.55) is about 24% higher than the secs column (2.06). I don't think anybody competes on “busy”, but I believe the gap comes from the three “unused” cores showing 6%, 13% and 6% load: 2.06 s × (99% + 6% + 13% + 6%) ≈ 2.55 s. Yet this is a single-threaded program, so ideally each of those should be 0% or close to it (an ideal nobody reaches, or maybe nobody can reach; see my other theory):
What I think could be happening is that the GC is running on those other cores. It could also be that the core in use overheats, and/or that the OS scheduler moves the program to another core, and thus to a different L1 cache. That last one seems to happen for the fastest C program, with cpu load “67% 0% 2% 33%”.
The Benchmarks Game insists on the same settings for all programs, multi-threaded or not, i.e. -t 4. So, for a program I know is actually single-threaded, could I add GC_cores(1) at the top of my script? I'm not sure that's wise, since hypothetically more GC threads should help. So maybe GC_cores(1, 4) instead, where omitting the second number keeps the default number of GC threads?
I tried to limit the load on my machine (suspending Firefox) and hope my timings aren't off. It would also be good to know if Julia is more sensitive than other languages to a loaded machine.
$ hyperfine 'julia +1.11 -O3 -t4 pidigits.jl 10000 >/dev/null'
Benchmark 1: julia +1.11 -O3 -t4 pidigits.jl 10000 >/dev/null
Time (mean ± σ): 1.249 s ± 0.034 s [User: 2.159 s, System: 0.071 s]
Range (min … max): 1.203 s … 1.315 s 10 runs
$ hyperfine 'julia -O3 -t4 pidigits.jl 10000 >/dev/null'
Benchmark 1: julia -O3 -t4 pidigits.jl 10000 >/dev/null
Time (mean ± σ): 1.230 s ± 0.031 s [User: 1.676 s, System: 0.503 s]
Range (min … max): 1.189 s … 1.284 s 10 runs
$ time julia -O3 -t1 --gcthreads=2 pidigits.jl 10000 >/dev/null
real 0m1,215s
user 0m1,651s
sys 0m0,508s
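To judge whether the two -t4 hyperfine results differ by more than noise, a rough check helps: express the difference of the means in standard errors. This is a sketch. It assumes independent runs and takes hyperfine's reported mean and σ over 10 runs each at face value. compare_means is a hypothetical helper, not a hyperfine feature.

```shell
#!/bin/sh
# Relative difference of two benchmark means, and that difference
# measured in standard errors of the difference of means.
compare_means() {
  # $1 = mean A, $2 = sigma A, $3 = mean B, $4 = sigma B, $5 = runs each
  LC_ALL=C awk -v ma="$1" -v sa="$2" -v mb="$3" -v sb="$4" -v n="$5" 'BEGIN {
    diff = ma - mb
    se = sqrt(sa * sa / n + sb * sb / n)   # standard error of (A - B)
    printf "%.1f%% %.1f\n", 100 * diff / mb, diff / se
  }'
}

# julia +1.11 (A) vs julia (B), both with -t4, from the runs above
compare_means 1.249 0.034 1.230 0.031 10
```

This prints a 1.5% slowdown at about 1.3 standard errors. Ten runs each are therefore not enough to confirm a difference this small. More runs (hyperfine's --runs option) would tighten the estimate. Passing both commands to a single hyperfine invocation would also give a side-by-side comparison.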