A.
Using 4 threads adds 20.3 ms to startup going by the minimum times (37 ms going by the means, but I trust the minimum more). Would there be a way for Julia to delay adding the threads until they are actually used?
The obvious solution is not asking for 4 threads (this is a single-threaded program anyway), but the guy behind the Benchmarks Game declined; he wants the same settings for all programs, multi-threaded or not. The solution might be to start every program with only 1 thread and have the programs that need them add the 4 threads themselves. Is there an easy way to add as many threads as “-tauto” does from within the program?
$ hyperfine 'julia -t4 -O0 --cpu-target=core2 --startup-file=no pidigits.jl 1000 >/dev/null'
Benchmark #1: julia -t4 -O0 --cpu-target=core2 --startup-file=no pidigits.jl 1000 >/dev/null
Time (mean ± σ): 307.6 ms ± 25.6 ms [User: 593.6 ms, System: 393.8 ms]
Range (min … max): 270.0 ms … 341.5 ms 10 runs
$ hyperfine 'julia -O0 --cpu-target=core2 --startup-file=no pidigits.jl 1000 >/dev/null'
Benchmark #1: julia -O0 --cpu-target=core2 --startup-file=no pidigits.jl 1000 >/dev/null
Time (mean ± σ): 270.5 ms ± 24.9 ms [User: 536.8 ms, System: 366.7 ms]
Range (min … max): 249.7 ms … 317.8 ms 10 runs
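For reference, the overhead figures quoted at the top come straight from these two runs. A quick check of the arithmetic in Julia (the variable names are just mine):

```julia
# Times in ms, copied from the two hyperfine runs above.
min_t4, min_default = 270.0, 249.7
mean_t4, mean_default = 307.6, 270.5

# Difference between the -t4 run and the default run.
min_overhead = round(min_t4 - min_default, digits=1)
mean_overhead = round(mean_t4 - mean_default, digits=1)

println("overhead by minimum: ", min_overhead, " ms")
println("overhead by mean:    ", mean_overhead, " ms")
```

Both differences are about the size of one σ (roughly 25 ms) with only 10 runs each, so more runs would probably be needed to pin the number down.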
Go starts with as many procs as there are cores by default, but it has the following call, which their pidigits program uses:
runtime.GOMAXPROCS(1)
The cost for us seems to be only at startup, so we can’t get that time back. But does anyone know if there’s a reason to lower the number of threads afterward? Does it help with GC? I think that might be the reason for Go.
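As far as I know, the number of Julia threads is fixed when the process starts (by `-t` or `JULIA_NUM_THREADS`), so I'm not aware of a direct equivalent of Go's call. What a program can do is inspect what it was started with. A minimal sketch, assuming `Sys.CPU_THREADS` is close to what “-tauto” would pick (that part is my assumption):

```julia
# Number of threads this Julia session was started with
# (set by -t on the command line or by JULIA_NUM_THREADS).
nthreads = Threads.nthreads()

# Logical CPU threads Julia detected on this machine.
# Assumption: this is roughly the number "-t auto" is based on.
ncpu = Sys.CPU_THREADS

println("running with $nthreads thread(s); machine has $ncpu logical CPU thread(s)")
```

This only reports the settings; it does not change them.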
B.
Chapel uses “yield”, and while I’m not sure that is better for performance, their program is the fastest:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/pidigits-chapel-4.html
Is this worth looking into (given that you can’t use a package in the Benchmarks Game):
https://github.com/BenLauwens/ResumableFunctions.jl
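For comparison, Base Julia already has something yield-like without any package: a `Channel` fed by a task. Below is a minimal sketch of a digit generator in that style. Note that it uses Gibbons' unbounded spigot with `BigInt`, which is not the algorithm of the Benchmarks Game program, and the name `pi_digit_stream` is made up for illustration. I would expect the task switching to cost more than a plain loop, but I have not measured it.

```julia
# Yield-style producer of pi digits using a Channel from Base.
# Algorithm: Gibbons' unbounded spigot (NOT the Benchmarks Game algorithm).
# `pi_digit_stream` is a hypothetical name used only for this sketch.
function pi_digit_stream(ndigits::Integer)
    # The do-block runs in its own task; put! plays the role of "yield".
    Channel{Int}(32) do ch
        q, r, t, k, n, l = big(1), big(0), big(1), big(1), big(3), big(3)
        produced = 0
        while produced < ndigits
            if 4q + r - t < n * t
                # The next digit is certain: hand it to the consumer.
                put!(ch, Int(n))
                produced += 1
                q, r, n = 10q, 10 * (r - n * t), fld(10 * (3q + r), t) - 10n
            else
                # Not enough information yet: consume another series term.
                q, r, t, k, n, l = q * k, (2q + r) * l, t * l, k + 1,
                                   fld(q * (7k + 2) + r * l, t * l), l + 2
            end
        end
    end
end

# The consumer just iterates; the channel closes when the producer finishes.
first_digits = join(collect(pi_digit_stream(10)))
println(first_digits)
```

The buffer size of 32 is arbitrary; a larger buffer means fewer task switches at the cost of the producer running ahead of the consumer.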
$ time julia -O2 --startup-file=no pidigits.jl 10000 >/dev/null
real 0m1,233s
user 0m1,628s
sys 0m0,477s
julia> GC.gc(); GC.gc(); GC.gc(); GC.gc(); GC.enable(false); @benchmark (GC.enable(false); pidigits(10000, devnull); GC.enable(true);) gcsample=true
BenchmarkTools.Trial:
memory estimate: 859.20 MiB
allocs estimate: 75584
--------------
minimum time: 920.721 ms (0.00% GC)
median time: 930.629 ms (0.00% GC)
mean time: 930.308 ms (0.00% GC)
maximum time: 939.252 ms (0.00% GC)
--------------
samples: 4
evals/sample: 1
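The GC-off measurement above can also be done with Base only, in a way that always turns the GC back on even if the workload throws. A minimal sketch; `time_without_gc` is a made-up helper name, and the workload in the usage line is only a stand-in for `pidigits(10000, devnull)`:

```julia
# Time a zero-argument function with the GC switched off,
# restoring the previous GC state afterwards.
# `time_without_gc` is a hypothetical helper, not part of Base.
function time_without_gc(f)
    GC.gc()                          # start from a clean heap
    was_enabled = GC.enable(false)   # returns the previous GC state
    try
        return @elapsed f()          # elapsed wall time in seconds
    finally
        GC.enable(was_enabled)       # restore, even on error
    end
end

# Stand-in workload that allocates BigInts.
seconds = time_without_gc(() -> sum(big(i) for i in 1:10_000))
println("elapsed: ", seconds, " s")
```

With the GC off, everything allocated during the call stays in memory until it is re-enabled, so at roughly 859 MiB per run (the memory estimate above) this only works if the machine has the RAM to spare.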