[Regression in rc3] Can you decline threads you've been assigned, e.g. GC threads? Or should it be possible?

First, can anyone confirm the roughly 2% slowdown I see on 1.11 rc3?

$ time julia +1.11 -O3 -t1 --gcthreads=4 pidigits.jl 10000 >/dev/null

real	0m1,205s
user	0m2,165s
sys	0m0,072s

$ time julia -O3 -t1 --gcthreads=4 pidigits.jl 10000 >/dev/null

real	0m1,178s
user	0m1,718s
sys	0m0,458s

And can anyone explain why user time is higher on rc3? (User plus sys isn’t much changed, but shouldn’t that total be closer to user alone?) I think I explain it below.
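As a quick check of the numbers in the two `time` runs above, here is a minimal sketch that adds up user and sys for each run and compares the wall-clock times. The values are copied by hand from the output (decimal commas rewritten as points), and `cpu_totals` is just an illustrative name, not anything from Julia or the Benchmarks Game.

```shell
# Sum user + sys for both runs and compare wall-clock (real) time.
# LC_ALL=C keeps awk printing decimal points regardless of the system locale.
cpu_totals() {
    LC_ALL=C awk 'BEGIN {
        rc3_real = 1.205; rc3_user = 2.165; rc3_sys = 0.072   # julia +1.11 (rc3)
        def_real = 1.178; def_user = 1.718; def_sys = 0.458   # default julia
        printf "rc3 user+sys: %.3f s\n", rc3_user + rc3_sys
        printf "default user+sys: %.3f s\n", def_user + def_sys
        printf "real ratio: %.3f\n", rc3_real / def_real
    }'
}
cpu_totals
```

This prints 2.237 s and 2.176 s for the totals and a real-time ratio of 1.023, so total CPU time is within about 3% between the two runs while the user/sys split shifts a lot.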

I had noticed this at the Benchmarks Game:

Julia #3 2.06 169,012 506 2.55 99% 6% 13% 6%

And “busy” (2.55) is about 24% higher than the secs column (2.06). Nobody is competing on “busy”, I think, but I believe the gap is explained by the 3 “unused” cores showing “6% 13% 6%”. That is a single-threaded program, though, so those should ideally be 0% each, or close to it (an ideal nobody reaches, or may not be able to reach; see my theory below).

What I think could be happening is that the GC is running on those other cores, and/or that the core in use overheats, and/or that the program gets moved to another core (by the OS scheduler, as far as I know) and thus to a different L1 cache, as seems to happen for the fastest C program, which shows “67% 0% 2% 33%”.
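The per-core loads in the Julia #3 row can be checked against the “busy” figure with a little arithmetic. This is only a sketch, assuming “busy” is roughly elapsed seconds times the summed per-core load (my reading of the columns, not something the site states here); `busy_check` is an illustrative name.

```shell
# Estimate "busy" as secs * (sum of per-core loads) for the Julia #3 row.
# LC_ALL=C keeps awk printing decimal points regardless of the system locale.
busy_check() {
    LC_ALL=C awk 'BEGIN {
        secs = 2.06                      # elapsed seconds from the row
        load = (99 + 6 + 13 + 6) / 100   # per-core loads, summed
        printf "estimated busy: %.2f s\n", secs * load
        printf "busy/secs: %.2f\n", 2.55 / secs
    }'
}
busy_check
```

Under that assumption the estimate comes out at 2.55 s, matching the reported “busy” value, so the extra 25 percentage points spread over the other three cores would account for the whole gap.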

The Benchmarks Game insists on the same settings for all programs, multi-threaded or not, i.e. -t 4. So when I know a program is actually single-threaded, could I add something like GC_cores(1) at the top of my script? I’m not sure that’s wise, since hypothetically more GC threads should help. Then maybe GC_cores(1, 4), where omitting the second number keeps the default number of GC threads?

I tried to limit the load on my machine (suspending Firefox) and hope my timings aren’t off. It would also be good to know whether Julia is less immune to a loaded machine.

$ hyperfine 'julia +1.11 -O3 -t4 pidigits.jl 10000 >/dev/null'
Benchmark 1: julia +1.11 -O3 -t4 pidigits.jl 10000 >/dev/null
  Time (mean ± σ):      1.249 s ±  0.034 s    [User: 2.159 s, System: 0.071 s]
  Range (min … max):    1.203 s …  1.315 s    10 runs

$ hyperfine 'julia -O3 -t4 pidigits.jl 10000 >/dev/null'
Benchmark 1: julia -O3 -t4 pidigits.jl 10000 >/dev/null
  Time (mean ± σ):      1.230 s ±  0.031 s    [User: 1.676 s, System: 0.503 s]
  Range (min … max):    1.189 s …  1.284 s    10 runs


$ time julia -O3 -t1 --gcthreads=2 pidigits.jl 10000 >/dev/null

real	0m1,215s
user	0m1,651s
sys	0m0,508s

What is pidigits.jl?

It’s linked above (the “#3” code, in case you missed it), i.e. the pidigits Julia #3 program at the Q6600 Benchmarks Game.

[FYI: This PR would help with that benchmark (and the whole PR with mandelbrot too), but it is otherwise unrelated to the question here: “Add misc precompiles (for benchmarks)” by PallHaraldsson, Pull Request #55784, JuliaLang/julia on GitHub.]