Regular println vs Core.stdout

It seems like Core.stdout has a bug (memory leak?).

It’s at least 15x slower at high loop counts.

$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:20000000; print(\"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:20000000; print("Hello"); end;\' >/dev/null
  Time (mean ± σ):      1.514 s ±  0.049 s    [User: 2.324 s, System: 0.086 s]
  Range (min … max):    1.446 s …  1.601 s    10 runs
 
$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:20000000; print(Core.stdout, \"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:20000000; print(Core.stdout, "Hello"); end;\' >/dev/null
 ⠦ Current estimate: 15.078 s

Only about 50% slower here by raw wall time (about 5x once startup is subtracted, see the EDIT below), since it gets worse with higher loop count:

$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:200000; print(Core.stdout, \"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:200000; print(Core.stdout, "Hello"); end;\' >/dev/null
  Time (mean ± σ):     422.7 ms ±  21.3 ms    [User: 1186.3 ms, System: 132.4 ms]
  Range (min … max):   401.9 ms … 463.4 ms    10 runs

$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:200000; print(\"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:200000; print("Hello"); end;\' >/dev/null
  Time (mean ± σ):     275.9 ms ±  19.3 ms    [User: 597.3 ms, System: 90.5 ms]
  Range (min … max):   254.6 ms … 306.3 ms    10 runs

EDIT: Note that at low loop counts the raw difference is misleading; I need to subtract startup time to get 5x.
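To make the startup correction concrete, here is a small sketch of the arithmetic using the two mean times from the 200000-iteration runs above. The startup time itself was not measured in this post, so the code solves for the value that a 5x ratio would imply (about 239 ms) rather than assuming one. The helper names `corrected` and `implied_startup` are made up for illustration.

```julia
# Mean wall times in ms, copied from the hyperfine runs above (200000 iterations).
t_core = 422.7   # print(Core.stdout, "Hello")
t_base = 275.9   # print("Hello")

# Raw ratio, with Julia startup included in both numbers.
raw_ratio = t_core / t_base

# Ratio after subtracting an assumed startup time s (ms) from both runs.
corrected(s) = (t_core - s) / (t_base - s)

# Startup time implied by a target corrected ratio r:
# (t_core - s) / (t_base - s) = r  =>  s = (r * t_base - t_core) / (r - 1)
implied_startup(r) = (r * t_base - t_core) / (r - 1)

s5 = implied_startup(5)
println("raw ratio: ", round(raw_ratio, digits=2))
println("startup implied by a 5x ratio: ", round(s5, digits=1), " ms")
println("corrected ratio at that startup: ", round(corrected(s5), digits=2))
```

Because the corrected ratio is very sensitive to `s` when `s` is close to `t_base`, a separately measured startup time (for example an empty `-e ''` run) would be needed to pin the number down.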

EDIT: So I likely drew the wrong conclusion: for a loop count of 2, Core.stdout is slightly faster, or rather the “same”; there I’m mostly measuring noise and/or startup cost.
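One way to take startup and compilation out of the picture is to time both loops inside a single Julia process. The following is only a sketch under a few assumptions: `print_loop` and `main` are hypothetical names, the timings go to stderr so stdout can still be redirected to /dev/null, and passing `stdout` as an argument is not exactly the same code path as calling `print("Hello")`, which looks up the global `stdout` on every call.

```julia
# Sketch: time both print paths in one process, so Julia startup is excluded.
# Intended use: julia script.jl >/dev/null   (timings are written to stderr)

function print_loop(io, n)
    for _ in 1:n
        print(io, "Hello")
    end
end

function main(n = 200_000)
    # Warm-up calls so that compilation is not included in the timings.
    print_loop(stdout, 1)
    print_loop(Core.stdout, 1)

    t_base = @elapsed print_loop(stdout, n)
    t_core = @elapsed print_loop(Core.stdout, n)

    println(stderr, "stdout:      ", t_base, " s")
    println(stderr, "Core.stdout: ", t_core, " s")
    println(stderr, "ratio:       ", t_core / t_base)
end

main()
```

Running this with several values of `n` would show whether the ratio really grows with the loop count, independent of process startup.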