It seems like Core.stdout has a bug (memory leak?).
It’s 15x slower or more at high loop counts.
$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:20000000; print(\"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:20000000; print("Hello"); end;\' >/dev/null
Time (mean ± σ): 1.514 s ± 0.049 s [User: 2.324 s, System: 0.086 s]
Range (min … max): 1.446 s … 1.601 s 10 runs
$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:20000000; print(Core.stdout, \"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:20000000; print(Core.stdout, "Hello"); end;\' >/dev/null
⠦ Current estimate: 15.078 s
Only about 50% slower here, since it gets worse with higher loop count:
$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:200000; print(Core.stdout, \"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:200000; print(Core.stdout, "Hello"); end;\' >/dev/null
Time (mean ± σ): 422.7 ms ± 21.3 ms [User: 1186.3 ms, System: 132.4 ms]
Range (min … max): 401.9 ms … 463.4 ms 10 runs
$ hyperfine "julia +1.12 --project=. -e 'for _ in 1:200000; print(\"Hello\"); end;\' >/dev/null"
Benchmark 1: julia +1.12 --project=. -e 'for _ in 1:200000; print("Hello"); end;\' >/dev/null
Time (mean ± σ): 275.9 ms ± 19.3 ms [User: 597.3 ms, System: 90.5 ms]
Range (min … max): 254.6 ms … 306.3 ms 10 runs
EDIT: Note that at low loop counts the difference is misleading; I need to subtract startup time to get 5x.
EDIT: So that was likely the wrong conclusion. For a loop count of 2, Core.stdout is slightly faster, or rather the “same”; I’m mostly measuring noise and/or startup cost.
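To make the startup correction from the first EDIT concrete, here is a rough sketch of the arithmetic. The two totals are the means reported above for the 200000-iteration runs. The startup value is an assumption, not a measurement: 239.2 ms is simply the value that would have to be subtracted for the ratio to come out at 5x. The helper name `ratio` is made up for illustration.

```shell
# Startup-corrected slowdown: (slow - startup) / (fast - startup).
# All three arguments are in milliseconds.
ratio() {
    awk -v slow="$1" -v fast="$2" -v startup="$3" \
        'BEGIN { printf "%.1f\n", (slow - startup) / (fast - startup) }'
}

# Means from the 200000-iteration runs above, with no startup correction.
ratio 422.7 275.9 0        # prints 1.5

# Same means with an ASSUMED (back-computed, not measured) startup of 239.2 ms.
ratio 422.7 275.9 239.2    # prints 5.0
```

Since the fast run would then be mostly startup, the corrected ratio is very sensitive to the startup value used, so it should be measured separately rather than guessed before trusting a number like 5x.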