So, as expected, when printing, for each object passed to Python's print the interpreter builds the entire string representation of that object and only then writes it, effectively doing this:
The flow is roughly builtin_print[1] → PyFile_WriteObject[2] → … → PyObject_Repr[3] → list_repr[4] → long_to_decimal_string[5].
Step 4 (list_repr) returns the complete list converted to a single string, regardless of the type of the objects in the list.
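This per-object behaviour is easy to observe from Python itself: print calls write() once per argument (plus once for the terminating newline), and that first call already receives the fully built representation. A minimal sketch using a counting file object (the CountingWriter class here is mine, purely for illustration):

```python
import io

class CountingWriter(io.StringIO):
    """A StringIO that counts how many times write() is called."""
    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, s):
        self.calls += 1
        return super().write(s)

w = CountingWriter()
print([1, 2, 3, 4, 5], file=w)

# The whole list repr arrives as one write, plus one write for the newline.
assert w.calls == 2
assert w.getvalue() == "[1, 2, 3, 4, 5]\n"
```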
I do not know anything about Julia or Python internals, and proposing a fix is beyond my capacity. I suppose that building the string in memory would raise other challenges. However, this problem would likely be solved by not issuing a write for each element, but rather building the entire string representation first, as Python does.
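As a rough illustration of that idea (in Python, purely for demonstration; this is not how an actual Julia fix would look): instead of one write per element, the elements are joined into a single string that is written once. Both variants produce identical output, but the second needs only one write call on the underlying stream.

```python
import io

items = range(1, 11)

# Variant 1: one write call per element (roughly what the post
# describes Julia doing).
per_element = io.StringIO()
for x in items:
    per_element.write(str(x))
    per_element.write("\n")

# Variant 2: build the entire string in memory first, then write it
# once (roughly what CPython's repr-then-write approach amounts to).
batched = io.StringIO()
batched.write("".join(f"{x}\n" for x in items))

assert per_element.getvalue() == batched.getvalue()
```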
A different use of printing to stdout/stderr is piping the output through less (or a similar pager) to inspect it interactively without having to wait for a full (long) run: e.g. grepping for certain strings in the output to check a condition, letting the program continue until the next matching line, and so on. I find this very useful during development.
In this case you will also run into Julia's slower output compared to Python. A simple test (rough, due to the keyboard interaction involved): print one integer per line for 1 through 10,000,000, pipe the output through less, then press > and q to jump to the end of the output and let the script finish:
paulm@cmstorm 09:38:/data/examples/julia$ cat print_integers.py
for i in range(1, 10000001):
    print(i)
paulm@cmstorm 09:38:/data/examples/julia$ cat print_integers.jl
for i in 1:10000000
    println(i)
end
paulm@cmstorm 09:38:/data/examples/julia$ time python print_integers.py | less
<press ">" followed by "q" and wait for script to finish>
real 0m3.474s
user 0m3.723s
sys 0m0.080s
paulm@cmstorm 09:38:/data/examples/julia$ time julia -O3 print_integers.jl | less
<press ">" followed by "q" and wait for script to finish>
real 0m53.254s
user 0m39.575s
sys 0m47.758s
This is actually quite an eye-opener: the number of syscalls Julia makes compared to Python is staggering:
paulm@cmstorm 09:43:/data/examples/julia$ strace -c -o out.python python print_integers.py | less
<press ">" followed by "q" and wait for script to finish>
paulm@cmstorm 09:43:/data/examples/julia$ strace -c -o out.julia julia -O3 print_integers.jl | less
<press ">" followed by "q" and wait for script to finish>
paulm@cmstorm 09:54:/data/examples/julia$ grep write out.python
18.68 0.001243 0 9626 write
paulm@cmstorm 09:54:/data/examples/julia$ grep wait out.python
paulm@cmstorm 09:54:/data/examples/julia$ grep write out.julia
41.49 49.752976 2 20000003 write
0.00 0.000010 0 13 pwrite64
paulm@cmstorm 09:55:/data/examples/julia$ grep wait out.julia
58.49 70.127743 1 40000001 epoll_pwait
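The numbers above suggest two write syscalls per println (the digits and the newline separately: 20,000,003 writes for 10,000,000 lines), plus several epoll_pwait calls each, whereas Python's stdout is block-buffered when piped, so many lines are coalesced into one syscall. A sketch of that buffering effect using Python's io layer (BytesIO stands in for the raw file descriptor; the CountingRaw class is mine, for illustration):

```python
import io

class CountingRaw(io.BytesIO):
    """Stands in for a raw fd; counts write() calls (would-be syscalls)."""
    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, b):
        self.calls += 1
        return super().write(b)

raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=8192)
for i in range(1, 1001):
    buffered.write(f"{i}\n".encode())   # 1000 logical writes
buffered.flush()

# All 1000 lines fit in the 8 KiB buffer, so only the flush
# reaches `raw` -- far fewer "syscalls" than logical writes.
assert raw.calls <= 2
assert raw.getvalue().startswith(b"1\n2\n3\n")
```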
I get that too on Julia 1.5.1, but on my recent Julia master (6 days old) I get a regression:
julia> @time println("1")
1
0.000082 seconds (10 allocations: 240 bytes)
What happens behind the scenes is equivalent to this (though it shows fewer allocations here, for some reason):
julia> @time print(stdout, "1", '\n')
1
0.000053 seconds (8 allocations: 208 bytes)
but even better for allocations is this (the time is worse here, but that is probably misleading; once GC kicks in, it should come out ahead):
julia> @time print(stdout, "1", "\n") # both should work fast, but I'm a bit surprised this isn't the slower version
1
0.000066 seconds (4 allocations: 96 bytes)
Both are worse than this, which is almost ideal (I am not sure whether zero allocations is possible):
julia> @time print("1\n")
1
0.000054 seconds (2 allocations: 48 bytes)
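The same pattern is visible from Python: print issues one write per argument (plus the separator and the terminating newline), so a single pre-built string, as with print("1\n") above, minimizes both the write calls and the intermediate work. A quick check with a counting file object (the CountingWriter class is mine, for illustration):

```python
import io

class CountingWriter(io.StringIO):
    """A StringIO that counts write() calls."""
    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, s):
        self.calls += 1
        return super().write(s)

# Multiple arguments: one write per argument, plus separator and newline.
multi = CountingWriter()
print("1", "2", file=multi)       # writes "1", " ", "2", "\n"
assert multi.calls == 4

# A single pre-built string: just the string and the newline.
single = CountingWriter()
print("1 2", file=single)
assert single.calls == 2

assert multi.getvalue() == single.getvalue() == "1 2\n"
```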