Why is printing to a terminal slow?

asmar · July 14, 2020, 9:00am

So, as expected, for each object passed to Python's print, the interpreter builds the entire string representation of that object and then writes it in a single call.

The flow is roughly builtin_print[1] → PyFile_WriteObject[2] → … → PyObject_Repr[3] → list_repr[4] → long_to_decimal_string[5].

Step 4 (list_repr) returns the complete list converted to a string, regardless of the type of the objects in the list.

[1] https://github.com/python/cpython/blob/v3.8.4/Python/bltinmodule.c#L1883
[2] https://github.com/python/cpython/blob/v3.8.4/Objects/fileobject.c#L131
[3] https://github.com/python/cpython/blob/v3.8.4/Objects/object.c#L539
[4] https://github.com/python/cpython/blob/v3.8.4/Objects/listobject.c#L415
[5] https://github.com/python/cpython/blob/v3.8.4/Objects/longobject.c#L1850-L1871
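
The one-write-per-object behaviour of the flow above can be observed from the Python side without reading the C sources. This is only a minimal sketch: CountingWriter is a hypothetical helper (not part of CPython) that records the calls print makes on the file object it is given.

```python
class CountingWriter:
    """Hypothetical file-like object that records every write() call."""

    def __init__(self):
        self.calls = []

    def write(self, text):
        self.calls.append(text)
        return len(text)


sink = CountingWriter()
data = list(range(1000))
print(data, file=sink)

# print() passes the complete string form of the list in one write() call,
# followed by a second write() call for the line terminator.
print(len(sink.calls))
print(sink.calls[0] == repr(data))
```

The number of write calls does not depend on the length of the list, because the string is assembled in memory before anything is written.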

I know next to nothing about Julia or Python internals, so proposing a fix is beyond me, and I suppose that building the string in memory would raise other challenges. However, this problem would likely be solved by not writing each element separately but rather building the entire string representation first, as Python does.

paulmelis · October 29, 2020, 8:56am

A different use of printing to stdout/stderr is to pipe output through less (or similar pager) to be able to inspect the output interactively without having to do a full (long) run. E.g. grep for certain strings in the output to check a condition, then letting the program continue until the next matching line, etc. I find this very useful during development.

In this case you will also hit upon the slower output of Julia compared to Python. A simple (rough, due to the user keyboard interaction) test of printing one integer per line for 1 - 10,000,000 and piping through less then using > and q to jump to the end of the output to let the script finish:

paulm@cmstorm 09:38:/data/examples/julia$ cat print_integers.py 
for i in range(1, 10000001):
    print(i)

paulm@cmstorm 09:38:/data/examples/julia$ cat print_integers.jl 
for i in 1 : 10000000
    println(i)
end

paulm@cmstorm 09:38:/data/examples/julia$ time python print_integers.py | less
<press ">" followed by "q" and wait for script to finish>

real	0m3.474s
user	0m3.723s
sys	0m0.080s

paulm@cmstorm 09:38:/data/examples/julia$ time julia -O3 print_integers.jl | less
<press ">" followed by "q" and wait for script to finish>

real	0m53.254s
user	0m39.575s
sys	0m47.758s

This is actually quite an eye-opener, to see the insane number of syscalls Julia does compared to Python:

paulm@cmstorm 09:43:/data/examples/julia$ strace -c -o out.python python print_integers.py | less
<press ">" followed by "q" and wait for script to finish>

paulm@cmstorm 09:43:/data/examples/julia$ strace -c -o out.julia julia -O3 print_integers.jl | less
<press ">" followed by "q" and wait for script to finish>

paulm@cmstorm 09:54:/data/examples/julia$ grep write out.python 
 18.68    0.001243           0      9626           write
paulm@cmstorm 09:54:/data/examples/julia$ grep wait out.python 
paulm@cmstorm 09:54:/data/examples/julia$ grep write out.julia 
 41.49   49.752976           2  20000003           write
  0.00    0.000010           0        13           pwrite64
paulm@cmstorm 09:55:/data/examples/julia$ grep wait out.julia 
 58.49   70.127743           1  40000001           epoll_pwait
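
A quick back-of-the-envelope check of those counts, as a hedged sketch: bytes_for_range is a hypothetical helper, and strace -c counts every write in the process, not only those to stdout, so the figures are approximate.

```python
def bytes_for_range(n):
    """Bytes needed to print 1..n, one decimal integer per line."""
    total = 0
    digits = 1
    start = 1
    while start <= n:
        end = min(n, start * 10 - 1)               # last number with this many digits
        total += (end - start + 1) * (digits + 1)  # +1 for the newline
        digits += 1
        start *= 10
    return total


N = 10_000_000
total = bytes_for_range(N)

print(total)              # total bytes of output for the whole run
print(total / 9626)       # average bytes per write in the Python run (roughly 8 KiB)
print(20_000_003 / N)     # write calls per printed line in the Julia run (about 2)
print(40_000_001 / N)     # epoll_pwait calls per printed line in the Julia run (about 4)
```

The Python numbers are consistent with stdout being flushed in blocks of about 8 KiB when it is a pipe. The Julia numbers are consistent with two write syscalls per println, presumably one for the integer and one for the newline, plus four epoll_pwait calls per line.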

Omicron666 · October 29, 2020, 11:18am

println is slower than print in Julia for the same loop (x XXX).
In Python you are using print, so it might be hard to compare directly!

If I join a first, I get a x100 performance factor.

paulmelis · October 29, 2020, 11:40am

What is your point? print in Python outputs the line to stdout followed by a newline. Doesn’t println in Julia do the same?

Joining first is not very useful, as it won’t work in general for a program where you have print calls (or log calls) scattered all over the place.
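
For print calls scattered over a program, the buffering has to live in the stream rather than at each call site. The following is a minimal sketch of that idea using Python's io module only; CountingRaw and run are hypothetical names, the raw stream merely counts low-level writes instead of doing real I/O, and nothing here describes how Julia's stdout is implemented.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts low-level writes instead of doing I/O."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def writable(self):
        return True

    def write(self, data):
        self.write_calls += 1
        return len(data)


def run(n, line_buffering):
    raw = CountingRaw()
    out = io.TextIOWrapper(
        io.BufferedWriter(raw, buffer_size=8192),
        encoding="utf-8",
        line_buffering=line_buffering,
    )
    for i in range(1, n + 1):
        print(i, file=out)  # stand-in for prints scattered over a program
    out.flush()
    return raw.write_calls


N = 100_000
per_line = run(N, line_buffering=True)    # flushed at every newline
per_block = run(N, line_buffering=False)  # flushed only when the buffer fills
print(per_line, per_block)
```

The call sites are identical in both runs; only the stream configuration differs, and the block-buffered stream issues far fewer low-level writes.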

Omicron666 · October 29, 2020, 11:53am

ah ok

Well, yes, something must be sub-optimal in all cases.

cserteGT3 · October 29, 2020, 12:19pm

I can’t really contribute to the discussion, just wanted to show this benchmark:

julia> using BenchmarkTools

julia> @benchmark println($"1")
# printing a ton
BenchmarkTools.Trial:
  memory estimate:  144 bytes
  allocs estimate:  6
  --------------
  minimum time:     123.100 μs (0.00% GC)
  median time:      134.999 μs (0.00% GC)
  mean time:        233.607 μs (0.00% GC)
  maximum time:     10.268 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark print($"1\n")
# printing a ton
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  2
  --------------
  minimum time:     69.100 μs (0.00% GC)
  median time:      76.000 μs (0.00% GC)
  mean time:        122.459 μs (0.00% GC)
  maximum time:     9.036 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Palli · October 29, 2020, 1:21pm

I get that too on Julia 1.5.1, but on my recent Julia master (6 days old) I get a regression:

julia> @time println("1")
1
  0.000082 seconds (10 allocations: 240 bytes)

What happens behind the scenes is this (though with fewer allocations, for some reason):

julia> @time print(stdout, "1", '\n')
1
  0.000053 seconds (8 allocations: 208 bytes)

Even better for allocations is the following (the time is worse here, but that is probably misleading; once GC kicks in it should be better):

julia> @time print(stdout, "1", "\n")  # both should work fast, but I'm a bit surprised this isn't the slower version
1
  0.000066 seconds (4 allocations: 96 bytes)

Both are worse than this, which is almost ideal (I am not sure zero allocations is possible):

julia> @time print("1\n")
1
  0.000054 seconds (2 allocations: 48 bytes)

github.com/JuliaLang/julia

Fewer allocations for print

JuliaLang:master ← PallHaraldsson:patch-2

opened 02:00PM - 29 Oct 20 UTC

PallHaraldsson

+1 -1

A. Fewer allocations (faster printing? [in terminal]). <s>B. It's generally good to reuse code, there print (see the lines immediately above), but that print is doing show, and nothing else except locking and I assume you don't need to do twice or n-times actually.</s> Surprisingly, ``` print(io, xs..., '\n') ``` has more allocations (the '\n' allocation issue can maybe be fixed separately. I'm not sure, but it seems to be related to whether its type and that of xs match). Not solved here, print('1') allocates more than print('1'). https://discourse.julialang.org/t/why-is-printing-to-a-terminal-slow/42987/27?u=palli

asmar · October 29, 2020, 3:40pm

Well, yes, I already demonstrated joining the array, please read the thread.

mgkuhn · November 24, 2021, 4:25pm

See also https://github.com/JuliaLang/julia/issues/43176
