IJulia & PyCall: output buffering

aplavin · July 16, 2019, 6:14pm

I noticed that all output from PyCall is buffered in IJulia until the cell finishes evaluating, which is often extremely inconvenient: one cannot easily track the progress of a Python computation if it reports that progress on stdout. A very simple example:

using PyCall
for i in 1:10
    pybuiltin("print")(i)  # works with both Python 2 and 3, unlike the Python 2-only "__builtin__" module
    sleep(0.5)
end

If you run this in the plain Julia REPL, it prints a number every 0.5 seconds as expected. But running the same code as a cell in Jupyter Lab gives all the output at once when the cell completes. In contrast, native Julia printing works fine in both the REPL and Jupyter environments:

for i in 1:10
    println(i)
    sleep(0.5)
end

Do you think this should be considered a bug in PyCall or IJulia? And are there any ways to disable this aggressive buffering?

stevengj · July 16, 2019, 8:06pm

IJulia has a task (“green thread”) checking for output every time the main task yields (e.g. with your sleep call). But it has no control over how the operating system does buffering for pipes, short of manually calling flush.

(Separately, if you have a long-running Python function, it presumably won’t yield at all to Julia’s scheduler, in which case IJulia will have no chance to check for output until the Python function completes.)
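
The scheduling point can be illustrated with a rough analogy in Python's asyncio, which is also cooperative. This is only a sketch and not IJulia's actual implementation: `drain` is a hypothetical stand-in for IJulia's output task, the `pending` list stands in for the pipe, and `time.sleep` plays the role of a foreign call that never yields to the scheduler.

```python
import asyncio
import contextlib
import time


async def demo(blocking):
    """Return the batches in which the 'output task' saw the produced items."""
    pending = []   # stand-in for data sitting in the pipe
    shown = []     # batches forwarded by the output task

    async def drain():
        # Hypothetical output task: forwards whatever has accumulated,
        # but only gets to run when the main task yields.
        while True:
            if pending:
                shown.append(tuple(pending))
                pending.clear()
            await asyncio.sleep(0)

    async def work():
        for i in range(3):
            pending.append(i)
            if blocking:
                time.sleep(0.01)           # blocks the whole loop, no yield
            else:
                await asyncio.sleep(0.01)  # yields, so drain() can run

    drainer = asyncio.create_task(drain())
    await work()
    await asyncio.sleep(0)  # give the output task one last turn
    drainer.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await drainer
    return shown


batched = asyncio.run(demo(blocking=True))
streamed = asyncio.run(demo(blocking=False))
```

With `blocking=True` all three items are forwarded in a single batch after the work has finished; with `blocking=False` each item is forwarded as soon as it is produced.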

tkf · July 16, 2019, 10:25pm

PyCall can in principle patch sys.stdout etc. like ipykernel does, so that writing to it invokes Julia’s IO. But calling sys.stdout.flush (or passing flush=True to print in Python 3) in your script is a more robust solution, as it also works in other contexts such as piping into a file.
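
The effect of an explicit flush can be seen without any pipe at all. The sketch below assumes CPython and uses an in-memory `io.BytesIO` as a stand-in for the pipe; the variable names are only illustrative.

```python
import io

# In-memory stand-in for a pipe: bytes arrive here only once the text
# layer above it decides to flush.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="utf-8", line_buffering=False)

print("step 1", file=stream)              # stays in the wrapper's buffer
before_flush = raw.getvalue()

print("step 2", file=stream, flush=True)  # flush pushes both lines through
after_flush = raw.getvalue()
```

Here `before_flush` is empty even though "step 1" has already been printed, and `after_flush` contains both lines.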

aplavin · July 17, 2019, 12:05am

Unfortunately, an explicit flush is not an option in my case. First, the Python code is developed by other people and I would rather not modify it; second, part of the output I’m interested in actually comes from another program, which gets called from the Python code. A simplified example of the structure:
pyimport("os").system("parallel 'sleep {}; echo {}' ::: 1 2 3 4 5")
This outputs all 5 numbers after 5 seconds, and what I need is to print them as soon as they are ready.

aplavin · July 17, 2019, 12:15am

So you mean it’s impossible for IJulia to print output from other languages as soon as it is, well, printed by the called program? This seems like a major limitation, because there is a lot of Python (and R, for that matter) code out there containing longish-running procedures, and it’s very useful to have output as soon as it’s ready.

And by the way, IJulia correctly outputs lines one by one when I run an external program like run(`parallel 'sleep {}; echo {}' ::: 1 2 3 4 5`).

tkf · July 17, 2019, 12:38am

You can call the long-running procedure in a Python background thread and poll from the main thread, calling sys.stdout.flush and Julia’s sleep occasionally.

stevengj · July 17, 2019, 8:14pm

This is a general limitation of how stdio interacts with OS pipes. The only ways around it, as far as I know, are to (i) modify the program to call flush (perhaps in a separate thread as @tkf suggested), (ii) hook into the language’s I/O (e.g. patching Python’s sys.stdout), or (iii) set up a pseudoterminal instead of a pipe (since stdio to pseudoterminals is line-buffered, which is what you want). Pseudoterminals are a Unix-only thing, however, and seem like a lot of pain to work with even there.
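
As a rough illustration of option (ii), here is a minimal sketch of a replacement stream that forwards each write immediately. `ForwardingStream` and its `sink` callback are hypothetical names; in the PyCall setting the sink would be something that writes to Julia's IO, whereas here it is just a list.

```python
import contextlib
import io


class ForwardingStream(io.TextIOBase):
    """Text stream that hands every write to a callback straight away."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink  # hypothetical callback receiving each chunk of text

    def writable(self):
        return True

    def write(self, text):
        self._sink(text)   # no buffering: forward immediately
        return len(text)


received = []
with contextlib.redirect_stdout(ForwardingStream(received.append)):
    print("i=0")
    print("i=1")

forwarded = "".join(received)
```

This only sees output that goes through Python's sys.stdout; anything written directly to the underlying file descriptor by C code or a child process bypasses it.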

In comparison, IPython does not capture non-Python stdout at all: see “Capture output coming from C and C++ libraries” (Issue #110, ipython/ipykernel) and “OS-level stdout/stderr capture” (Issue #1230, ipython/ipython). There is an external package that works on Unix (minrk/wurlitzer, “Capture C-level stdout/stderr in Python”), but as I understand it the output is not captured until the cell finishes executing.
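
For context, capturing at the OS level means redirecting file descriptor 1 itself rather than the Python object sys.stdout. The following is a minimal sketch of that idea, not the code of any of the packages mentioned; `capture_fd1` is a hypothetical helper, and because nothing drains the pipe while `action` runs it is only suitable for small amounts of output.

```python
import os
import sys


def capture_fd1(action):
    """Run action() with file descriptor 1 pointed at a pipe; return the text."""
    sys.stdout.flush()             # do not let earlier buffered text leak in
    saved = os.dup(1)              # remember the real stdout
    read_end, write_end = os.pipe()
    os.dup2(write_end, 1)          # fd 1 now refers to the pipe
    os.close(write_end)
    try:
        action()
    finally:
        os.dup2(saved, 1)          # restore stdout; closes the pipe's write side
        os.close(saved)
    with os.fdopen(read_end) as reader:
        return reader.read()       # available only after action() has returned


# os.write bypasses sys.stdout entirely, like output from C code would.
captured = capture_fd1(lambda: os.write(1, b"written to fd 1\n"))
```

Note that in this sketch the text becomes available only once the action has finished, which is the same limitation described above.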

aplavin · July 17, 2019, 10:46pm

Interesting, so IPython basically does not do this any better. I didn’t think this was such a complicated issue.

tkf · July 17, 2019, 10:47pm

FYI, here is a MWE

using PyCall

py"""
import sys
import time
import threading


def printing(n):
    for i in range(n):
        time.sleep(1)
        print(f"i={i}")


th = threading.Thread(target=printing, args=(3,))
th.start()
"""

while py"th.is_alive()"
    py"sys.stdout.flush()"
    sleep(1)
end
py"sys.stdout.flush()"

By the way, you can’t swap the main and background threads, because calling Julia functions from a different thread is not safe. So it has to be done this way. But it won’t be hard to wrap this in a function.
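
One possible shape for the Python half of such a wrapper is sketched below. `BackgroundCall` is a hypothetical helper, not part of PyCall; the polling loop itself still has to live on the Julia side so that Julia's sleep can yield to IJulia.

```python
import sys
import threading


class BackgroundCall:
    """Run func(*args, **kwargs) in a background thread and keep its outcome."""

    def __init__(self, func, *args, **kwargs):
        self._value = None
        self._error = None

        def target():
            try:
                self._value = func(*args, **kwargs)
            except BaseException as err:  # keep the error for the caller
                self._error = err

        self._thread = threading.Thread(target=target, daemon=True)
        self._thread.start()

    def poll(self):
        """Flush Python's stdout/stderr; return True while still running."""
        sys.stdout.flush()
        sys.stderr.flush()
        return self._thread.is_alive()

    def result(self):
        """Wait for completion, then return the value or re-raise the error."""
        self._thread.join()
        self.poll()  # final flush
        if self._error is not None:
            raise self._error
        return self._value


call = BackgroundCall(lambda x: x * 2, 21)
value = call.result()
```

From Julia, the idea would be to create the object, call its poll method followed by Julia's sleep until poll returns false, and then call result, mirroring the loop in the example above.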

aplavin · July 17, 2019, 10:53pm

It’s nice that your example also works for non-Python output from programs called from Python. E.g. if I replace the body of the printing function with os.system("parallel 'sleep {}; echo {}' ::: 1 2 3 4 5"), it correctly prints the numbers one by one.

One thing I don’t get is why run(`parallel 'sleep {}; echo {}' ::: 1 2 3 4 5`) correctly outputs numbers without buffering? Does this use a completely different mechanism than executing python and running external programs from that python code?

tkf · July 17, 2019, 11:37pm

This largely repeats what @stevengj already explained: while Julia is inside a foreign function call, such as the ones PyCall makes, it stops processing I/O events, including writes to stdout. If you are interested in why Julia works that way, have a look at the Tasks (aka Coroutines) section of the manual and pick some keywords to search for from there.

bzr0014 · July 22, 2021, 4:10pm

I’ve tried the following in IJulia and it seems to be a good workaround:
(updated based on @stevengj’s reply)

# python julia ijulia fix output buffer problem
using PyCall
sys = pyimport("sys")
sys.stdout = stdout
sys.stderr = stderr

stevengj · July 22, 2021, 4:50pm

This shouldn’t be necessary. You can just do sys.stdout = stdout in Julia.
