Set flushing mode for output stream

ryofurue · August 7, 2023, 5:38am

Is there a way either 1) to specify the flushing mode of an existing output stream or 2) to create a new stream that has the desired flushing mode? [Edit: the output stream needs to be connected to the “screen” (console) and directable to a text file.]

I’m still suffering from the fact that each time I want to print something immediately, I need to add flush(stdout) after the print statement:

https://discourse.julialang.org/t/stderr-not-flushed-right-away

I could use the @info facility in some cases, but I sometimes need to report the progress as

  for i in 1:imax
     print("$(i), "); flush(stdout)
     # . . . something that takes time . . .
  end
  println() # end the line

That is, I need the output line to look like "1, 2, 3, . . . " without newlines.

Palli · August 8, 2023, 4:24pm

You can do:

function println(io::IO, xs...)
   print(io, xs..., '\n')
   flush(io)
end

and similarly for print (and others, if you really want that). That’s from my PR, from when I thought this would be a good idea by default:

https://github.com/JuliaLang/julia/pull/50718/files

I realized we actually need to have this in the hands of the users, same or similar to -u in Python, rather than in the programmer’s hands. I mean it would still be OK to change the default to unbuffered (just slower, which would be considered a non-breaking performance regression), but you would then still need a way to opt into buffered (the current default), with something like -b.

I actually thought stderr would already always be unbuffered… My change would actually change all streams, including “printing” to files, and you might want to control this only for stdout (and stderr?).

ryofurue · August 8, 2023, 5:53pm

I think the best way is to

Define default buffering for stdout and stderr. On Linux, stdout is line-buffered and stderr is unbuffered (I haven’t verified this, though). I think these are excellent defaults.
Give the programmer the ability of setting the buffering behavior of a new stream.
Give the programmer the ability of changing the buffering mode of any stream.

I’m afraid I don’t think that flushing in println() is the way to go. We expect maximal efficiency when printing into a fully buffered stream, but your redefinition of println would degrade the performance.

Palli · August 8, 2023, 6:24pm

This is not possible (I believe currently, from within Julia, with a Julia API, though see below for a hack).

Right. I didn’t mean necessarily that you would change our Julia (or make my PR get accepted). I meant you could redefine in the actual program you are using. Or I think that actually works. I didn’t test it, let’s say if you call a precompiled package that does println. Would it use the old println, or get recompiled (I think the latter).

github.com/JuliaLang/julia

STDOUT/STDERR updates rate

opened 06:12AM - 10 Sep 15 UTC

alyst

I/O

When 0.4-dev Julia script is submitted to a job manager (SGE), and STDERR/STDLOG… is redirected to a file, this log file is updated very infrequently, approx. once in 3 hours. Is there a way to increase the updates rate? The script looks like this:

```bash
#!/bin/sh
#$ -pe openmp 16
#$ -R y
#$ -l h_rt=11:00:00
#$ -l h_vmem=4G
#$ -S /bin/bash
#$ -m ae
#$ -j y
#$ -o <...>/$JOB_NAME_$JOB_ID.log
<...>
export NWORKERS=$(($NSLOTS-1))
julia --color=no -q -p $NWORKERS <...>/run.jl <...>
```

I have been partially successful using the following hack:

c_stdout = cglobal(:stdout, Libc.FILE)
c_stdout_stream = unsafe_load(c_stdout).ptr
ccall(:setbuf, Cvoid, (Ptr{Cvoid}, Ptr{UInt8}), c_stdout_stream, Ptr{UInt8}(0))

which is essentially me trying to call setbuf(stdout, 0) from Julia. It works for some parts of my code, but I’m still experiencing output not printed in chronological order in the example below

Note setbuf vs setvbuf, mentioned in the same issue (github.com/JuliaLang/julia, “STDOUT/STDERR updates rate”, quoted above).

Since C has setbuf and setvbuf I suppose it IS considered valuable to have this in the hands of the programmer, but maybe only since you don’t have -u for an arbitrary C compiled program. Possibly stdbuf came later and made this redundant? I don’t know, but am curious, does Python have an API (not just similar hack by calling libc) from within the program to do -u, rather than just invoking the program that way? It also seems redundant to have -u there given stdbuf.

What is really the difference between that and my proposal to redefine print[ln]? Then it is in the hands of the programmer…

[I do know the answer to the question.] I was just proposing it as a hack, a workaround that might work for you. If you don’t like it, or are scared to [re]define the function println, then you could define my_println, or unb_println…
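A minimal sketch of that workaround, assuming you prefer separate helpers over redefining println. The names unb_print and unb_println are hypothetical (they only follow the suggestion above and are not part of Base):

```julia
# Hypothetical helpers: print, then flush the same stream right away.
function unb_print(io::IO, xs...)
    print(io, xs...)
    flush(io)          # push buffered bytes to the destination now
    return nothing
end
unb_print(xs...) = unb_print(stdout, xs...)

unb_println(io::IO, xs...) = unb_print(io, xs..., '\n')
unb_println(xs...) = unb_println(stdout, xs...)

# Progress report on one line, as in the original question.
for i in 1:3
    unb_print(i, ", ")
    sleep(0.1)         # stands in for the real work
end
unb_println()
```

Since only the helpers flush, ordinary print/println calls on fully buffered streams keep their current speed.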

What should work but doesn’t is:

stdbuf -o0 -e0 <cmd>

I don’t know if that’s on Linux/Unix only. The reason it doesn’t work is I think that Julia buffers, not just the lower level libc. I think we need to change that so such tools just work.

ryofurue · August 9, 2023, 3:17am

Thank you for the clarification and the detailed information. I now understand what has been going on among julia developers regarding this issue. But what about this?

What’s the reasoning behind using a fully-buffered stream for stderr when your program is run as julia yourprog.jl ? I’ve never seen a language that shows this behavior. Would it be hard/impossible to change that in julia?

If stdout were line-buffered and stderr were unbuffered, then my problem wouldn’t exist in the first place.

Palli · August 9, 2023, 2:16pm

In case the stdbuf Linux command is too obscure or doesn’t exist on e.g. Windows I think we might need this option (or the other way, see alternative env var below):

JULIAUNBUFFERED=true julia my_prog.jl

similar to Python’s PYTHONUNBUFFERED environment variable (the equivalent of its -u option).

The possibly misleading Python doc reads to me as if stderr is (fully) buffered by default in Python, but can be made unbuffered.

I don’t think we need -u, i.e. both possible ways to ask for unbuffered, and I actively don’t want that option, since I want compiled Julia programs to work too, and then -u might mean something to that program. [Am I wrong about -u? While the two are synonyms for one program, they are not always: an env var also works for your (Julia) subprocesses, so it might be useful by default. -u wouldn’t do that (unless explicitly designed that way, and seemingly it can’t be: you could run e.g. a bash or python subprocess that runs Julia in turn). It’s even plausible we would want to respect PYTHONUNBUFFERED too… but then there is a possible conflict between the two, and which one rules?]

I think it could be because, if you log a lot of errors and stderr is redirected to syslog, you would hammer your disk and slow down if it were unbuffered, or even line-buffered. But then you have lots of problems anyway… I’m not sure we should optimize for that; printing to stderr should be infrequent anyway. I’m just trying to find a reason. I’m not sure it’s intentional; maybe it isn’t, and stderr should be line-buffered (and was thought to be so already).

Would you rather have UNbuffered as the default, even though it is slower, if you could override it (and would the override only need to be done as below, or also be possible from within the program)? It’s a simpler mental model for new programmers, and I also think it can be made fast by default (my idea of a timer flushing say every 0.1 sec.). If there are issues with my timer idea, then you could at least do it this way: JULIA_BUFFERED=true julia my_prog.jl
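To make the timer idea concrete, here is a rough user-level sketch, not how Julia itself does or would implement it. The helper name with_periodic_flush is hypothetical; it uses Base’s Timer with an interval. One caveat I’m fairly sure of: on a single thread the callback only runs when the main task yields (sleep, I/O, etc.), so a tight compute loop would still delay the output.

```julia
# Hypothetical helper: flush `io` every `interval` seconds while `f` runs.
function with_periodic_flush(f, io::IO=stdout; interval::Real=0.1)
    timer = Timer(_ -> flush(io), interval; interval=interval)
    try
        return f()
    finally
        close(timer)   # stop the background flushes
        flush(io)      # final flush so nothing is left in the buffer
    end
end

with_periodic_flush() do
    for i in 1:3
        print(i, ", ")
        sleep(0.2)     # sleeping yields, which lets the timer callback run
    end
    println()
end
```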

ryofurue · August 12, 2023, 5:03pm

By the way,

I may be misunderstanding you, but if you redefine println to always flush, then println would become inefficient: That is, this

for i in 1:largenumber
   println(fully_buffered_stream, something(i) )
end

would also be slower. Which isn’t good. What should be done is

for i in 1:largenumber
  println(stderr, something(i)) # slow because stderr is unbuffered
  println(fully_buffered_stream, something(i)) # fast
end

This is the reason why we should change the streams, not the printing functions.
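For anyone who wants to measure the cost on their own machine, here is a small sketch. The function name write_lines and the line count are arbitrary choices of mine, and the timings will vary by system, so I’m not quoting any numbers:

```julia
# Write n lines to a file, optionally flushing after every println.
function write_lines(path::AbstractString, n::Integer; flush_each::Bool)
    open(path, "w") do io
        for i in 1:n
            println(io, i)
            flush_each && flush(io)
        end
    end
    return nothing
end

path, tmpio = mktemp()
close(tmpio)
write_lines(path, 10; flush_each=true)   # warm-up call, so compilation is not timed

n = 100_000
t_buffered = @elapsed write_lines(path, n; flush_each=false)
t_flushed  = @elapsed write_lines(path, n; flush_each=true)
println("buffered: ", t_buffered, " s, flush after every line: ", t_flushed, " s")
```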

Neither. We need fast streams and slow streams. We need to be able to say

stderr outputs right away, but it’s inefficient for this reason, mind you!
If you open a new stream, it’s fully buffered by default. To send the output to the destination immediately, you need to say flush(yourstream) after printing.

Palli · August 17, 2023, 2:52pm

We don’t really need slow streams, only buffered ones, if buffering is the only way to make them fast (but I think fast and unbuffered is possible). Would you like these defaults:

1. stdout is buffered to make it fast (if possible in some later implementation, fast and unbuffered, i.e. the timer idea).
2. stderr is unbuffered (or maybe line-buffered, seems enough; and I also think printing to stderr should implicitly flush stdout, to not get stuff out of order). I just don’t think it’s too useful to have stderr fully buffered (am I wrong?). I’m not sure what is done currently for it, or by default in Python.
3. All other streams, i.e. for files, should be fully buffered (I believe that’s the status quo). This should NOT be override-able by a user, e.g. from CLI/ENV var.

Then you could do:

JULIA_UNBUFFERED=true julia my_prog.jl

then 1. stdout (and still 2. stderr) will be fully unbuffered. Or:

JULIA_UNBUFFERED=false julia my_prog.jl

then 2. stderr (and 1. stdout still) will be buffered.

Flushing to files will always be explicit. I don’t think auto-flushing for it will be better. You always write pages, and if the buffer is one page then well you do it, you might want to enlarge the buffer, but it seems actively worse to make it smaller. If you worry about data-loss you should of course flush (or want to limit how much you lose), and what you should aim for is:

ryofurue · August 17, 2023, 3:04pm

If possible, we should follow the Linux tradition: stdout is line-buffered and stderr is unbuffered.

I don’t see why you want to deviate from that.
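Until the defaults change, line-buffered behavior can be approximated in user code. The following is only a sketch: the type name LineFlushed is hypothetical, and I’m assuming that overloading Base.write for a single byte and Base.unsafe_write is enough for print and println to go through the wrapper, which matches the usual pattern for custom IO types.

```julia
# Hypothetical wrapper: forwards writes to `io` and flushes whenever a newline is written.
struct LineFlushed{T<:IO} <: IO
    io::T
end

function Base.write(s::LineFlushed, b::UInt8)
    n = write(s.io, b)
    b == UInt8('\n') && flush(s.io)
    return n
end

function Base.unsafe_write(s::LineFlushed, p::Ptr{UInt8}, n::UInt)
    written = unsafe_write(s.io, p, n)
    # Scan the chunk just written for a newline byte.
    if any(i -> unsafe_load(p, i) == UInt8('\n'), 1:Int(n))
        flush(s.io)
    end
    return written
end

Base.flush(s::LineFlushed) = flush(s.io)
Base.close(s::LineFlushed) = close(s.io)
Base.isopen(s::LineFlushed) = isopen(s.io)

out = LineFlushed(stdout)
print(out, "1, 2, 3")   # may stay in the buffer
println(out)            # the newline triggers a flush
```

Note that this would not help with my "1, 2, 3, . . ." progress line, which has no newlines until the end; that case still needs an unbuffered stream or an explicit flush.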

Palli · August 17, 2023, 4:23pm

I only meant buffered for stdout, as opposed to unbuffered. I wasn’t up-to-speed on it meaning line-buffered in Linux tradition, if that’s for sure true. I suppose there’s a good reason for this (we should likely do it, and probably already do):

# Perl's STDOUT is line-buffered when connected to a terminal.
perl -e'print "a\n"; sleep(2); print "b\n";'

# Perl's STDOUT is fully buffered when connected to a pipe.
perl -e'print "a\n"; sleep(2); print "b\n";' | cat

# unbuffer uses pseudo-ttys to fool a program into thinking it's connected to a terminal.
unbuffer perl -e'print "a\n"; sleep(2); print "b\n";' | cat

Given that as a default, and otherwise as I explained, do you think those are OK defaults?

I did not know of this yet another command, unbuffer… nor its pros and cons versus stdbuf. Is it a synonym (more or less) for stdbuf -o0?

unbuffer is simple with no options, but unlike stdbuf, on my Linux [Mint] it needs to be installed with sudo apt install expect.

I could run:

$ unbuffer julia -e 'println("Hello world!")'

while just:

$ unbuffer julia

interfered with the REPL, unlike:

$ stdbuf -o0 julia

I would like those standard tools to work, and/or to have a Julia ENV var; if the tools worked, implementing the ENV option would be less pressing…

I noticed for man stdbuf:

BUGS
On GLIBC platforms, specifying a buffer size, i.e., using fully buffered mode will result in undefined operation.

No such notice for man unbuffer (though still applying?), however the rather humorous and I assume true:

BUGS
The man page is longer than the program.

ryofurue · August 23, 2023, 8:38am

I’m puzzled by your stdbuf use. The command

stdbuf -o0 -e0 julia myprog.jl > log.txt 2> err.txt &

doesn’t change the buffering behavior of stdout on my platform (macOS). I think that’s because julia uses file descriptors 13–16 for stderr and stdout whereas stdbuf changes the buffering of file descriptors 0 and 1 . . . but I may be totally mistaken.

If I understand what you are wondering about, here is some documentation:

https://eklitzke.org/stdout-buffering

That’s libc’s behavior on Linux. That’s what non-Julia programmers generally expect.

If you run a Fortran program with Intel Fortran on macOS, you’ll see that write(ERROR_UNIT,*) is unbuffered (or line-buffered). Probably Intel Fortran either just uses libc or mimics libc’s behavior.
