Is there a way either 1) to specify the flushing mode of an existing output stream or 2) to create a new stream that has the desired flushing mode? [Edit: the output stream needs to be connected to the “screen” (console) and directable to a text file.]
I’m still suffering from the fact that each time I want to print something immediately, I need to add flush(stdout) after the print statement:
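For illustration, a minimal sketch of the pattern being described. The loop and the `sleep` call are placeholders for real work and are not from the original post:

```julia
# Print a progress message after each step and force it out right away.
steps_reported = 0
for step in 1:3
    sleep(0.01)                      # placeholder for the actual long-running work
    println("finished step ", step)
    flush(stdout)                    # without this, the line may wait in the buffer
    global steps_reported += 1
end
```

The extra `flush(stdout)` after every print is exactly the boilerplate the question is about.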
I realized we actually need to have this in the hands of the users, the same as or similar to -u in Python, rather than in the programmer’s hands. It would still be OK to make unbuffered the default (just slower; it would be considered a non-breaking performance regression), but you would then still need a way to opt into buffered output (the current default), with something like -b.
I actually thought stderr would already always be unbuffered… My change would actually affect all streams, including “printing” to files, whereas you might want to control this only for stdout (and stderr?).
Define default buffering for stdout and stderr. On Linux, stdout is line-buffered and stderr is unbuffered (I haven’t verified this, though). I think these are excellent defaults.
Give the programmer the ability to set the buffering behavior of a new stream.
Give the programmer the ability to change the buffering mode of any stream.
I’m afraid I don’t think that flushing in println() is the way to go. We expect maximal efficiency when printing into a fully buffered stream, but your redefinition of println would degrade the performance.
I believe this is currently not possible from within Julia with a Julia API (though see below for a hack).
Right. I didn’t necessarily mean that you would change Julia itself (or get my PR accepted). I meant you could redefine println in the actual program you are using. At least I think that works; I didn’t test it. Say you call a precompiled package that calls println: would it use the old println, or get recompiled? (I think the latter.)
I have been partially successful using the following hack:
c_stdout = cglobal(:stdout, Libc.FILE)
c_stdout_stream = unsafe_load(c_stdout).ptr
ccall(:setbuf, Cvoid, (Ptr{Cvoid}, Ptr{UInt8}), c_stdout_stream, Ptr{UInt8}(0))
which is essentially me trying to call setbuf(stdout, 0) from Julia. It works for some parts of my code, but I’m still experiencing output not printed in chronological order in the example below
Note setbuf vs setvbuf mentioned here:
Since C has setbuf and setvbuf I suppose it IS considered valuable to have this in the hands of the programmer, but maybe only since you don’t have -u for an arbitrary C compiled program. Possibly stdbuf came later and made this redundant? I don’t know, but am curious, does Python have an API (not just similar hack by calling libc) from within the program to do -u, rather than just invoking the program that way? It also seems redundant to have -u there given stdbuf.
What is really the difference between that and my proposal to redefine print[ln]? Then it is in the hands of the programmer…
[I do know the answer to the question.] I was just proposing it as a hack, a workaround that might work for you. If you don’t like redefining println, or are wary of doing so, you could define my_println or unb_println instead…
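A minimal sketch of such a wrapper, in case it helps. The name `unb_println` is just the one floated above, and the temporary file is only there to show that the text has left Julia's buffer while the stream is still open:

```julia
# Hypothetical helper: print a line, then flush the stream immediately.
function unb_println(io::IO, args...)
    println(io, args...)
    flush(io)
    return nothing
end

# Convenience method that targets stdout, mirroring println(args...).
unb_println(args...) = unb_println(stdout, args...)

# Demonstration on a temporary file: read it back before closing the stream.
path, io = mktemp()
unb_println(io, "hello")
contents = read(path, String)   # separate read of the file on disk
close(io)
rm(path)
```

Because this is a new function rather than a redefinition of `println`, code that prints into fully buffered streams elsewhere is unaffected.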
What should work but doesn’t is:
stdbuf -o0 -e0 <cmd>
I don’t know if that’s Linux/Unix only. I think the reason it doesn’t work is that Julia does its own buffering, in addition to the lower-level libc buffering. I think we need to change that so such tools just work.
Thank you for the clarification and the detailed information. I now understand what has been going on among julia developers regarding this issue. But what about this?
What’s the reasoning behind using a fully buffered stream for stderr when your program is run as julia yourprog.jl? I’ve never seen a language that shows this behavior. Would it be hard or impossible to change that in julia?
If stdout were line-buffered and stderr were unbuffered, then my problem wouldn’t exist in the first place.
In case the stdbuf Linux command is too obscure or doesn’t exist on e.g. Windows I think we might need this option (or the other way, see alternative env var below):
JULIA_UNBUFFERED=true julia my_prog.jl
similar to:
That possibly misleading doc reads to me as if stderr is (fully) buffered by default in Python, but can be made unbuffered.
I don’t think we need -u as well, i.e. two possible ways to ask for unbuffered output, and I actively don’t want that option, since I want compiled Julia programs to work too, and there -u might mean something to the program itself. [Am I wrong about -u? While the two are synonyms for a single program, they are not equivalent in general: an env var also works for your (Julia) subprocesses, so it might be the more useful default. -u wouldn’t do that (unless explicitly designed that way, and seemingly it can’t be: you could run e.g. a bash or python subprocess that runs Julia in turn). It’s even plausible we would want to respect PYTHONUNBUFFERED too… but then there is a possible conflict between the two, and which one wins?]
I think it could be because, if you log a lot of errors and stderr is redirected to syslog, unbuffered (or even line-buffered) output would hammer your disk and slow you down. But then you have lots of problems anyway… I’m not sure we should optimize for that; printing to stderr should be infrequent anyway. I’m just trying to find a reason. I’m not sure it’s intentional; maybe it isn’t, and line-buffered is what it should be and was thought to be already.
Would you rather have UNbuffered as the default, even though it is slower, if you could override it (and would it be enough to override it as below, or should it also be possible from within the program)? It’s a simpler mental model for new programmers, and I also think it can be made fast by default (my idea of a timer that flushes, say, every 0.1 sec). If there are issues with my timer idea, then you could at least do it this way: JULIA_BUFFERED=true julia my_prog.jl
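To make the timer idea concrete, here is a rough sketch of how it could be emulated from user code today. `start_autoflush` is a hypothetical name, not an existing Julia API, and the temporary file merely stands in for stdout so the effect can be observed:

```julia
# Keep the stream buffered, but flush it periodically from a background timer.
function start_autoflush(io::IO; interval::Real = 0.1)
    # Timer(callback, delay; interval) calls `callback` repeatedly until closed.
    return Timer(_ -> flush(io), interval; interval = interval)
end

path, io = mktemp()
timer = start_autoflush(io)

println(io, "buffered, but flushed within about 0.1 s")
sleep(0.5)                    # yield so the timer gets a chance to fire
seen = read(path, String)     # separate read of the file on disk

close(timer)                  # stop the timer before closing the stream
close(io)
rm(path)
```

One caveat I would expect: the callback only runs when the main task yields (here during `sleep`), so a long computation without yield points could delay the flush.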
I may be misunderstanding you, but if you redefine println to always flush, then println would become inefficient. That is, this
for i in 1:largenumber
    println(fully_buffered_stream, something(i))
end
would also become slower, which isn’t good. What should be done instead is
for i in 1:largenumber
    println(stderr, something(i)) # slow because stderr is unbuffered
    println(fully_buffered_stream, something(i)) # fast
end
This is the reason why we should change the streams, not the printing functions.
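A small sketch that makes the cost visible, assuming a temporary file as the fully buffered stream. The helper `write_lines` and the line count are made up for this illustration, and the timings will vary by machine:

```julia
# Write the same lines two ways: flushing after every line vs. once at the end.
function write_lines(path::AbstractString, n::Integer; flush_each::Bool)
    open(path, "w") do io
        for i in 1:n
            println(io, "line ", i)
            flush_each && flush(io)   # forces the buffer out after each line
        end
    end                               # closing the stream flushes what is left
    return nothing
end

slow_path, slow_io = mktemp(); close(slow_io)
fast_path, fast_io = mktemp(); close(fast_io)

write_lines(slow_path, 1; flush_each = true)   # warm-up so compilation is not timed

n = 10_000
t_slow = @elapsed write_lines(slow_path, n; flush_each = true)
t_fast = @elapsed write_lines(fast_path, n; flush_each = false)

same_output = read(slow_path, String) == read(fast_path, String)
println("flush per line: ", t_slow, " s; flush once: ", t_fast, " s")

rm(slow_path); rm(fast_path)
```

Both files end up identical; only the number of flushes differs, which is the overhead a flushing println would impose on every buffered stream.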
Neither. We need fast streams and slow streams. We need to be able to say
stderr outputs right away, but it’s inefficient for this reason, mind you!
If you open a new stream, it’s fully buffered by default. To send the output to the destination immediately, you need to say flush(yourstream) after printing.
We don’t really need slow streams, only buffered ones, if buffering is the only way to make them fast (but I think fast and unbuffered is possible). Would you like these defaults:
stdout is buffered to make it fast (if possible in some later implementation, fast and unbuffered, i.e. the timer idea).
stderr is unbuffered (or maybe line-buffered, seems enough; and I also think printing to stderr should implicitly flush stdout, to not get stuff out of order). I just don’t think it’s too useful to have stderr fully buffered (am I wrong?). I’m not sure what is done currently for it, or by default in Python.
All other streams, i.e. for files, should be fully buffered (I believe that’s the status quo). This should NOT be override-able by a user, e.g. from CLI/ENV var.
Then you could do:
JULIA_UNBUFFERED=true julia my_prog.jl
then 1. stdout (and still 2. stderr) will be fully unbuffered. Or:
JULIA_UNBUFFERED=false julia my_prog.jl
then 2. stderr (and 1. stdout still) will be unbuffered.
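Julia does not read a JULIA_UNBUFFERED variable today, so the following is only a sketch of how a program could honor such a setting itself. The function names and the accepted values ("1", "true") are assumptions made for this example:

```julia
# Hypothetical: decide from the environment whether prints should be flushed.
function unbuffered_requested(env = ENV)
    value = lowercase(strip(get(env, "JULIA_UNBUFFERED", "false")))
    return value in ("1", "true")
end

# Print a line and flush only when the user asked for unbuffered output.
function env_println(io::IO, args...; env = ENV)
    println(io, args...)
    unbuffered_requested(env) && flush(io)
    return nothing
end

env_println(stdout, "flushed only if JULIA_UNBUFFERED is set to true")
```

With this, running `JULIA_UNBUFFERED=true julia my_prog.jl` would flush after every `env_println`, while leaving the variable unset keeps the stream's normal buffering.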
Flushing to files will always be explicit; I don’t think auto-flushing would be better there. You always write whole pages, and if the buffer is one page, that is what gets written. You might want to enlarge the buffer, but making it smaller seems actively worse. If you worry about data loss (or want to limit how much you lose), you should of course flush, and what you should aim for is:
I only meant buffered for stdout, as opposed to unbuffered. I wasn’t up to speed on it meaning line-buffered in the Linux tradition, if that’s true for sure. I suppose there’s a good reason for this (we should likely do it, and probably already do):
# Perl's STDOUT is line-buffered when connected to a terminal.
perl -e'print "a\n"; sleep(2); print "b\n";'
# Perl's STDOUT is fully buffered when connected to a pipe.
perl -e'print "a\n"; sleep(2); print "b\n";' | cat
# unbuffer uses pseudo-ttys to fool a program into thinking it's connected to a terminal.
unbuffer perl -e'print "a\n"; sleep(2); print "b\n";' | cat
Given that as a default, and otherwise as I explained, do you think these are OK defaults?
I did not know of this yet another command, unbuffer… nor its pros and cons compared with stdbuf. Is it (more or less) a synonym for stdbuf -o0?
unbuffer is simple with no options, but unlike stdbuf, on my Linux [Mint] it needs to be installed with sudo apt install expect.
I could run:
$ unbuffer julia -e 'println("Hello world!")'
while just:
$ unbuffer julia
interfered with the REPL, unlike:
$ stdbuf -o0 julia
I would like those standard tools to work and/or to have a Julia ENV var; if the tools worked, implementing the ENV option would be less pressing…
I noticed for man stdbuf:
BUGS
On GLIBC platforms, specifying a buffer size, i.e., using fully buffered mode will result in undefined operation.
There is no such notice in man unbuffer (though it may still apply?), only the rather humorous, and I assume true:
stdbuf -o0 -e0 julia myprog.jl > log.txt 2> err.txt &
doesn’t change the buffering behavior of stdout on my platform (macOS). I think that’s because julia uses file descriptors 13–16 for stderr and stdout, whereas stdbuf changes the buffering of file descriptors 1 and 2… but I may be totally mistaken.
If I understand what you are wondering, here is the documentation:
That’s libc’s behavior on Linux. That’s what non-Julia programmers generally expect.
If you run a Fortran program with Intel Fortran on macOS, you’ll see that write(ERROR_UNIT,*) is unbuffered (or line-buffered). Probably Intel Fortran either just uses libc or mimics libc’s behavior.