Collecting all output from shell commands

jw3126 · September 28, 2018, 7:35am

I want to run a command that may or may not exit gracefully. In any case I want to get all of the following information from it:

content of stdout as String
content of stderr as String
exit code as Int

What is the recommended way to do this? (Currently my hacky way of doing this is redirecting to a file and reading that file).

swt30 · September 28, 2018, 8:57am

I did this recently… let me find the code…

"Run a Cmd object, returning the stdout & stderr contents plus the exit code"
function execute(cmd::Cmd)
  out = Pipe()
  err = Pipe()

  process = run(pipeline(ignorestatus(cmd), stdout=out, stderr=err))
  close(out.in)
  close(err.in)

  (
    stdout = String(read(out)), 
    stderr = String(read(err)),  
    code = process.exitcode
  )
end

execute(`ls`)
execute(`ls --invalid-option`)

jw3126 · September 28, 2018, 9:11am

Cool, thanks! I like your solution, but I will leave the question open a bit longer to see if there are alternatives.

jameson · September 28, 2018, 3:52pm

To avoid the deadlock situation inherent in doing IO operations sequentially, this should be written as:

stdout = @async String(read(out))
stderr = @async String(read(err))
return (
    stdout = fetch(stdout),
    stderr = fetch(stderr),
    code = process.exitcode
)
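
For readers who want the whole function in one place, here is a minimal sketch of how the pieces might fit together. The names stdout_task and stderr_task are illustrative, and starting the process with wait=false is an assumption added here so that the reader tasks are already draining the pipes while the command runs. The task results are retrieved with fetch, since wait on a Task does not return its value.

```julia
# Sketch: capture stdout, stderr and the exit code of a command,
# reading both pipes concurrently instead of one after the other.
function execute(cmd::Cmd)
    out = Pipe()
    err = Pipe()

    # wait=false (an assumption of this sketch): start the process and return
    # immediately, so the reader tasks below run while the command is running.
    process = run(pipeline(ignorestatus(cmd), stdout=out, stderr=err), wait=false)
    close(out.in)
    close(err.in)

    # One reader task per pipe; neither pipe can fill up while the other is read.
    stdout_task = @async String(read(out))
    stderr_task = @async String(read(err))

    wait(process)
    return (
        stdout = fetch(stdout_task),
        stderr = fetch(stderr_task),
        code = process.exitcode,
    )
end

result = execute(`sh -c "echo out; echo err 1>&2; exit 3"`)
```

The example command assumes a POSIX sh is available; on Windows a different test command would be needed.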

jw3126 · October 16, 2018, 6:59am

Let me add another thing to the challenge. I still want to run a command and get the stdout, stderr, and exit code back. However, this time I also want to write from Julia to the stdin of the command.

musm · October 16, 2018, 7:46am

With this suggestion

function execute(cmd::Cmd)
  out = Pipe()
  err = Pipe()

  process = run(pipeline(ignorestatus(cmd), stdout=out, stderr=err))
  close(out.in)
  close(err.in)
  stdout = @async String(read(out))
  stderr = @async String(read(err))
  (
    stdout = String(read(out)), 
    stderr = String(read(err)),  
    code = process.exitcode
  )
end

julia> a, = execute(`cmd /c dir C:`); println(a)
julia> a, = execute(`cmd /c dir C:`); println(a)
julia> a, = execute(`cmd /c dir C:`); println(a)
(repeatedly enter the above: execute a line, press the up arrow, and execute it again)

What you will see is that it sometimes prints the information, but occasionally prints nothing. Is there a way to avoid this situation?

tkf · October 16, 2018, 7:53am

It looks like this works.

function communicate(cmd::Cmd, input)
    inp = Pipe()
    out = Pipe()
    err = Pipe()

    process = run(pipeline(cmd, stdin=inp, stdout=out, stderr=err), wait=false)
    close(out.in)
    close(err.in)

    stdout = @async String(read(out))
    stderr = @async String(read(err))
    write(process, input)
    close(inp)
    wait(process)
    return (
        stdout = fetch(stdout),
        stderr = fetch(stderr),
        code = process.exitcode
    )
end

@show communicate(`cat`, "hello")
@show communicate(`sh -c "cat; echo errrr 1>&2; exit 3"`, "hello")

jw3126 · October 16, 2018, 8:23am

Cool, it works! It looks a bit weird, though, since inp is not really used.

tkf · October 16, 2018, 8:38am

You can write write(inp, input) instead of write(process, input)

Well, actually I don’t fully understand it. I needed to put a Pipe there in order to pass wait=false, as otherwise run sets stdin to devnull. Maybe you can use open(pipeline(cmd, stderr=err), "r+") or something similar to reduce the number of explicit Pipes? I haven’t tried that route.
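
A rough sketch of what that open(..., "r+") route might look like, for anyone who wants to try it. The function name communicate_open and the task names are made up for illustration, and the sketch assumes that the Process returned by open exposes its pipes as process.in and process.out; only stderr still needs an explicit Pipe.

```julia
# Sketch: write to stdin and collect stdout, stderr and the exit code,
# letting open(..., "r+") create the stdin and stdout pipes.
function communicate_open(cmd::Cmd, input)
    err = Pipe()

    # "r+" opens the process for both reading (its stdout) and writing (its stdin).
    process = open(pipeline(ignorestatus(cmd), stderr=err), "r+")
    close(err.in)

    # Start both readers before writing, so the child never blocks on a full pipe.
    stdout_task = @async read(process.out, String)
    stderr_task = @async read(err, String)

    write(process.in, input)
    close(process.in)   # signal end of input to the child
    wait(process)

    return (
        stdout = fetch(stdout_task),
        stderr = fetch(stderr_task),
        code = process.exitcode,
    )
end

r = communicate_open(`sh -c "cat; echo errrr 1>&2; exit 3"`, "hello")
```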

jameson · October 20, 2018, 6:09pm

@musm can you open an issue? That observed behavior doesn’t sound right.

Separately, I’m working on some changes to make this more convenient—stay tuned

Nosferican · April 10, 2020, 7:40pm

Any updates?

Nosferican · April 13, 2020, 10:51am

https://docs.julialang.org/en/v1/manual/running-external-programs/#Running-External-Programs-1
Seems read should be able to work for those cases now.
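
As a small illustration of what read covers: read(cmd, String) returns the command's stdout only, and by default it throws when the command exits with a non-zero status. The commands below are just examples and assume a POSIX sh.

```julia
# read(cmd, String) captures stdout as a String.
text = read(`echo hello`, String)

# ignorestatus keeps a non-zero exit status from throwing.
# stderr is not captured here; it still goes to the parent's stderr.
partial = read(ignorestatus(`sh -c "echo hi; exit 2"`), String)
```

So this form is convenient when only stdout matters, but on its own it returns neither stderr nor the exit code.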

racinmat · May 14, 2021, 8:53am

I discovered that for some larger outputs run(pipeline(ignorestatus(cmd), stdout=out, stderr=err)) hangs forever, while run(cmd) finishes quite fast, so there seems to be some problem with larger data in Julia 1.5.4.

aplavin · May 14, 2021, 3:31pm

Yes, it’s indeed nontrivial to read long output, and this is not specific to 1.5.4. See my question here: "How to read long output from external command?". There was no answer, so I still don’t know how to do it properly.

mbaz · May 14, 2021, 4:13pm

I routinely read multimegabyte amounts of data produced by an external process. The way I do it is similar to that described above: read in an async task and concatenate the output until the external process is done.

jameson · May 20, 2021, 3:55pm

Oh yikes, that code seems awful. Using readavailable is nearly always a bad sign.

The code samples above seem to have converged to a proper solution (e.g. Collecting all output from shell commands - #7 by tkf)

In response to my previous comment about making this more convenient: You can now pass a Base.BufferStream object as the pipeline, which will manage setting up the async reader in the background for you. However, this also disables flow control (unless you also hack the BufferStream object to have a maximum buffer size), so do this at your own risk. But for most use cases, this is not an issue.

You can also use the open(cmd, devnull, out, err) do; end form instead of run(pipeline(cmd, stdout=out, stderr=err)), which is shorter, and should soon support detection and reporting of this particular deadlock risk, once https://github.com/JuliaLang/julia/pull/39544 is merged.

mbaz · May 20, 2021, 5:02pm

Sure it’s awful. I’ve spent nine years frustrated by that code.

The solution above applies only to processes whose entire output is read in one shot. IOW, communicate() does not return until cmd ends.

In my case, I need to interact with a long-lived process, continuously and asynchronously. I send it a command via stdin and it replies via stdout. The process may crash, may send a partial response, may take a long time to respond, etc.

That awful code is the best I’ve been able to cobble together (and it works, so far). I’ve asked for help with this problem many times before; if you have suggestions, I’d love to hear them. I already have an issue to remove readavailable(). I also want to get rid of Base.throwto() but I’m not sure how to handle timeouts.

jameson · May 20, 2021, 6:19pm

I don’t know anything about gnuplot. The examples in their docs don’t show stopping at any point to read stdout, so they aren’t particularly useful. Is there anything it prints except for echoing the input then writing out “gnuplot>”? Perhaps you want to do read(out, byteswritten); readuntil(out, "\ngnuplot>")? I think the latter was relatively new to when you started Gaston.jl.

giordano · May 20, 2021, 6:28pm

To answer the original post, there is the OutputCollectors.jl package which does all that was asked:

julia> using OutputCollectors

julia> script = """
       #!/bin/sh
       echo 1
       sleep 1
       echo 2 >&2
       sleep 1
       echo 3
       sleep 1
       echo 4
       false
       """
"#!/bin/sh\necho 1\nsleep 1\necho 2 >&2\nsleep 1\necho 3\nsleep 1\necho 4\nfalse\n"

julia> oc = OutputCollector(`sh -c $script`; verbose = true);

julia> [19:28:26] 1
[19:28:27] 2
[19:28:28] 3
[19:28:29] 4
julia> 

julia> oc.P.exitcode
1

julia> collect_stdout(oc)
"1\n3\n4\n"

julia> collect_stderr(oc)
"2\n"

mbaz · May 20, 2021, 8:28pm

(Maybe this should be moved to a different thread?)

Gnuplot reads from stdin when invoked as part of a pipe; when doing so, you don’t get the usual gnuplot REPL. This command results in the plot of a sine wave in addition to the console output:

$ echo "plot sin(x); set output '-'; print 'Done'" | gnuplot --persist
Done

(The set output '-' command results in Gnuplot printing to stdout.)

What I do in Gaston is keep the pipe open, so that I can send subsequent commands to the same gnuplot process. In addition, I try to fail gracefully when Gnuplot fails to respond (that is, it fails to print Done after a timeout period).
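
To make the pattern concrete, here is a minimal sketch of the keep-the-pipe-open approach combined with the readuntil idea from earlier in the thread. It is not Gaston's code and does not talk to gnuplot: a small sh loop stands in for the long-lived process, and the names server and ask, the "Done" sentinel handling, and the 5-second default timeout are all assumptions made for illustration.

```julia
# Stand-in for a long-lived interpreter: for each input line it prints a
# reply and then the sentinel line "Done". (Hypothetical; not gnuplot.)
server = `sh -c 'while read line; do echo "got: $line"; echo Done; done'`

# Send one command and wait for the reply that ends with the sentinel.
function ask(proc, command; timeout=5.0)
    write(proc.in, command, "\n")

    # Read in a separate task so that the wait can be bounded by a timeout.
    reply = @async readuntil(proc.out, "Done\n")
    status = timedwait(() -> istaskdone(reply), timeout)

    # Note: on timeout the reader task is not cancelled; it stays blocked
    # on the stream until data arrives or the process ends.
    status === :ok || error("no reply within $timeout seconds")
    return fetch(reply)
end

proc = open(server, "r+")           # the pipes stay open between commands
answer = ask(proc, "plot sin(x)")

close(proc.in)                      # end of input lets the stand-in exit
wait(proc)
```

readuntil drops the delimiter by default, so answer holds only the text printed before the sentinel. A real client would also need to decide what to do with the process after a timeout, which this sketch leaves open.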