How to capture standard input from a piped command?

Marc_Sevaux · May 15, 2020, 6:30am

Apologies if this has already been answered.
I heavily use command-line scripts and combinations of piped commands to generate lists of numbers that I want to “statistically” analyze by piping them to a Julia script.
To be clear and simple, let’s say that I have a file containing numbers and a Julia script that calculates the sum of the numbers. I want to call my script in the following way:

cat my_file.dat | julia my_script.jl

and this should display the sum of the numbers.

I’ve seen that I can do this with R in the following way:

 cat my_file.dat | R --slave -e 'x <- scan(file="stdin", quiet=TRUE); summary(x)'

but I would love to be able to do it with Julia instead.
Thanks in advance for your help.

dpsanders · May 15, 2020, 6:35am

You can do

s = readlines(stdin)

E.g. if I put that in stdin.jl together with @show s, I get

~ dpsanders$ cat test.txt | julia stdin.jl
s = ["Hello there", "3 + 4"]

dpsanders · May 15, 2020, 6:38am

And for the sum

lines = readlines(stdin)
s = sum(parse.(Int, lines))
println(s)
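For long inputs, a variant that streams the lines with eachline instead of reading them all into memory first may be useful. This is a minimal sketch, assuming one number per line; the helper name sum_stream is hypothetical, and blank lines are skipped so that a stray empty line does not make parse throw.

```julia
# Hypothetical helper: sum one number per line from any IO stream.
function sum_stream(io::IO)
    total = 0.0
    for line in eachline(io)              # one line at a time, newline stripped
        s = strip(line)
        isempty(s) && continue            # skip blank lines
        total += parse(Float64, s)
    end
    return total
end
```

Ending my_script.jl with println(sum_stream(stdin)) then works with cat my_file.dat | julia my_script.jl.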

Marc_Sevaux · May 15, 2020, 6:40am

Great, so fast

Marc_Sevaux · May 18, 2020, 11:46am

To conclude the topic, in case it helps others, here is the script I’ve written to calculate basic statistics from standard input (in CSV format):

#!/usr/bin/env julia
# Basic statistics from piped input (a list of numbers in CSV format)
# Input: one or more comma-separated columns of numbers, one row per line
using Statistics # provides mean, std, median

prec = 2 # digits for rounding (named to avoid shadowing Base.precision)
lines = filter(!isempty, readlines(stdin)) # read standard input, skip blank lines
s = split.(lines, ",") # split each line on commas
# note: s is a vector of vectors (one vector of fields per line)
(m, n) = (length(s), length(s[1])) # get dimensions
res = Matrix{Float64}(undef, m, n) # Float64 matrix of the right size
# parse the split fields and fill the Float64 matrix
for j = 1:m
  for i = 1:n
    res[j, i] = parse(Float64, s[j][i])
  end
end
# for each column, compute the basic statistics
for i = 1:n
  col = res[:, i]
  if n >= 2 # print a column identifier when there is more than one column
    print("col $i ")
  end
  println("[Min:", minimum(col), " Max:", maximum(col),
          " Avg:", round(mean(col), digits=prec), " Sum:", sum(col),
          " Std:", round(std(col), digits=prec),
          " Med:", round(median(col), digits=prec), "]")
end

Save this script in a file called bstats (basic stats), make it executable (chmod +x bstats) and put it in a directory on your PATH.
Usage is very simple:

cat <mycsvfile>.csv | bstats

Enjoy.

dpsanders · May 18, 2020, 3:45pm

You could also pass the filename as an argument to the Julia script:

julia my_stats.jl data.csv

Command-line arguments are stored in the variable ARGS.
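Both styles can be combined: read from the file named in ARGS when one is given, and fall back to standard input otherwise, so the same script works with julia my_stats.jl data.csv and with cat data.csv | julia my_stats.jl. A minimal sketch, assuming one number per line; the function names read_numbers and numbers_from are hypothetical.

```julia
# Hypothetical helper: read one number per line from an IO stream, skipping blank lines.
read_numbers(io::IO) = [parse(Float64, strip(l)) for l in eachline(io) if !isempty(strip(l))]

# Hypothetical helper: use the first command-line argument as a filename if present,
# otherwise read from standard input.
function numbers_from(args::Vector{String})
    if isempty(args)
        return read_numbers(stdin)
    else
        # open(f, filename) calls f on the stream and closes the file afterwards
        return open(read_numbers, args[1])
    end
end
```

In the script itself this would be called as x = numbers_from(ARGS).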

Marc_Sevaux · May 18, 2020, 3:53pm

Yes, but most of the time the input is produced by a long chain of commands (grep, cut, sed, …). Thanks for your initial help.

rdeits · May 18, 2020, 4:04pm

You can also use the built-in DelimitedFiles package to parse CSV data directly, without having to do your own splitting and parsing:

$ echo "1, 2, 3, 4" | julia -e "using DelimitedFiles; @show readdlm(stdin, ',')"
readdlm(stdin, ',') = [1.0 2.0 3.0 4.0]
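Combining readdlm with the Statistics standard library gives a shorter version of the statistics script above. A minimal sketch, assuming every row has the same number of comma-separated numeric fields; the function name column_summary is hypothetical. Passing Float64 as the element type makes readdlm return a Matrix{Float64}.

```julia
using DelimitedFiles   # readdlm
using Statistics       # mean, median, std

# Hypothetical helper: per-column statistics of comma-separated numeric data read from `io`.
function column_summary(io::IO)
    data = readdlm(io, ',', Float64)   # one row per line, fields split on commas
    return [(min = minimum(c), max = maximum(c), mean = mean(c),
             median = median(c), std = std(c))
            for c in eachcol(data)]
end
```

From a script, for (i, st) in enumerate(column_summary(stdin)); println("col $i ", st); end prints one line per column.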

dpsanders · May 18, 2020, 10:14pm

You can also use awk for this kind of thing if you just want to stay in Unix-land.