Apologies if this has already been answered.
I make heavy use of command-line scripts and combinations of piped commands to generate lists of numbers that I want to analyze “statistically” by piping them to a Julia script.
To keep it clear and simple, let’s say that I have a file containing numbers and a Julia script that calculates the sum of the numbers. I want to call my script in the following way:
cat my_file.dat | julia my_script.jl
and this should display the sum of the numbers.
I’ve seen that I can do that with R in the following way:
cat my_file.dat | R --slave -e 'x <- scan(file="stdin", quiet=TRUE); summary(x)'
but I would love to be able to do it with Julia instead.
Thanks in advance for your help.
You can do
s = readlines(stdin)
E.g. if I put that in stdin.jl together with @show s, I get
~ dpsanders$ cat test.txt | julia stdin.jl
s = ["Hello there", "3 + 4"]
And for the sum
lines = readlines(stdin)
s = sum(parse.(Int, lines))
println(s)
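Piped input often ends with a blank line or contains decimals, in which case parse.(Int, lines) throws. Here is a minimal sketch of a more forgiving variant, assuming one number per line; sum_numbers is just a name I made up for illustration, not a library function:

```julia
# Sum the numbers found on an input stream, one number per line.
# `sum_numbers` is an illustrative name, not part of Julia.
function sum_numbers(io::IO)
    total = 0.0
    for line in eachline(io)
        value = tryparse(Float64, strip(line))
        # tryparse returns `nothing` for blank or non-numeric lines; skip those
        value === nothing || (total += value)
    end
    return total
end

# Only read stdin when this file is run as a script
if abspath(PROGRAM_FILE) == @__FILE__
    println(sum_numbers(stdin))
end
```

Saved as my_script.jl, this should work with cat my_file.dat | julia my_script.jl. Note that it silently skips lines it cannot parse, which may or may not be what you want.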
To wrap up the topic, and in case it helps others, here is the script I wrote to calculate basic statistics from standard input (in CSV format):
#!/usr/bin/env julia
# Basic statistics from piped input (a list of figures in CSV format)
# Takes the input as columns of figures, CSV format
using Statistics  # provides mean, std and median
precision = 2  # number of digits used when rounding
lines = readlines(stdin)  # read all lines from standard input
s = split.(lines, ",")  # split each line on commas
# be careful: the output is an array of arrays
(m, n) = (length(s), length(s[1]))  # number of rows and columns
res = ones(m, n)  # Float64 matrix of the right dimensions
# parse the split fields and fill the Float64 matrix
for j = 1:m
    for i = 1:n
        res[j, i] = parse(Float64, s[j][i])
    end
end
# for each column, calculate the basic statistics
for i = 1:size(res, 2)
    if size(res, 2) >= 2  # print a column identifier when there is more than one column
        print("col $i ")
    end
    col = res[:, i]
    println("[Min:", minimum(col), " Max:", maximum(col),
            " Avg:", round(mean(col), digits=precision), " Sum:", sum(col),
            " Std:", round(std(col), digits=precision),
            " Med:", round(median(col), digits=precision), "]")
end
Save this script in a file called bstats (basic stats), make it executable (chmod +x bstats), and put it in a directory on your PATH.
Usage is very simple:
cat <mycsvfile>.csv | bstats
Enjoy.
You could also pass the filename as an argument to the Julia script:
julia my_stats.jl data.csv
Command-line arguments are stored in the variable ARGS.
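If you want the same script to work both ways, a rough sketch could look like the following. It reads the file named by the first argument when there is one and falls back to stdin otherwise; input_lines is a hypothetical helper name, and I am assuming one number per line:

```julia
# Return the lines of the file named by the first argument,
# or the lines of standard input when no argument is given.
# `input_lines` is an illustrative name, not part of Julia.
function input_lines(args::Vector{String})
    return isempty(args) ? readlines(stdin) : readlines(args[1])
end

# Only run when this file is executed as a script
if abspath(PROGRAM_FILE) == @__FILE__
    lines = input_lines(ARGS)
    # skip blank lines, parse the rest as Float64
    nums = Float64[parse(Float64, l) for l in lines if !isempty(strip(l))]
    println(sum(nums))
end
```

With this, both julia my_stats.jl data.csv and cat data.csv | julia my_stats.jl should give the same result.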
Yes, but most of the time the input is produced by a long pipeline of commands (grep, cut, sed, …). Thanks for your initial help.
You can also use the built-in DelimitedFiles package to parse CSV data directly, without having to do your own splitting and parsing:
$ echo "1, 2, 3, 4" | julia -e "using DelimitedFiles; @show readdlm(stdin, ',')"
readdlm(stdin, ',') = [1.0 2.0 3.0 4.0]
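Building on readdlm, per-column statistics get quite short. This is only a sketch, assuming purely numeric, comma-separated input with the same number of fields on every line; column_stats is a name I picked for illustration:

```julia
using DelimitedFiles  # readdlm
using Statistics      # mean

# Read comma-separated numbers from `io` and return one
# (minimum, maximum, mean) tuple per column.
# `column_stats` is an illustrative name, not part of either package.
function column_stats(io::IO)
    data = readdlm(io, ',', Float64)  # parse straight into a Float64 matrix
    return [(minimum(col), maximum(col), mean(col)) for col in eachcol(data)]
end
```

In a script you would call column_stats(stdin) and print the tuples however you like.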
You can also use awk for this kind of thing if you just want to stay in Unix-land.