For example, I wanted to do something similar to this Python script.
script.py
import sys
for line in sys.stdin:
    line = line.upper()
    cur_name, cur_symb, _, value = line.split()
    cur_symb = cur_symb.strip('()')
    print(f"{cur_name:>10} {value:>3} {cur_symb}")
    #print(line, end='')
with data.txt
bitcoin (btc) : 5
euros (€) : 100
dollars ($) : 80
which can be called using:
cat data.txt | python script.py
and output:
   BITCOIN   5 BTC
     EUROS 100 €
   DOLLARS  80 $
and I finally wrote this Julia script:
script.jl
for line in readlines(stdin)
    line = uppercase(line)
    cur_name, cur_symb, _, value = split(line)
    cur_symb = strip(cur_symb, ['(', ')'])
    println("$(lpad(cur_name, 10)) $(lpad(value, 3)) $(cur_symb)")
end
which can be called using
cat data.txt | julia script.jl
Maybe we should write a tutorial about data cleaning with Julia, taking inspiration from the many data cleaning tutorials that teach how to use iconv, head, tail, tr, wc, split, sort, uniq, cut, paste, join, grep, sed, awk, … and showing that we can do all these tasks with ONE tool: Julia (I know it’s not the Unix philosophy).
We should probably also make a command-line tool for that purpose, which could be used to write one-liners like
cat data.txt | jsed "println(uppercase(line))"
or more complex processing on a stream of lines.
Commands could be given like
cat data.txt | jsed "line = uppercase(line)" "cur_name, cur_symb, _, value = split(line)" "cur_symb = strip(cur_symb, ['(', ')'])" "println(\"\$(lpad(cur_name, 10)) \$(lpad(value, 3)) \$(cur_symb)\")"
or using a multiline string
cat data.txt | jsed """
line = uppercase(line)
cur_name, cur_symb, _, value = split(line)
cur_symb = strip(cur_symb, ['(', ')'])
println("$(lpad(cur_name, 10)) $(lpad(value, 3)) $(cur_symb)")
"""
A “specialised” Julia template could be set using a parameter passed to jsed, but in most cases it shouldn’t be necessary.
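To make the idea more concrete, here is a minimal sketch of how such a tool could wrap the given commands in a per-line loop and evaluate the result. Nothing here exists yet: `build_program` and `run_jsed` are hypothetical names, and using `include_string` to evaluate the generated code is only one possible approach. A real tool would also need error handling and a way to select a different template.

```julia
# Hypothetical sketch of jsed; this is not an existing tool.

# Wrap the user's commands in a loop over the lines of standard input.
# Each command becomes one statement in the loop body.
function build_program(commands::Vector{String})
    body = join(commands, "\n")
    return "for line in eachline(stdin)\n" * body * "\nend\n"
end

# Evaluate the generated program at top level in Main.
function run_jsed(commands::Vector{String})
    include_string(Main, build_program(commands), "jsed")
    return nothing
end

# When run as a script, treat every command-line argument as one command.
if abspath(PROGRAM_FILE) == @__FILE__
    run_jsed(ARGS)
end
```

Assuming the sketch is saved as jsed.jl, it could be called with `cat data.txt | julia jsed.jl 'println(uppercase(line))'`. Single quotes keep the shell from interpreting `$(...)` inside the Julia code.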
What is your opinion about such a tool?
PS: maybe we should have something like
for (count, line) in enumerate(readlines(stdin))
    line = uppercase(line)
    cur_name, cur_symb, _, value = split(line)
    cur_symb = strip(cur_symb, ['(', ')'])
    println("$(count) $(lpad(cur_name, 10)) $(lpad(value, 3)) $(cur_symb)")
end
so that we can use count to process headers differently from data, or to skip headers. The template would then be
for (count, line) in enumerate(readlines(stdin))
    ...
end
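As an illustration of what count would allow, the sketch below treats the first line as a header and formats only the remaining lines. The function name `format_rows`, the header line in the sample data, and the choice to echo the header unchanged are all assumptions made for this example.

```julia
# Hypothetical helper: read lines from `input`, write formatted rows to `output`.
function format_rows(input::IO, output::IO)
    for (count, line) in enumerate(eachline(input))
        if count == 1
            # Assumption: the first line is a header; pass it through unchanged.
            println(output, line)
            continue
        end
        cur_name, cur_symb, _, value = split(uppercase(line))
        cur_symb = strip(cur_symb, ['(', ')'])
        println(output, "$(lpad(cur_name, 10)) $(lpad(value, 3)) $(cur_symb)")
    end
end

# Example with an in-memory buffer instead of stdin.
sample = IOBuffer("name symbol : amount\nbitcoin (btc) : 5\neuros (€) : 100\n")
format_rows(sample, stdout)
```

Taking the input and output streams as arguments keeps the loop body identical to the stdin version while making it easy to try on a small sample.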