Read large stream from STDIN

Hi
I would like to use Julia to process large logfiles piped into STDIN, in order to create reports.
The amount of data is too large to fit into RAM, so I need to process it line by line while reading the stream (rather than slurping all the data into a variable before starting to process it).

Like this:

cat my_huge_logfile.log | reporting.jl

In Perl, I would use something like this:

while(<STDIN>) {
    # regex matching on current line
    # do some preprocessing
    # remember selected data in hash/array
}
# reporting based on hash/array

What is the equivalent, or a better way to do it, in Julia?

Something like this?

function main()
    bytes = 0
    while !eof(stdin)
        line = readline(stdin)
        bytes += length(line)  # note: length counts characters, and readline drops the newline; use sizeof(line) for a true byte count
    end
    println(bytes)
end

main()
$ journalctl -b | julia stream.jl 
21571339

Many thanks @jmert
I have figured out another way. I will try both and see which is faster.

for line = readlines()
    # Work with variable line here
end

Looking at the code with @edit readlines(stdin), it looks like it’ll load everything into RAM as a vector of lines. But that inspection also leads to eachline(stdin) which I think is more what you were looking for.

eachline() docs
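For reference, the difference is visible in what the two functions return: readlines materializes the whole stream as a Vector{String}, while eachline returns a lazy iterator that yields one line at a time, so memory use stays constant no matter how large the input is. A minimal sketch, using an in-memory IOBuffer in place of stdin:

```julia
# readlines eagerly collects every line into a Vector{String} (all in RAM)
buf = IOBuffer("alpha\nbeta\ngamma\n")
lines = readlines(buf)
println(typeof(lines))   # Vector{String}

# eachline returns a lazy iterator; lines are read on demand,
# so only the current line is held in memory at any time
buf = IOBuffer("alpha\nbeta\ngamma\n")
for line in eachline(buf)
    println(line)
end
```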

@jmert Thanks for taking the time. Yes, I can confirm. I tried readlines() on a huge logfile, and it is trying to read everything into memory. That’s not what I want.

The following structure seems to work with large data the way I want:

for line = eachline()
    # Work with variable line here
end
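Putting this together with the Perl pattern from the original question, here is a hedged sketch that matches each line against a regex and accumulates selected data in a Dict before reporting. The log format and the pattern (extracting a log level such as ERROR or INFO) are made up for illustration; adapt them to your actual logfile:

```julia
# Count log levels while streaming stdin line by line (constant memory).
# The regex and "log level" fields are hypothetical examples.
function count_levels(io::IO)
    pattern = r"\b(ERROR|WARN|INFO)\b"
    counts = Dict{String,Int}()
    for line in eachline(io)
        m = match(pattern, line)        # regex matching on current line
        m === nothing && continue
        level = m.captures[1]           # remember selected data in the Dict
        counts[level] = get(counts, level, 0) + 1
    end
    return counts
end

function main()
    counts = count_levels(stdin)
    for (level, n) in sort!(collect(counts))   # reporting based on the Dict
        println(level, ": ", n)
    end
end

main()
```

Run it the same way as before, e.g. cat my_huge_logfile.log | julia reporting.jl.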

Many thanks for your help!
Toni

I noticed performance issues with eachline() as well, so I have opened a new thread about that here: