Read large stream from STDIN

Hi
I would like to use Julia to process large logfiles piped into STDIN, in order to create reports.
The amount of data is too large to fit into RAM, so I need to process it line by line while reading the stream (rather than slurping all the data into a variable before starting to process it).

Like this:

cat my_huge_logfile.log | reporting.jl

In Perl, I would use something like this:

while(<STDIN>) {
    # regex matching on current line
    # do some preprocessing
    # remember selected data in hash/array
}
# reporting based on hash/array

What is the equivalent, or a better way to do it, in Julia?

Something like this?

function main()
    bytes = 0
    while !eof(stdin)
        line = readline(stdin)
        bytes += length(line)  # note: length counts characters, and readline drops the newline; use sizeof(line) for a true byte count
    end
    println(bytes)
end

main()
$ journalctl -b | julia stream.jl 
21571339

Many thanks @jmert
I have figured out another way. I will try both and see which is faster.

for line = readlines()
    # Work with variable line here
end

Looking at the code with @edit readlines(stdin), it looks like it’ll load everything into RAM as a vector of lines. But that inspection also leads to eachline(stdin) which I think is more what you were looking for.

eachline() docs
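For reference, the difference is visible in what the two functions return: readlines materializes the whole stream as a Vector{String}, while eachline returns a lazy iterator that yields one line at a time, so memory use stays constant no matter how large the input is. A minimal sketch, using an in-memory IOBuffer in place of stdin:

```julia
# readlines eagerly collects every line into a Vector{String} (all in RAM)
buf = IOBuffer("alpha\nbeta\ngamma\n")
lines = readlines(buf)
println(typeof(lines))   # Vector{String}

# eachline returns a lazy iterator; lines are read on demand,
# so only the current line is held in memory at any time
buf = IOBuffer("alpha\nbeta\ngamma\n")
for line in eachline(buf)
    println(line)
end
```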

@jmert Thanks for taking the time. Yes, I can confirm. I tried readlines() on a huge logfile, and it is trying to read everything into memory. That’s not what I want.

The following structure seems to work with large data the way I want:

for line = eachline()
    # Work with variable line here
end
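Putting this together with the Perl pattern from the original question, here is a hedged sketch that matches each line against a regex and accumulates selected data in a Dict before reporting. The log format and the pattern (extracting a log level such as ERROR or INFO) are made up for illustration; adapt them to your actual logfile:

```julia
# Count log levels while streaming stdin line by line (constant memory).
# The regex and "log level" fields are hypothetical examples.
function count_levels(io::IO)
    pattern = r"\b(ERROR|WARN|INFO)\b"
    counts = Dict{String,Int}()
    for line in eachline(io)
        m = match(pattern, line)        # regex matching on current line
        m === nothing && continue
        level = m.captures[1]           # remember selected data in the Dict
        counts[level] = get(counts, level, 0) + 1
    end
    return counts
end

function main()
    counts = count_levels(stdin)
    for (level, n) in sort!(collect(counts))   # reporting based on the Dict
        println(level, ": ", n)
    end
end

main()
```

Run it the same way as before, e.g. cat my_huge_logfile.log | julia reporting.jl.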

Many thanks for your help!
Toni

I noticed performance issues with eachline() as well, so I have opened a new thread about that here: