Bad performance of eachline() on STDIN

ufechner7 · February 21, 2021, 9:51am

It does matter because pixel27 reports that Julia is 10% slower than Perl and you report that it needs twice the time. So there must be something special either with your system or your Perl or your data. We cannot improve the performance gap if we do not understand the exact reason.

sijo · February 21, 2021, 10:33am

this should be counter += sizeof(line) to do the same work as Perl’s length (in Julia, length(line) is counting the number of codepoints, which is more work than counting bytes).

After this fix, using a file 100m with 100 million lines (generated by listing all files with find /, several times), I get the following times on my system (Linux, i5-7200U CPU @ 2.50GHz):

$ ./a.perl < 100m
STDIN read time (7626414140 chars): 11.7074089050293

# With Julia 1.5.3:
$ julia a.jl < 100m
STDIN read time (7626414140 chars): 14.781

# With Julia 1.6rc1:
$ julia a.jl < 100m
STDIN read time (7626414140 chars): 19.342

Looks like there’s a performance regression in 1.6rc1.

laborg · February 21, 2021, 11:03am

Wouldn’t a mmaped logfile be a faster solution?

mgkuhn · February 21, 2021, 11:36am

I wondered if the reported difference might be due to different memory-management strategies used by Perl and Julia. Perl uses a reference-counting collector, whereas Julia IIRC uses a non-compacting generational mark-sweep collector. That could mean that Perl might sometimes free allocated memory a bit more quickly (e.g., in the same loop iteration), and thereby use a smaller memory footprint, leading to higher CPU cache utilization.

I tried to run some quick experiments to confirm this, but I can’t actually reproduce the reported problem with Julia 1.6.0-rc1 and Perl 5.30.0 on my 2012-vintage Intel Core i7-3770 Ubuntu 20.04 desktop with 16 GB RAM:

$ perl -E 'for ($i=0; $i<1e9; $i++){say "hello world!"}' | time julia -E 'function f();i=0;for l=eachline(stdin);i+=1;end;i;end;f()'
1000000000
120.47user 6.47system 2:10.02elapsed 97%CPU (0avgtext+0avgdata 214272maxresident)k
0inputs+0outputs (0major+3043701minor)pagefaults 0swaps
$ perl -E 'for ($i=0; $i<1e9; $i++){say "hello world!"}' | time perl -E 'while(my $l=<STDIN>){$i++}; say $i'
1000000000
118.33user 2.94system 2:01.30elapsed 99%CPU (0avgtext+0avgdata 5520maxresident)k
0inputs+0outputs (0major+262minor)pagefaults 0swaps

Time performance looked pretty much the same in that setup, within a couple of percent.

pixel27 · February 21, 2021, 12:44pm

My best guess is Julia needs to copy the bytes from the input buffer to the string. If eachline is written in Julia then when it generates strings the bytes need to be copied from the input buffer into the string object. If the string just referenced a “view” into the original buffer then you could violate the immutable string rule.

In Perl where the programmers (probably) have the code in non-perl they could play tricks with the strings and have them reference the original buffers, so no extra byte copy.

When I re-ran my scripts on a file with random lines from 128 to 1024 characters I see Perl running twice as fast as Julia. At this point I really don’t have the inclination to dig into the internal Perl or Julia code to see if I’m right, so it’s just a guess.

I’m am kind of curious what the performance of C++ would be because I think it would be under the same constraints as Julia and have a similar performance to Julia. I may find the motivation to test that today.

pixel27 · February 21, 2021, 1:55pm

Well if eachline() is bad, the C++'s getline is horrible.

The Julia (jtest):

#!/usr/bin/julia

using Dates

function from_stdin()
    start = now()
    counter = 0
    for line = eachline(;keep=true)
        counter += length(line)
    end
    total = Dates.value(now() - start)/1000
    println("STDIN read time ($counter chars): $(total)");
end
from_stdin()

The C++ (ctest):

#include <iostream>
#include <string>
#include <chrono>

int main() {
    int chars = 0;
    std::chrono::milliseconds start = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()
    );

    std::string line;

    for(;std::getline(std::cin, line);) {
        chars += line.length();
    }
    std::chrono::milliseconds finish = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()
    );

    double seconds = (double) (finish - start).count() / (double) 1000.0;
    std::cout << "STDIN read time (" << chars << " chars): " << seconds << std::endl;
    return 0;
}

The results:

$ ./jtest < 1_000_000.txt 
STDIN read time (577128362 chars): 0.962

$ ./ctest < 1_000_000.txt 
STDIN read time (576128362 chars): 9.689

The character difference is the line endings, getline doesn’t appear to return the \n.

EDIT: Sorry if my C++ is horrible, I haven’t played in that sandbox in years.

Skoffer · February 21, 2021, 2:52pm

I think simeonschaub was right, if you want to just count the number of lines you should use \n counter. If you want to apply some fast regex logic without allocations, then you should use something like Automa.jl. I do not know how perl works, but I guess each time Julia iterator produces the next chunk of data in eachline, it implicitly allocates with String(iochunk), so there is an overhead.

I mean you can do fast calculations by using raw bytes, these utilities are not so bad in Julia

function myreadline(io)
    buf = Vector{UInt8}(undef, 1000)
    counter = 0
    while !eof(io)
        nb = readbytes!(io, buf)
        for i in 1:nb
            if buf[i] == UInt8('\n')
                counter += 1
            end
            # Here you can do something like `String(buf[prev_i:I])` and do regexp magic
        end
    end
    println("Julia: Number of lines: $counter")
end

myreadline(stdin)

On my laptop timings are the following

sh> cat 1m.txt | time ./test.pl
Perl:  Number of lines: 100000000
./test.pl  7.84s user 0.34s system 99% cpu 8.240 total

sh> cat 1m.txt | time julia --startup=no test.jl
Julia: Number of lines: 100000000
julia --startup=no xxx.jl  1.65s user 0.64s system 129% cpu 1.770 total

sh> cat 1m.txt | time wc -l
100000000
wc -l  0.93s user 0.21s system 99% cpu 1.137 total

Here 0.6s is a startup time of julia itself, so overhead is minimal, compared to wc -l.

So, it’s a tradeoff between convenience of work and performance.
Also, Unicode utilities are rather slow, so if you can, it’s better to work with Vector{Char}(or maybe Vector{UInt32}.

tferic · February 21, 2021, 4:10pm

pixel27:

For me on a 100 million line file I get:
$ ./ptest < 100_000_000.txt 
STDIN read time (1388888898 chars): 11.8912749290466

$ julia test.jl < 100_000_000.txt 
STDIN read time (1388888898 lines): 12.91
So a difference of 1 second over a runtime of 12 seconds. Perl still runs faster on this small test, no clue where the overhead is. I suspect Julia might be copying bytes out of the buffer when it converts them to a string, and Perl might not be, but that’s just a wild guess.

The Julia version runs faster when this line:

   counter += length(line)

is replaced with:

   counter += sizeof(line)

That’s why I initially just used a simple counter, so that it wouldn’t matter.

Next thing - invocation.
I learned that piping to STDIN is not equivalent to redirecting a file to STDIN.
Sorry, I was not aware of that.

$ cat test.log | ./testloopspeed2.pl
STDIN read time (4437005389 chars): 28.7752969264984

$ ./testloopspeed2.pl < test.log
STDIN read time (4437005389 chars): 9.6228129863739

$ cat test.log | ./testloopspeed2.jl
STDIN read time (4437005389 chars): 47.381

$ ./testloopspeed2.jl < test.log
STDIN read time (4437005389 chars): 8.878

Julia is slightly faster then Perl, when the file is passed as redirection to STDIN.
Julia is significantly slower than Perl, when the file is passed through a pipe to STDIN.

Edit:
I would like to add that piping to STDIN still has good use cases.
Example:

( zcat archive[0-9].log.gz ; cat current.log ) | ./script.jl

This is useful in rapid scripting, even if it’s performing slower.

ToucheSir · February 21, 2021, 4:33pm

Whilst trying to search for why the pipe case would be much slower, I found this discourse topic which lead to this GH issue and ultimately this (failed) PR. Perhaps it’s worth rebooting given stdin/stdout buffering has come up a a number of times in the past year.

Skoffer · February 21, 2021, 6:52pm

It looks indeed like a buffer issue.

If we consider the following implementation

import Base: iterate

struct Eachline{T <: IO}
    io::T
    buf::Vector{UInt8}
end

Eachline(io::IO, n::Integer) = Eachline(io, Vector{UInt8}(undef, n))
Eachline(io::IO) = Eachline(io, 10000)

function iterate(iter::Eachline, state = (1, 0))
    iprev, nb = state
    (eof(iter.io) & (iprev > nb)) && return nothing
    s = ""
    while true
        if iprev > nb
            nb = readbytes!(iter.io, iter.buf)
            iprev = 1
        end
        for i in iprev:nb
            if iter.buf[i] == UInt8('\n')
                if (iprev == 1) & !isempty(s)
                    s *= String(iter.buf[1:i])
                else
                    s = String(iter.buf[iprev:i])
                end
                return (s, (i + 1, nb))
            end
        end
        s *= String(iter.buf[iprev:nb])
        eof(iter.io) && return (s, (1, 0))
        iprev = nb + 1
    end
end

Then with the following char counting implementation

# iterators.jl

function main(io)
    counter = 0
    for s in Eachline(io)
        counter += sizeof(s)
    end
    println("Julia: Number of chars: $counter")
end

main(stdin)

I get the following times

sh> cat 1m.txt | time julia --startup=no iterators.jl
Julia: Number of chars: 1400000000
julia --startup=no iterators.jl  7.53s user 1.05s system 103% cpu 8.314 total

sh> cat 1m.txt | time ./test2.pl
STDIN read time (1400000000 chars): 9.1769061088562
./test2.pl  8.86s user 0.32s system 99% cpu 9.201 total

(here test2.pl is the same as in pixel27 snippet).

Can anyone compare it on other datasets? Because it looks like in this implementation Julia is faster than Perl, despite enormous string allocations. Of course Automa.jl implementation can be less allocating and as a result much much faster.

rikh · February 21, 2021, 7:53pm

Thanks to your code, I got reminded of a relevant method in Julia base:

julia> countlines("file.txt")
1000

tim.holy · February 21, 2021, 10:39pm

If you like to live dangerously, you can get a performance boost by turning off locking:

julia> function iterlines(io)
           counter = 0
           for line in eachline(io)
               counter += 1
           end
           return counter
       end
iterlines (generic function with 1 method)

julia> precompile(iterlines, (IOStream,))
true

julia> open("/tmp/textfile.txt") do io
           @time iterlines(io)
       end
  0.195130 seconds (1.01 M allocations: 107.697 MiB, 12.18% gc time)
1000000

julia> open("/tmp/textfile.txt") do io
           io._dolock = false   # turn off locking
           @time iterlines(io)
       end
  0.086630 seconds (1.01 M allocations: 107.697 MiB, 26.46% gc time)
1000000

If you turn off locking, don’t even think about threading.

tferic · February 22, 2021, 10:51am

What kind of risk would you expect when reading from STDIN?

I would like to mention that I am working with logfiles, where a server process is holding a write filehandle on it. But I don’t see a conflict there.

tferic · February 22, 2021, 11:25pm

Hello @Skoffer
Just curious, did you try to write a more efficient version of eachline() with your code?

I used your code in the following tests (testloopspeed3.jl):

$ time (cat test.log | ./testloopspeed2.pl)
STDIN read time (4437005389 chars): 30.2816960811615

real    0m30.326s
user    0m19.187s
sys     0m16.329s

$ time (cat test.log | julia --startup=no testloopspeed3.jl)
Julia: Number of chars: 4437005389

real    0m40.995s
user    0m23.752s
sys     0m17.695s

I think your version is significantly faster than eachline(), but still 25% slower than Perl in my test.
The flag --startup=no does not give significant advantage.

Henrique_Becker · February 23, 2021, 12:13am

The flag just disables any startup files that you have configured yourself, if you have never configured any, then it has no effect.

tferic · March 3, 2021, 8:51am

As @ToucheSir pointed out, there is a similar issue in a related thread:

Topic		Replies	Views
Read large stream from STDIN General Usage	5	1926	February 20, 2021
Slow gzip streaming in julia but not in python General Usage performance	20	1957	March 18, 2021
Speed comparison on reading a gzip file General Usage performance	8	1467	December 10, 2023
Standard streams much slower in Windows than in Linux General Usage	4	1100	June 5, 2019
Julia can process text similar to Perl New to Julia question , strings	12	1943	March 20, 2020

Bad performance of eachline() on STDIN

Related topics