Bad performance of eachline() on STDIN

It does matter because pixel27 reports that Julia is 10% slower than Perl and you report that it needs twice the time. So there must be something special either with your system or your Perl or your data. We cannot improve the performance gap if we do not understand the exact reason.

2 Likes

this should be counter += sizeof(line) to do the same work as Perl’s length (in Julia, length(line) is counting the number of codepoints, which is more work than counting bytes).

After this fix, using a file 100m with 100 million lines (generated by listing all files with find /, several times), I get the following times on my system (Linux, i5-7200U CPU @ 2.50GHz):

$ ./a.perl < 100m
STDIN read time (7626414140 chars): 11.7074089050293

# With Julia 1.5.3:
$ julia a.jl < 100m
STDIN read time (7626414140 chars): 14.781

# With Julia 1.6rc1:
$ julia a.jl < 100m
STDIN read time (7626414140 chars): 19.342

Looks like there’s a performance regression in 1.6rc1.

3 Likes

Wouldn’t a mmaped logfile be a faster solution?

I wondered if the reported difference might be due to different memory-management strategies used by Perl and Julia. Perl uses a reference-counting collector, whereas Julia IIRC uses a non-compacting generational mark-sweep collector. That could mean that Perl might sometimes free allocated memory a bit more quickly (e.g., in the same loop iteration), and thereby use a smaller memory footprint, leading to higher CPU cache utilization.

I tried to run some quick experiments to confirm this, but I can’t actually reproduce the reported problem with Julia 1.6.0-rc1 and Perl 5.30.0 on my 2012-vintage Intel Core i7-3770 Ubuntu 20.04 desktop with 16 GB RAM:

$ perl -E 'for ($i=0; $i<1e9; $i++){say "hello world!"}' | time julia -E 'function f();i=0;for l=eachline(stdin);i+=1;end;i;end;f()'
1000000000
120.47user 6.47system 2:10.02elapsed 97%CPU (0avgtext+0avgdata 214272maxresident)k
0inputs+0outputs (0major+3043701minor)pagefaults 0swaps
$ perl -E 'for ($i=0; $i<1e9; $i++){say "hello world!"}' | time perl -E 'while(my $l=<STDIN>){$i++}; say $i'
1000000000
118.33user 2.94system 2:01.30elapsed 99%CPU (0avgtext+0avgdata 5520maxresident)k
0inputs+0outputs (0major+262minor)pagefaults 0swaps

Time performance looked pretty much the same in that setup, within a couple of percent.

My best guess is Julia needs to copy the bytes from the input buffer to the string. If eachline is written in Julia then when it generates strings the bytes need to be copied from the input buffer into the string object. If the string just referenced a “view” into the original buffer then you could violate the immutable string rule.

In Perl where the programmers (probably) have the code in non-perl they could play tricks with the strings and have them reference the original buffers, so no extra byte copy.

When I re-ran my scripts on a file with random lines from 128 to 1024 characters I see Perl running twice as fast as Julia. At this point I really don’t have the inclination to dig into the internal Perl or Julia code to see if I’m right, so it’s just a guess.

I’m am kind of curious what the performance of C++ would be because I think it would be under the same constraints as Julia and have a similar performance to Julia. I may find the motivation to test that today.

Well if eachline() is bad, the C++'s getline is horrible.

The Julia (jtest):

#!/usr/bin/julia

using Dates

function from_stdin()
    start = now()
    counter = 0
    for line = eachline(;keep=true)
        counter += length(line)
    end
    total = Dates.value(now() - start)/1000
    println("STDIN read time ($counter chars): $(total)");
end
from_stdin()

The C++ (ctest):

#include <iostream>
#include <string>
#include <chrono>

int main() {
    int chars = 0;
    std::chrono::milliseconds start = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()
    );

    std::string line;

    for(;std::getline(std::cin, line);) {
        chars += line.length();
    }
    std::chrono::milliseconds finish = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()
    );

    double seconds = (double) (finish - start).count() / (double) 1000.0;
    std::cout << "STDIN read time (" << chars << " chars): " << seconds << std::endl;
    return 0;
}

The results:

$ ./jtest < 1_000_000.txt 
STDIN read time (577128362 chars): 0.962

$ ./ctest < 1_000_000.txt 
STDIN read time (576128362 chars): 9.689

The character difference is the line endings, getline doesn’t appear to return the \n.

EDIT: Sorry if my C++ is horrible, I haven’t played in that sandbox in years.

I think simeonschaub was right, if you want to just count the number of lines you should use \n counter. If you want to apply some fast regex logic without allocations, then you should use something like Automa.jl. I do not know how perl works, but I guess each time Julia iterator produces the next chunk of data in eachline, it implicitly allocates with String(iochunk), so there is an overhead.

I mean you can do fast calculations by using raw bytes, these utilities are not so bad in Julia

function myreadline(io)
    buf = Vector{UInt8}(undef, 1000)
    counter = 0
    while !eof(io)
        nb = readbytes!(io, buf)
        for i in 1:nb
            if buf[i] == UInt8('\n')
                counter += 1
            end
            # Here you can do something like `String(buf[prev_i:I])` and do regexp magic
        end
    end
    println("Julia: Number of lines: $counter")
end

myreadline(stdin)

On my laptop timings are the following

sh> cat 1m.txt | time ./test.pl
Perl:  Number of lines: 100000000
./test.pl  7.84s user 0.34s system 99% cpu 8.240 total

sh> cat 1m.txt | time julia --startup=no test.jl
Julia: Number of lines: 100000000
julia --startup=no xxx.jl  1.65s user 0.64s system 129% cpu 1.770 total

sh> cat 1m.txt | time wc -l
100000000
wc -l  0.93s user 0.21s system 99% cpu 1.137 total

Here 0.6s is a startup time of julia itself, so overhead is minimal, compared to wc -l.

So, it’s a tradeoff between convenience of work and performance.
Also, Unicode utilities are rather slow, so if you can, it’s better to work with Vector{Char}(or maybe Vector{UInt32}.

3 Likes

The Julia version runs faster when this line:

   counter += length(line)

is replaced with:

   counter += sizeof(line)

That’s why I initially just used a simple counter, so that it wouldn’t matter.

Next thing - invocation.
I learned that piping to STDIN is not equivalent to redirecting a file to STDIN.
Sorry, I was not aware of that.

$ cat test.log | ./testloopspeed2.pl
STDIN read time (4437005389 chars): 28.7752969264984

$ ./testloopspeed2.pl < test.log
STDIN read time (4437005389 chars): 9.6228129863739

$ cat test.log | ./testloopspeed2.jl
STDIN read time (4437005389 chars): 47.381

$ ./testloopspeed2.jl < test.log
STDIN read time (4437005389 chars): 8.878

Julia is slightly faster then Perl, when the file is passed as redirection to STDIN.
Julia is significantly slower than Perl, when the file is passed through a pipe to STDIN.

Edit:
I would like to add that piping to STDIN still has good use cases.
Example:

( zcat archive[0-9].log.gz ; cat current.log ) | ./script.jl

This is useful in rapid scripting, even if it’s performing slower.

4 Likes

Whilst trying to search for why the pipe case would be much slower, I found this discourse topic which lead to this GH issue and ultimately this (failed) PR. Perhaps it’s worth rebooting given stdin/stdout buffering has come up a a number of times in the past year.

6 Likes

It looks indeed like a buffer issue.

If we consider the following implementation

import Base: iterate

struct Eachline{T <: IO}
    io::T
    buf::Vector{UInt8}
end

Eachline(io::IO, n::Integer) = Eachline(io, Vector{UInt8}(undef, n))
Eachline(io::IO) = Eachline(io, 10000)

function iterate(iter::Eachline, state = (1, 0))
    iprev, nb = state
    (eof(iter.io) & (iprev > nb)) && return nothing
    s = ""
    while true
        if iprev > nb
            nb = readbytes!(iter.io, iter.buf)
            iprev = 1
        end
        for i in iprev:nb
            if iter.buf[i] == UInt8('\n')
                if (iprev == 1) & !isempty(s)
                    s *= String(iter.buf[1:i])
                else
                    s = String(iter.buf[iprev:i])
                end
                return (s, (i + 1, nb))
            end
        end
        s *= String(iter.buf[iprev:nb])
        eof(iter.io) && return (s, (1, 0))
        iprev = nb + 1
    end
end

Then with the following char counting implementation

# iterators.jl

function main(io)
    counter = 0
    for s in Eachline(io)
        counter += sizeof(s)
    end
    println("Julia: Number of chars: $counter")
end

main(stdin)

I get the following times

sh> cat 1m.txt | time julia --startup=no iterators.jl
Julia: Number of chars: 1400000000
julia --startup=no iterators.jl  7.53s user 1.05s system 103% cpu 8.314 total

sh> cat 1m.txt | time ./test2.pl
STDIN read time (1400000000 chars): 9.1769061088562
./test2.pl  8.86s user 0.32s system 99% cpu 9.201 total

(here test2.pl is the same as in pixel27 snippet).

Can anyone compare it on other datasets? Because it looks like in this implementation Julia is faster than Perl, despite enormous string allocations. Of course Automa.jl implementation can be less allocating and as a result much much faster.

3 Likes

Thanks to your code, I got reminded of a relevant method in Julia base:

julia> countlines("file.txt")
1000
1 Like

If you like to live dangerously, you can get a performance boost by turning off locking:

julia> function iterlines(io)
           counter = 0
           for line in eachline(io)
               counter += 1
           end
           return counter
       end
iterlines (generic function with 1 method)

julia> precompile(iterlines, (IOStream,))
true

julia> open("/tmp/textfile.txt") do io
           @time iterlines(io)
       end
  0.195130 seconds (1.01 M allocations: 107.697 MiB, 12.18% gc time)
1000000

julia> open("/tmp/textfile.txt") do io
           io._dolock = false   # turn off locking
           @time iterlines(io)
       end
  0.086630 seconds (1.01 M allocations: 107.697 MiB, 26.46% gc time)
1000000

If you turn off locking, don’t even think about threading.

5 Likes

What kind of risk would you expect when reading from STDIN?

I would like to mention that I am working with logfiles, where a server process is holding a write filehandle on it. But I don’t see a conflict there.

Hello @Skoffer
Just curious, did you try to write a more efficient version of eachline() with your code?

I used your code in the following tests (testloopspeed3.jl):

$ time (cat test.log | ./testloopspeed2.pl)
STDIN read time (4437005389 chars): 30.2816960811615

real    0m30.326s
user    0m19.187s
sys     0m16.329s

$ time (cat test.log | julia --startup=no testloopspeed3.jl)
Julia: Number of chars: 4437005389

real    0m40.995s
user    0m23.752s
sys     0m17.695s

I think your version is significantly faster than eachline(), but still 25% slower than Perl in my test.
The flag --startup=no does not give significant advantage.

The flag just disables any startup files that you have configured yourself, if you have never configured any, then it has no effect.

2 Likes