It does matter, because pixel27 reports Julia being 10% slower than Perl, while you report it needing twice the time. So there must be something special about your system, your Perl, or your data. We cannot close the performance gap if we do not understand its exact cause.
This should be `counter += sizeof(line)` to do the same work as Perl's `length` (in Julia, `length(line)` counts codepoints, which is more work than counting bytes).
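To illustrate the difference (a quick sketch; the byte counts assume UTF-8 encoding):

```julia
s = "héllo"         # 'é' occupies two bytes in UTF-8

println(length(s))  # 5 codepoints — requires decoding the UTF-8 stream
println(sizeof(s))  # 6 bytes — a constant-time size lookup
```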
After this fix, using a file `100m` with 100 million lines (generated by concatenating the output of `find /` several times), I get the following times on my system (Linux, i5-7200U CPU @ 2.50GHz):
$ ./a.perl < 100m
STDIN read time (7626414140 chars): 11.7074089050293
# With Julia 1.5.3:
$ julia a.jl < 100m
STDIN read time (7626414140 chars): 14.781
# With Julia 1.6rc1:
$ julia a.jl < 100m
STDIN read time (7626414140 chars): 19.342
Looks like there's a performance regression in 1.6rc1.
Wouldn't an `mmap`ed logfile be a faster solution?
I wondered if the reported difference might be due to different memory-management strategies used by Perl and Julia. Perl uses a reference-counting collector, whereas Julia IIRC uses a non-compacting generational mark-sweep collector. That could mean that Perl might sometimes free allocated memory a bit more quickly (e.g., in the same loop iteration), and thereby use a smaller memory footprint, leading to higher CPU cache utilization.
I tried to run some quick experiments to confirm this, but I can't actually reproduce the reported problem with Julia 1.6.0-rc1 and Perl 5.30.0 on my 2012-vintage Intel Core i7-3770 Ubuntu 20.04 desktop with 16 GB RAM:
$ perl -E 'for ($i=0; $i<1e9; $i++){say "hello world!"}' | time julia -E 'function f();i=0;for l=eachline(stdin);i+=1;end;i;end;f()'
1000000000
120.47user 6.47system 2:10.02elapsed 97%CPU (0avgtext+0avgdata 214272maxresident)k
0inputs+0outputs (0major+3043701minor)pagefaults 0swaps
$ perl -E 'for ($i=0; $i<1e9; $i++){say "hello world!"}' | time perl -E 'while(my $l=<STDIN>){$i++}; say $i'
1000000000
118.33user 2.94system 2:01.30elapsed 99%CPU (0avgtext+0avgdata 5520maxresident)k
0inputs+0outputs (0major+262minor)pagefaults 0swaps
Time performance looked pretty much the same in that setup, within a couple of percent.
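One could also probe the allocation side of the hypothesis directly from Julia. The sketch below (numbers vary by Julia version, and the first call includes compilation overhead) counts the bytes allocated while iterating `eachline` over an in-memory buffer:

```julia
# Rough probe of per-line allocation in eachline. The IOBuffer is built
# outside @allocated so only the iteration itself is measured.
function probe()
    io = IOBuffer(repeat("hello world!\n", 1000))
    @allocated for _ in eachline(io); end
end

probe()  # warm up: the first call also pays compilation allocations
println("bytes allocated per 1000 lines: ", probe())
```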
My best guess is that Julia needs to copy the bytes from the input buffer into the string. If `eachline` is written in Julia, then the strings it generates must get their bytes copied out of the input buffer into a fresh string object; if the string merely referenced a "view" into the original buffer, that could violate the immutable string rule.
In Perl, where the implementers (probably) wrote this code outside of Perl itself, they could play tricks with the strings and have them reference the original buffers, avoiding the extra byte copy.
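A quick check consistent with that guess (a sketch; `pointer` is used here for inspection only):

```julia
# If eachline returned views into a shared input buffer, two identical
# lines could share storage; instead, each String gets its own allocation.
io = IOBuffer("aaa\naaa\n")
a, b = collect(eachline(io))

println(a == b)                    # contents are equal
println(pointer(a) == pointer(b))  # but the storage is distinct
```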
When I re-ran my scripts on a file with random lines of 128 to 1024 characters, I saw Perl running twice as fast as Julia. At this point I really don't have the inclination to dig into the Perl or Julia internals to see if I'm right, so it's just a guess.
I'm kind of curious what the performance of C++ would be, because I think it operates under the same constraints as Julia and should perform similarly. I may find the motivation to test that today.
Well, if `eachline()` is bad, C++'s `getline` is horrible.
The Julia (jtest):
#!/usr/bin/julia
using Dates

function from_stdin()
    start = now()
    counter = 0
    for line in eachline(; keep=true)
        counter += length(line)
    end
    total = Dates.value(now() - start) / 1000
    println("STDIN read time ($counter chars): $total")
end

from_stdin()
The C++ (ctest):
#include <iostream>
#include <string>
#include <chrono>
int main() {
    long long chars = 0;  // long long avoids overflow on multi-gigabyte inputs
    std::chrono::milliseconds start = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch());
    std::string line;
    while (std::getline(std::cin, line)) {
        chars += line.length();
    }
    std::chrono::milliseconds finish = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch());
    double seconds = (finish - start).count() / 1000.0;
    std::cout << "STDIN read time (" << chars << " chars): " << seconds << std::endl;
    return 0;
}
The results:
$ ./jtest < 1_000_000.txt
STDIN read time (577128362 chars): 0.962
$ ./ctest < 1_000_000.txt
STDIN read time (576128362 chars): 9.689
The character difference is the line endings: `getline` doesn't appear to return the `\n`.
EDIT: Sorry if my C++ is horrible, I haven't played in that sandbox in years.
I think simeonschaub was right: if you just want to count lines, you should count `\n` bytes. If you want to apply some fast regex logic without allocations, then you should use something like Automa.jl. I do not know how Perl works internally, but I guess each time the Julia iterator produces the next chunk of data in `eachline`, it implicitly allocates with `String(iochunk)`, so there is an overhead.
I mean, you can do fast calculations using raw bytes; these utilities are not so bad in Julia:
function myreadline(io)
    buf = Vector{UInt8}(undef, 1000)
    counter = 0
    while !eof(io)
        nb = readbytes!(io, buf)
        for i in 1:nb
            if buf[i] == UInt8('\n')
                counter += 1
            end
            # Here you could do something like `String(buf[prev_i:i])` and apply regex magic
        end
    end
    println("Julia: Number of lines: $counter")
end

myreadline(stdin)
On my laptop timings are the following
sh> cat 1m.txt | time ./test.pl
Perl: Number of lines: 100000000
./test.pl 7.84s user 0.34s system 99% cpu 8.240 total
sh> cat 1m.txt | time julia --startup=no test.jl
Julia: Number of lines: 100000000
julia --startup=no xxx.jl 1.65s user 0.64s system 129% cpu 1.770 total
sh> cat 1m.txt | time wc -l
100000000
wc -l 0.93s user 0.21s system 99% cpu 1.137 total
Here 0.6 s is the startup time of Julia itself, so the overhead is minimal compared to `wc -l`.
So, it's a tradeoff between convenience and performance.
Also, Unicode utilities are rather slow, so if you can, it's better to work with `Vector{Char}` (or maybe `Vector{UInt32}`).
The Julia version runs faster when this line:
counter += length(line)
is replaced with:
counter += sizeof(line)
That's why I initially just used a simple counter, so that it wouldn't matter.
Next thing - invocation.
I learned that piping to STDIN is not equivalent to redirecting a file to STDIN.
Sorry, I was not aware of that.
$ cat test.log | ./testloopspeed2.pl
STDIN read time (4437005389 chars): 28.7752969264984
$ ./testloopspeed2.pl < test.log
STDIN read time (4437005389 chars): 9.6228129863739
$ cat test.log | ./testloopspeed2.jl
STDIN read time (4437005389 chars): 47.381
$ ./testloopspeed2.jl < test.log
STDIN read time (4437005389 chars): 8.878
Julia is slightly faster than Perl when the file is passed to STDIN by redirection.
Julia is significantly slower than Perl when the file is passed to STDIN through a pipe.
Edit:
I would like to add that piping to STDIN still has good use cases.
Example:
( zcat archive[0-9].log.gz ; cat current.log ) | ./script.jl
This is useful in rapid scripting, even if it performs more slowly.
Whilst trying to search for why the pipe case would be much slower, I found this discourse topic, which led to this GH issue and ultimately this (failed) PR. Perhaps it's worth rebooting, given stdin/stdout buffering has come up a number of times in the past year.
It looks indeed like a buffer issue.
If we consider the following implementation
import Base: iterate

struct Eachline{T <: IO}
    io::T
    buf::Vector{UInt8}
end
Eachline(io::IO, n::Integer) = Eachline(io, Vector{UInt8}(undef, n))
Eachline(io::IO) = Eachline(io, 10000)

function iterate(iter::Eachline, state = (1, 0))
    iprev, nb = state
    (eof(iter.io) & (iprev > nb)) && return nothing
    s = ""
    while true
        if iprev > nb
            nb = readbytes!(iter.io, iter.buf)
            iprev = 1
        end
        for i in iprev:nb
            if iter.buf[i] == UInt8('\n')
                if (iprev == 1) & !isempty(s)
                    s *= String(iter.buf[1:i])
                else
                    s = String(iter.buf[iprev:i])
                end
                return (s, (i + 1, nb))
            end
        end
        s *= String(iter.buf[iprev:nb])
        eof(iter.io) && return (s, (1, 0))
        iprev = nb + 1
    end
end
Then with the following char counting implementation
# iterators.jl
function main(io)
    counter = 0
    for s in Eachline(io)
        counter += sizeof(s)
    end
    println("Julia: Number of chars: $counter")
end

main(stdin)
I get the following times
sh> cat 1m.txt | time julia --startup=no iterators.jl
Julia: Number of chars: 1400000000
julia --startup=no iterators.jl 7.53s user 1.05s system 103% cpu 8.314 total
sh> cat 1m.txt | time ./test2.pl
STDIN read time (1400000000 chars): 9.1769061088562
./test2.pl 8.86s user 0.32s system 99% cpu 9.201 total
(here `test2.pl` is the same as in pixel27's snippet).
Can anyone compare this on other datasets? It looks like with this implementation Julia is faster than Perl, despite the enormous string allocations. Of course an Automa.jl implementation could allocate less and as a result be much faster.
Thanks to your code, I got reminded of a relevant method in Julia base:
julia> countlines("file.txt")
1000
If you like to live dangerously, you can get a performance boost by turning off locking:
julia> function iterlines(io)
           counter = 0
           for line in eachline(io)
               counter += 1
           end
           return counter
       end
iterlines (generic function with 1 method)
julia> precompile(iterlines, (IOStream,))
true
julia> open("/tmp/textfile.txt") do io
           @time iterlines(io)
       end
0.195130 seconds (1.01 M allocations: 107.697 MiB, 12.18% gc time)
1000000
julia> open("/tmp/textfile.txt") do io
           io._dolock = false  # turn off locking
           @time iterlines(io)
       end
0.086630 seconds (1.01 M allocations: 107.697 MiB, 26.46% gc time)
1000000
If you turn off locking, don't even think about threading.
What kind of risk would you expect when reading from STDIN?
I would like to mention that I am working with logfiles where a server process holds a write filehandle on them, but I don't see a conflict there.
Hello @Skoffer,
just curious, did you try to write a more efficient version of `eachline()` with your code?
I used your code in the following tests (`testloopspeed3.jl`):
$ time (cat test.log | ./testloopspeed2.pl)
STDIN read time (4437005389 chars): 30.2816960811615
real 0m30.326s
user 0m19.187s
sys 0m16.329s
$ time (cat test.log | julia --startup=no testloopspeed3.jl)
Julia: Number of chars: 4437005389
real 0m40.995s
user 0m23.752s
sys 0m17.695s
I think your version is significantly faster than `eachline()`, but still 25% slower than Perl in my test.
The flag `--startup=no` does not give a significant advantage.
The flag just disables any startup files you have configured yourself; if you have never configured any, it has no effect.
As @ToucheSir pointed out, there is a similar issue in a related thread: