Hi there,
I am new to Julia, and came to Julia hoping for good performance with less code. I have been using R, and ran into a simple task: read a gzipped file and do something with each line. It was slow in R, of course, so I went to Perl first. Here is the code:
perl -e '
open( DATA, "zcat ~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz|");
while( <DATA>) {
    @F = split( /\t/);
}
close( DATA);'
I found that the gzip command was using about 48% of one CPU thread. Then I came to Julia; here is the code:
function fun1( file1)
    for line1 in eachline( `zcat $( expanduser( file1))`)
        F = split( line1, '\t')
    end
end
fun1( "~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz")
And I found that the gzip command was also using about 48% CPU. Then I came to Python; here is the code:
import subprocess
with subprocess.Popen( "zcat ~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz", shell = True, stdout = subprocess.PIPE) as gz:
    for line1 in gz.stdout:
        F = line1.split( b'\t')
Now gzip is using nearly 100% of a CPU, and so is Python.
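(As an aside, Python can also decompress in-process with the standard-library gzip module, avoiding the external zcat process and the pipe entirely. A minimal sketch; the function name and the path argument are made up for illustration:)

```python
import gzip
import os

def count_fields(path):
    """Read a gzipped file line by line in-process and tab-split each line."""
    n = 0
    # 'rb' yields bytes lines, matching the subprocess version above
    with gzip.open(os.path.expanduser(path), "rb") as fh:
        for line1 in fh:
            F = line1.split(b"\t")
            n += 1
    return n
```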
I have not recorded the times spent by these three approaches, but the Python code feels obviously faster than the other two.
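(To replace "feel" with numbers, a small timing helper could wrap any of the loops. A minimal sketch; the reader function passed in is hypothetical:)

```python
import time

def timed(fn, *args):
    """Run fn(*args) once and return (elapsed_seconds, result)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - t0, result

# e.g. elapsed, _ = timed(some_reader, "~/path/to/file.gz")
```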
How come?