Perl using shell piping took ~32 seconds.
Python using shell piping took ~36 seconds.
Julia using shell piping took ~70 seconds.
Julia using GzipDecompressorStream from CodecZlib, which more than one Julia user has recommended to me, took ~110 seconds.
python:
import subprocess
# Decompress with pigz in a shell pipeline and count the lines read from its stdout.
with subprocess.Popen("pigz -cd ~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz", shell=True, stdout=subprocess.PIPE) as gz:
    n1 = 0
    for line1 in gz.stdout:
        n1 = n1 + 1
    print(n1)
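A variant of the Python pipe approach as a reusable function, offered only as a sketch: it passes an argument list instead of using `shell=True` (so `~` must be expanded with `os.path.expanduser`), and it checks the decompressor's exit status so a failed or truncated decompression is not silently reported as a short count. The function name `count_lines_from_command` is hypothetical.

```python
import subprocess


def count_lines_from_command(args):
    """Count the lines an external command (e.g. a decompressor) writes to stdout.

    `args` is an argument list, so no shell is involved and paths must
    already be expanded (for example with os.path.expanduser).
    """
    n = 0
    # A 1 MiB pipe buffer; iterating a binary pipe yields one bytes object per line.
    with subprocess.Popen(args, stdout=subprocess.PIPE, bufsize=1 << 20) as proc:
        for _ in proc.stdout:
            n += 1
    # Leaving the `with` block closes stdout and waits for the process, so
    # returncode is set here; non-zero means the count may be incomplete.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args)
    return n
```

For the file in the benchmark this would be called as `count_lines_from_command(["pigz", "-cd", os.path.expanduser("~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz")])`, assuming `pigz` is on the `PATH`.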
julia:
function fun1(file1)
    # Run pigz as an external command and read its stdout line by line.
    open(`pigz -cd $(expanduser(file1))`) do io
        n1 = 0
        for line1 in eachline(io)
            n1 = n1 + 1
        end
        println(n1)
    end
end
fun1("~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz")
julia using GzipDecompressorStream from CodecZlib:
using CodecZlib, TranscodingStreams
function fun1(file1)
    # Decompress in-process instead of piping through an external pigz.
    io1 = GzipDecompressorStream(open(expanduser(file1)))
    n1 = 0
    for line1 in eachline(io1)
        n1 = n1 + 1
    end
    close(io1)  # also closes the wrapped file handle
    println(n1)
end
fun1("~/in2/swxx/sj/db/ncbi/gene/gene2accession.gz")
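For comparison with the in-process CodecZlib approach, here is a minimal Python sketch that also decompresses in-process, using the standard-library `gzip` module rather than an external `pigz`. As an assumption about what the benchmark needs (it only counts lines), it counts newline bytes in 1 MiB blocks instead of materializing each line; it has not been timed against the numbers above. The function name `count_lines_gzip` is hypothetical.

```python
import gzip
import os


def count_lines_gzip(path, chunk_size=1 << 20):
    """Count lines in a gzip file, decompressing in-process in fixed-size blocks."""
    n = 0
    last = b"\n"  # treat an empty file as ending cleanly
    with gzip.open(os.path.expanduser(path), "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            n += chunk.count(b"\n")
            last = chunk[-1:]
    # A final line without a trailing newline still counts, as it does for eachline.
    if last != b"\n":
        n += 1
    return n
```

Unlike the pigz pipeline, this decompression runs single-threaded in the same process, so it isolates the cost of the decompressor itself rather than of the pipe.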
Did you run the Julia code twice, to make sure your Julia timings don’t include compilation time? That said, you’re apparently not the first one to observe this (e.g. see this discussion on reading fastq.gz files).
If there’s really that big a difference, that’s a big opportunity for improvement, since many file types in bioinformatics come gzipped.
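The warm-up point generalizes to any benchmark: run the code once untimed, then time later runs. In Julia that just means calling `fun1` once before timing it with `@time`; the Python helper below is a sketch of the same idea, with the hypothetical name `time_after_warmup`, reporting the best of a few timed runs.

```python
import time


def time_after_warmup(fn, *args, repeats=3):
    """Call fn once untimed, then return the best of `repeats` timed calls, in seconds."""
    fn(*args)  # warm-up call: absorbs one-time costs such as JIT compilation or cold caches
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the minimum rather than the mean discards runs slowed by unrelated system activity, which suits a comparison between implementations.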