Some tweaks about binary I/O plus some conversions

Background

The genotype data can be stored like:
01212
where 0 and 2 are homozygotes and 1 is the heterozygote. Frequently they are read into a Float64 matrix for dot products and GEMM-style operations. My question is whether there is a faster way to read them as a float matrix than the one I used below; it still costs 5+ times more time than the C++ code.
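
To make the target concrete, here is a minimal sketch (the variable names and toy values are mine, not from the data set) of turning a few genotype lines into a Float64 matrix that can go straight into dot products or GEMM:

# Toy example: 3 individuals x 5 loci, one individual per line
lines = ["01212", "21010", "00221"]
G = Matrix{Float64}(undef, length(lines), length(lines[1]))
for (i, line) in enumerate(lines)
    for (j, c) in enumerate(line)
        G[i, j] = c - '0'        # '0'/'1'/'2' -> 0.0/1.0/2.0
    end
end
# G * G' etc. now run as ordinary BLAS calls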

Some binary operations

Binary I/O in C++

// write a double array to a binary stream
vector<double> dat;

// output: the raw bytes of the whole vector
cout.write(reinterpret_cast<const char*>(dat.data()), dat.size()*sizeof(double));
// input: read `len` doubles back into a buffer
vector<double> g(len);
cin.read(reinterpret_cast<char*>(g.data()), len * sizeof(double));

Of course stdin/stdout can be replaced with other streams, but they are convenient because they can easily be redirected to or from files and pipes.

Binary I/O in Julia

# Output
# -- suppose dat is an array of Float64
write(file, dat)
# Input
reinterpret(Float64, read(file))
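
For instance, a round trip through a file (the file name here is only for illustration) could look like:

dat = rand(10)                      # some Float64 data
open("dat.bin", "w") do io
    write(io, dat)                  # raw bytes, 8 per element
end
back = reinterpret(Float64, read("dat.bin"))
@assert back == dat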

Some tests

Test code for writing

# write.jl
@time open("x", "w") do io
    for i in 1:1000
        for j in 1:600000
            print(io, rand(0:2))
        end
        print(io, '\n')
    end
end

// write.cpp
#include <iostream>
#include <random>
using namespace std;
int main(int argc, char *argv[])
{
  ios_base::sync_with_stdio(false);
  mt19937 rng;
  uniform_int_distribution<int> uni(0,2);
  for(auto i{0}; i<1000; ++i){
    for(auto j{0}; j<600000; ++j) cout<<uni(rng);
    cout<<'\n';
  }
  return 0;
}

Test results

g++ -O2 -Wall -std=c++17 write.cpp
time ./a.out >y

# real    0m22.186s
# user    0m19.984s
# sys     0m0.304s

time julia write.jl
# 52.034295 seconds (1.20 G allocations: 53.649 GiB, 0.91% gc time)

# real    0m52.155s
# user    0m50.501s
# sys     0m1.537s

Test code for reading

Note: I will convert those 0/1/2 characters to doubles for the later GEMM.

# read.jl
gt = Float64[]
@time for line in eachline("x")
    for c in line
        push!(gt, Float64(c - '0'))
    end
end

// read.cpp
#include <iostream>
#include <vector>
using namespace std;
int main(int argc, char *argv[])
{
  ios_base::sync_with_stdio(false);
  vector<double> gt;
  for(string line; getline(cin, line);)
    for(const auto&c:line) gt.push_back(c-'0');

  return 0;
}

time julia read.jl
# 17.118777 seconds (12.37 k allocations: 5.692 GiB, 1.53% gc time)

# real    0m17.346s
# user    0m10.429s
# sys     0m7.269s

g++ -O2 -Wall -std=c++17 read.cpp
time ./a.out <y

#real    0m3.263s
#user    0m0.894s
#sys     0m2.330s

The C++ above can be much faster if it does not use the vector but instead processes the file line by line; that change speeds up the C++ code by 8–9 times, making it tens of times faster than the Julia code.
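On the Julia side, a comparable trick is to read the whole file as raw bytes and convert them in one pass instead of push!ing one element per character; a minimal sketch (my own variant, not one of the benchmarks above):

raw = read("x")                                      # Vector{UInt8}, whole file
gt = Float64[b - 0x30 for b in raw if b != UInt8('\n')]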

Use C++ to convert to a double stream, then read it with Julia

// The converter
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
  ios_base::sync_with_stdio(false);
  for(string line; getline(cin, line);){
    size_t len(line.length());
    double t[len];
    for(size_t i{0}; i<len; ++i) t[i] = line[i]-'0';
    cout.write(reinterpret_cast<char*>(t), sizeof(double)*len);
  }
  return 0;
}

# Read the converted stream from Julia
@time t = reinterpret(Float64, read(pipeline("y", `./a.out`)))
# 11.728569 seconds (1.05 M allocations: 9.061 GiB, 2.63% gc time)

The time dropped from 17 s to 12 s, so the effort was worth it.

You appear to be benchmarking Julia in the global scope. If you instead use a function, e.g.

function write()
    open("x", "w") do io
        for i in 1:1000
            for j in 1:600000
                print(io, rand(0:2))
            end
            print(io, '\n')
        end
    end
end

you should see much better results.

I got almost the same timing. But

function twrite()
    open("c", "w") do io
        for i in 1:1000
            print(io, join(rand(0:2, 600000)))
            print(io, '\n')
        end
    end
end

decreased the time by around 20%. It still takes about twice as long as the C++ code.

open("x", "w") do io
	b = Vector{UInt8}(undef, 128*1024)
	o = 1
	for i in 1:1000
		for j in 1:600000
			@inbounds b[o] = 0x30 + rand(0:2)
			if o == 128*1024
				write(io, b)
				o = 0
			end
			o += 1
		end
		if o > 1
			write(io, view(b, 1:o-1))
			o = 1
		end
	        write(io, Vector{UInt8}("\n"))
	end
end

On my machine I get:

[pixel@manjaro ~]$ time julia write.jl 

real 0m12.919s
user 0m11.190s
sys  0m0.583s

vs:

[pixel@manjaro ~]$ time ./a.out > y

real 0m51.771s
user 0m45.953s
sys  0m2.162s

Edit: I’m writing to a ZFS partition using lz4 compression, so I went with the biggest record size at 128k.


Quite impressive. If the C++ is written this way and compiled with -O3, the Julia version is almost as fast as the C++ one:

//g++ -O3 -std=c++17 -Wall below.cpp
//./a.out >y
#include <iostream>
#include <random>
using namespace std;
int main(int argc, char *argv[])
{
  ios_base::sync_with_stdio(false);
  mt19937 rng;
  uniform_int_distribution<int> uni(0,2);
  char t[600000];
  
  for(auto i{0}; i<1000; ++i){
    for(auto&x:t) x = 48+uni(rng);
    cout.write(t, 600000);
    cout<<'\n';
  }
  return 0;
}

Reading back these strings into Float64[] should have similar speed too, I think.

Actually, reading from disk is much faster than writing to it.