I am using CSV.jl to read a file of this format:
```
# 60 lines of completely random unformatted crap here
1 1 Te 5S 1 0 0.5 -0.5
1 1 0.0000000000
1 2 0.0024000000
1 3 0.0002413400
2 1 0.0241243140
2 2 0.1234214000
2 2 0.0979007240
2 1 Te 5S 1 0 0.5 0.5
1 1 0.0000000000
1 2 0.0024000000
1 3 0.0002413400
2 1 0.0241243140
2 2 0.1234214000
2 2 0.0979007240
3 1 Bi 5S 1 0 0.5 -0.5
1 1 0.0000000000
1 2 0.0024000000
1 3 0.0002413400
2 1 0.0241243140
2 2 0.1234214000
2 2 0.0979007240
```
The real file is much larger, of course: 1-2 billion lines.
I am only interested in the “header” lines (such as `1 1 Te 5S 1 0 0.5 -0.5`) and the third column of each data section.
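Both pieces of information can be pulled out of a single line with Base string functions alone. A small sketch, using lines copied from the sample: `split` with no delimiter splits on runs of whitespace, which plays the role of `delim=" "` plus `ignorerepeated=true`.

```julia
# Whitespace-splitting sketch: `split(s)` with no delimiter splits on runs of
# whitespace, so repeated spaces between columns are ignored.
header_line = "1 1 Te 5S 1 0 0.5 -0.5"
data_line = "1 2   0.0024000000"              # extra spaces on purpose

header_fields = split(header_line)            # 8 tokens, kept as strings
third = parse(Float64, split(data_line)[3])   # third column as a Float64

@show header_fields third
```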
Currently I am reading the file like this:
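```julia
using CSV

function parsefile(filename)   # not called `parse`, which would clash with Base.parse
    skip = 61                  # line number of the first header (after 60 junk lines)
    ndata = 6                  # data lines per section (kpts * bands)
    nsections = 3
    for n in 0:nsections-1
        start = skip + (ndata + 1) * n   # each section = 1 header line + ndata data lines
        header = CSV.File(filename, header=false, datarow=start, limit=1, ignorerepeated=true, delim=" ");
        # Process header
        data = CSV.File(filename, header=false, datarow=start+1, limit=ndata, threaded=false, ignorerepeated=true, delim=" ", select=[3]).Column3;
        # Process and reshape data
    end
end
```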
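For comparison, a sketch of a different approach that swaps CSV.jl for a plain `readline` loop, so the file is read exactly once, front to back. It assumes, from the sample only, that exactly 60 junk lines precede the first header, that every section is one header line followed by exactly `ndata` data lines, and that the wanted value is the third whitespace-separated field. The function name `read_sections` and its keyword arguments are hypothetical.

```julia
# Single-pass sketch: every line of the file is read exactly once.
# Assumptions (from the sample, not verified on the real file):
#   - exactly `nskip` junk lines precede the first header,
#   - each section is one header line followed by exactly `ndata` data lines,
#   - the wanted value is the 3rd whitespace-separated field of a data line.
function read_sections(filename::AbstractString; nskip::Int = 60, ndata::Int = 6)
    headers = Vector{Vector{String}}()   # header tokens, one vector per section
    vals = Float64[]                     # 3rd column of every data line, in file order
    open(filename) do io
        for _ in 1:nskip
            readline(io)                 # discard the preamble
        end
        while !eof(io)
            line = readline(io)
            isempty(strip(line)) && continue          # tolerate trailing blank lines
            push!(headers, String.(split(line)))      # header line of this section
            for _ in 1:ndata
                push!(vals, parse(Float64, split(readline(io))[3]))
            end
        end
    end
    return headers, vals
end
```

Usage would look like `headers, vals = read_sections("data.txt")` (the filename is hypothetical). The values come back as one flat vector in file order, ready to be reshaped. Since nothing is re-scanned, the cost is linear in file size, although `split` still allocates for every line; when the total number of data lines is known in advance, calling `sizehint!` on `vals` may help.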
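Of course, this is very inefficient: each `CSV.File` call has to scan the file from the top to reach `datarow`, so with two calls per section the total work grows roughly quadratically with the number of sections. Is there a better way of going about this? With pandas in Python I was able to read the whole file in one go, then pull out the headers and reshape the remaining data into a 3D array quite easily, but I haven't had much success doing the same in Julia.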
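A sketch of the reshaping step, assuming the third-column values have already been collected into one flat `Vector{Float64}` in file order and that every section has the same shape. The sizes `n_inner`, `n_outer` and `nsections` are hypothetical, read off the sample above. Julia arrays are column-major (NumPy and pandas default to row-major), so the index that varies fastest in the file, the second column, becomes the first dimension. `reshape` shares memory with the input vector, while `permutedims` makes a copy; at 1-2 billion data lines the flat vector alone takes about 8-16 GB (8 bytes per `Float64`), so such a copy doubles peak memory.

```julia
# Reshape sketch. Assumption: inside each section the 2nd column varies
# fastest and the 1st column slowest (as in the sample), so in Julia's
# column-major layout the 2nd-column index must be the FIRST dimension.
n_inner = 3        # distinct values in column 2 (hypothetical, from the sample)
n_outer = 2        # distinct values in column 1 (hypothetical, from the sample)
nsections = 3

vals = collect(1.0:18.0)                         # stand-in for values read from the file
A = reshape(vals, n_inner, n_outer, nsections)   # no copy: shares memory with `vals`

# A[j, i, s] is the value on the line "i j ..." of section s.
@assert A[3, 1, 1] == vals[3]     # section 1, line "1 3 ..."
@assert A[1, 2, 2] == vals[10]    # section 2, line "2 1 ..." (6 + 4th line)

# Index order (s, i, j) like a row-major NumPy/pandas reshape, at the cost of a copy:
B = permutedims(A, (3, 2, 1))
@assert B[2, 2, 1] == A[1, 2, 2]
```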