Hi All,
I’m currently trying to read in a huge human-readable file, which can take up to 50% of my total wall time when running a job. I was wondering whether there’s a more efficient way to read in such a file.
Naturally, I could pre-process each file once and write it out in a more efficient format. However, given the large number of files I have, this is not practical.
The layout of the file is as follows:
- L lines of up to 10 integers per line. For example, 15 integers would span 2 lines of 10 and 5 integers, respectively.
- The following line contains a single floating-point number.
- This block of L+1 lines repeats M times.
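To illustrate, one block of the file (with hypothetical values, L = 2 and 15 integers) might look like this, repeated M times:

```
1 2 3 5 8 13 21 34 55 60
61 62 63 64 7
-0.03125
```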
My current approach is to pack all the integers from the L lines into a single UInt64 bit mask and to store the floating-point number as a Float32.
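As a minimal sketch of that packing scheme (assuming, as in my code below, that each integer is a 1-based bit position between 1 and 64; `pack_bits` and `parts` are illustrative names, not part of my actual code), the L lines of one block collapse into a single UInt64 like this:

```julia
# Pack 1-based integer indices into a single UInt64 bit mask.
# `parts` stands in for the parsed integer lines of one block.
function pack_bits(parts::Vector{Vector{Int}})
    mask = zero(UInt64)
    for line in parts, idx in line
        mask |= UInt64(1) << (idx - 1)  # idx = 1 sets the lowest bit
    end
    return mask
end

mask = pack_bits([[1, 2, 10], [64]])  # sets bits 0, 1, 9 and 63
```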
My current code is this (using ProgressMeter.jl for the progress bar; `nlines` is L and `num_ints` is the total number of integers per block):

```julia
p = Progress(num_lines, desc=format("Reading {:>12,d} lines", num_lines), showspeed=true)
for i in 1:num_lines
    bitstr = zero(UInt64)
    sp_idx = 1
    # Read the L lines of integers and pack them into one bit mask.
    for j in 1:nlines
        state_line = readline(io)
        state_parts = parse.(Int8, split(state_line))
        for sp_index in state_parts
            bit_index = sp_index - 1  # convert to a 0-based bit position
            bitstr |= UInt64(1) << bit_index
            sp_idx += 1
            if sp_idx > num_ints
                break
            end
        end
    end
    bitstrings[i] = bitstr
    # The line after the integer lines holds a single floating-point coefficient.
    coeff_line = readline(io)
    coeffs[i] = parse(Float32, strip(coeff_line))
    next!(p)
end
```
Is there a more efficient way to do this?
Any help would be appreciated.