I have a (somewhat old) file that was created from a Fortran script and contains records of a struct without any padding. I am trying to read the file as follows:
using Mmap, StructArrays
struct Foo
    id::UInt32
    Value::Int64
end
io = open(filepath, "r")
rows = Mmap.mmap(io, Vector{Foo}, nrows)
sa = StructArray(rows)
However, because the default behavior is to add padding between id and Value, the size of Foo is wrong (larger than the 12 bytes each record occupies in the file) and I get nonsense.
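The mismatch can be checked directly. This is a quick sketch, assuming a typical 64-bit platform where Int64 is aligned to 8 bytes; the numbers in the comments hold under that assumption only.

```julia
struct Foo
    id::UInt32
    Value::Int64
end

# In-memory size of Foo, including alignment padding.
# On a typical 64-bit platform this is 16, not the 12 bytes stored per record.
foo_size = sizeof(Foo)

# Byte offset of the second field; 8 here means 4 bytes of padding follow id.
value_offset = fieldoffset(Foo, 2)

println("sizeof(Foo) = ", foo_size, ", offset of Value = ", Int(value_offset))
```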
Is there a way to disable the padding?
My main concern here is to optimize loading the file into a StructArray, as the files are ~10GB each and I need to load hundreds of them in my computations while also querying the entire files. If there is a better way to load them I’d be happy to learn.
I’m not sure about padding in vectors, but could you do:
const T = NTuple{12, UInt8}
rows = Mmap.mmap(io, Vector{T}, nrows)
then you can reinterpret the UInt8 tuples:
a = rows[147]
reinterpret(UInt32, a[1:4])
reinterpret(Int64, a[5:end])
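To try the byte slicing on a single record without the file, here is a small sketch. The record contents (7 and -42) are made up for illustration, and native byte order is assumed. It goes through a short byte vector because calling reinterpret directly on a tuple may not be available on older Julia versions.

```julia
# Build one packed 12-byte record by hand: a UInt32 id followed by an Int64 value.
bytes = vcat(reinterpret(UInt8, UInt32[7]), reinterpret(UInt8, Int64[-42]))
rec = Tuple(bytes)  # same shape as one element of Vector{NTuple{12, UInt8}}

# Decode each field by copying its byte slice into a vector and reinterpreting it.
id    = first(reinterpret(UInt32, collect(rec[1:4])))
value = first(reinterpret(Int64,  collect(rec[5:12])))

println((id, value))
```

The collect calls allocate, so this is only meant for checking the layout, not for the hot loop.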
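Put together for the whole file, that is something like this:
using Mmap
using StructArrays
const T = NTuple{12, UInt8}  # one packed record: 4-byte id + 8-byte value
function getdata(filename)
    open(filename, "r") do io
        # Derive the row count from the file size instead of hard-coding it
        nrows = filesize(filename) ÷ sizeof(T)
        rows = Mmap.mmap(io, Vector{T}, nrows)
        a = map(x -> reinterpret(UInt32, x[1:4]), rows)   # id column
        b = map(x -> reinterpret(Int64, x[5:12]), rows)   # value column
        StructArray(a=a, b=b)
    end
end
sa = getdata("/tmp/fort1.dat")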
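A possible variation that avoids slicing a tuple per row is to map the file as a 12 × nrows byte matrix and reinterpret the two row ranges as lazy columns. This is only a sketch: `packed_columns` and `RECORD_BYTES` are hypothetical names, it assumes the file holds nothing but back-to-back 12-byte records in native byte order, and it needs Julia 1.6 or newer for `reinterpret(reshape, ...)`. The small temporary file is there just to make the example self-contained.

```julia
using Mmap

const RECORD_BYTES = 12  # 4-byte UInt32 id + 8-byte Int64 value, no padding

# Hypothetical helper: map a packed file and expose each field as a lazy column.
function packed_columns(path::AbstractString)
    nrows = filesize(path) ÷ RECORD_BYTES
    bytes = open(path, "r") do io
        # Each column of the matrix is one 12-byte record.
        Mmap.mmap(io, Matrix{UInt8}, (RECORD_BYTES, nrows))
    end
    # Rows 1:4 hold the id bytes, rows 5:12 hold the value bytes.
    ids  = reinterpret(reshape, UInt32, view(bytes, 1:4, :))
    vals = reinterpret(reshape, Int64,  view(bytes, 5:12, :))
    return (id = ids, Value = vals)
end

# Round-trip check on a small temporary file written without padding.
path = tempname()
open(path, "w") do io
    for (id, v) in zip(UInt32[1, 2, 3], Int64[-10, 20, -30])
        write(io, id)
        write(io, v)
    end
end

cols = packed_columns(path)
println(collect(cols.id), " ", collect(cols.Value))
```

The two columns are views into the mapped bytes, so nothing is copied until they are indexed. They could then be handed to StructArray in the same way as a and b above, or materialized with collect if plain vectors are preferred.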