I have binary files (IAEA phase space files) that contain large contiguous lists of particles, and I want to mmap them.
Here is a simplified variant of my issue: a particle consists of 5 bytes, where the first byte encodes the type as a UInt8 and the remaining four bytes encode the energy as a Float32. One way to mimic this in Julia would be:
struct Particle
    type::UInt8
    energy::Float32
end
However, there is a problem: sizeof(Particle) is 8, while each particle on disk occupies only 5 bytes. This means I cannot mmap the file as an array of Particle. Are there recommended solutions to this?
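The size mismatch comes from alignment padding, which can be checked directly. A minimal sketch, using the illustrative name PaddedParticle (not part of the original post) so that it does not clash with the definitions here:

```julia
# Same layout as the Particle struct above, under an illustrative name.
struct PaddedParticle
    type::UInt8
    energy::Float32
end

# The Float32 field is aligned to a 4-byte boundary, so 3 padding bytes
# follow the UInt8 field.
println(sizeof(PaddedParticle))               # total size in bytes, padding included
println(Int(fieldoffset(PaddedParticle, 2)))  # byte offset of the `energy` field
```

The reported size is 8 and the offset of energy is 4, so an array of this struct cannot overlay records that are packed 5 bytes apart.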
My idea was to instead do:
struct Particle
    bytes::NTuple{5,UInt8}
end
and then reinterpret chunks of the bytes field. However, one problem I have is that reinterpret is quite picky:
The UnalignedVectors.jl package dates from Julia 0.6 and has not been updated to 1.0. This change has probably made things easier:
reinterpret now works on any AbstractArray using the new ReinterpretArray type.
This supersedes the old behavior of reinterpret on Arrays. As a result, reinterpreting
arrays with different alignment requirements (removed in 0.6) is once again allowed (https://github.com/JuliaLang/julia/pull/23750).
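As a rough sketch of what this allows (assuming Julia 0.7 or later, made-up example data, and energies stored in the host byte order), the file can be mapped as raw bytes and the energy bytes reinterpreted lazily through a view. The 5-byte layout is the simplified one from the question, not the real IAEA format:

```julia
using Mmap

# Write a tiny example file: 3 records of 1 type byte + 4 energy bytes.
# (Made-up data; a real .IAEAphsp file has a header-defined layout.)
path = tempname()
open(path, "w") do io
    for (t, e) in ((0x01, 1.5f0), (0x02, 0.25f0), (0x01, 6.0f0))
        write(io, t)   # UInt8 type
        write(io, e)   # Float32 energy, host byte order
    end
end

bytes = Mmap.mmap(path, Vector{UInt8})   # whole file as raw bytes
records = reshape(bytes, 5, :)           # one 5-byte record per column

types = view(records, 1, :)              # first byte of every record
# Bytes 2:5 of every record, read as one Float32 each. The result is a
# lazy 1×N array; the underlying data is not copied into an aligned buffer.
energies = reinterpret(Float32, view(records, 2:5, :))

println(collect(types))
println(energies[1, :])
```

Indexing energies[1, i] assembles the Float32 from the four unaligned bytes of record i on demand, so nothing beyond the pages actually touched needs to be read.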
One solution is to create a primitive type Iaea40 covering the 40 bits of information and to manually implement the interface needed to convert it to a Particle (which is what you do the processing with). The getters and setters of the memory-mapped Matrix{Iaea40} would then do the conversion to and from Particle, i.e. reinterpret(Particle, Iaea40). I have never tried such an approach, so this is somewhat theoretical, but it should work.
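The getter idea can also be sketched without a primitive type, by using the 5-byte tuple wrapper from the question as the on-disk element type. This swaps in a different technique from the one described above; it is not an implementation of Iaea40. The name RawParticle is hypothetical, and the sketch assumes the energy is stored little-endian:

```julia
using Mmap

# On-disk record: exactly 5 bytes, no padding (hypothetical name).
struct RawParticle
    bytes::NTuple{5,UInt8}
end

# In-memory type used for processing, as in the question.
struct Particle
    type::UInt8
    energy::Float32
end

# Getter-style conversion. Assumes the energy bytes are little-endian.
function Particle(raw::RawParticle)
    b = raw.bytes
    bits = UInt32(b[2]) | (UInt32(b[3]) << 8) |
           (UInt32(b[4]) << 16) | (UInt32(b[5]) << 24)
    return Particle(b[1], reinterpret(Float32, bits))
end

# Made-up example file with two records.
path = tempname()
open(path, "w") do io
    for (t, e) in ((0x01, 1.5f0), (0x02, 0.25f0))
        write(io, t)
        write(io, htol(reinterpret(UInt32, e)))  # force little-endian bytes
    end
end

raw = Mmap.mmap(path, Vector{RawParticle})  # length is filesize ÷ 5
p = Particle(raw[2])
println((p.type, p.energy))
```

Because sizeof(RawParticle) is 5, the mapped vector lines up with the records on disk, and the conversion to Particle happens only for the elements that are actually accessed.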
I am a bit confused. Is an IAEA record always 5 bytes long? If the record size is fixed at some number of bytes, the file can be memory-mapped to a Matrix parametrized by a custom primitive type. If the record size varies, you can still memory-map it to a BitArray and index into it. This is the only way to get an easily addressable/indexable representation of out-of-core objects, unless you handcraft your own.
Yes, the record length varies, but not within a single file. Roughly, the situation is this: IAEA files come in pairs, a human-readable .IAEAheader file and a binary .IAEAphsp file. The header describes which particle properties are recorded in the .IAEAphsp file. For instance, it is quite common that all particles have the same z position. In this case, to save space, z is usually stored only once in the header instead of being repeated for each particle.
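Since the record length is fixed within a file but only known at run time, one option is to keep it as an ordinary value rather than encoding it in a type. A minimal sketch, where map_records is a hypothetical helper and the record length is assumed to have been determined from the header beforehand (parsing the header is not shown):

```julia
using Mmap

# Hypothetical helper: map a binary file as a record_length × N byte matrix,
# one record per column.
function map_records(path::AbstractString, record_length::Integer)
    bytes = Mmap.mmap(path, Vector{UInt8})
    length(bytes) % record_length == 0 ||
        error("file size is not a multiple of the record length")
    return reshape(bytes, Int(record_length), :)
end

# Example with made-up data: 4 records of 7 bytes each.
path = tempname()
write(path, UInt8.(1:28))
records = map_records(path, 7)
println(size(records))
println(records[:, 2])   # the 7 bytes of the second record
```

Individual fields can then be pulled out of each column by byte range, with the ranges chosen according to what the header says is recorded.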