I have a simulation that’s going to run for a long time, producing a lot of results, and I need those results to go to disk while the sim is running rather than all at once at the end (there isn’t enough RAM to hold them all). The simulation’s outputs are essentially a bunch of different `Vector{Any}`s, and elements are added to those arrays at different times. The elements are not primitive types but structured data, and the structure can even differ between elements. I’m looking for a way to store this stuff, and I don’t feel like I’ve found the right thing yet. I’m looking for suggestions.
Just to illustrate the point, here’s what I’m thinking of putting together. My primary hesitation is just that I feel like this is surely a solved problem, and I must not be searching for the right thing.
On disk, we’d end up with something like this:
[array ID, number of bytes, all the bytes for the serialized element, ...]
So I’d then say, “Hey, give me array ID 117,” and it would create an empty `Any[]`. It would then scan through the file: for each record with array ID 117, it would read how many bytes to deserialize, deserialize that element, and push it into the array, then keep scanning for the next record with array ID 117.
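To make that concrete, here’s a minimal sketch of the append-and-scan scheme using the `Serialization` stdlib (the record layout and function names are just illustrative, not a real library):

```julia
using Serialization

# Append one element to the log file as [array ID, number of bytes, bytes].
function append_element(path, array_id::Int64, element)
    buf = IOBuffer()
    serialize(buf, element)
    bytes = take!(buf)
    open(path, "a") do io
        write(io, array_id)              # 8 bytes: which logical array
        write(io, Int64(length(bytes)))  # 8 bytes: payload size
        write(io, bytes)                 # the serialized element itself
    end
end

# Scan the whole file, collecting elements whose record matches `array_id`.
function load_array(path, array_id::Int64)
    out = Any[]
    open(path, "r") do io
        while !eof(io)
            id = read(io, Int64)
            n = read(io, Int64)
            payload = read(io, n)        # read exactly this record's bytes
            if id == array_id
                push!(out, deserialize(IOBuffer(payload)))
            end
        end
    end
    return out
end
```

Loading is then `load_array("sim.log", 117)`. The obvious downside is that every load is a full scan of the file; a side index of byte offsets per array ID would fix that, which is part of why this feels like it must already be a solved problem.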
There’s a lot we could imagine doing here, but this feels like the essence.
One option that I’ve considered is using HDF5. I’d make a dataset for each array ID that’s an array of UInt8s. I’d grow that array over time, dumping in serialized elements in chunks. Storing serialized data on the fly is not exactly what HDF5 is meant for, and I’m concerned this would be tedious, slow, and non-standard.
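If I went the HDF5 route, I imagine it would look roughly like this with HDF5.jl: an extendible, chunked `UInt8` dataset per array ID that I keep appending serialized bytes to. I’m writing the API from memory, so treat this as a sketch rather than tested code:

```julia
using HDF5, Serialization

# Append one serialized element to a growable UInt8 dataset named by array ID.
# Assumes `f` is an open HDF5 file handle, e.g. from h5open(path, "cw").
function append_to_dataset(f, array_id::Integer, element)
    buf = IOBuffer()
    serialize(buf, element)
    bytes = take!(buf)
    name = "array_$(array_id)"
    if !haskey(f, name)
        # Extendible 1-D dataset: current size 0, unlimited max size (-1),
        # chunked so it can grow.
        create_dataset(f, name, UInt8,
                       dataspace((0,); max_dims=(-1,)); chunk=(4096,))
    end
    dset = f[name]
    old = length(dset)
    HDF5.set_extent_dims(dset, (old + length(bytes),))
    dset[old+1:old+length(bytes)] = bytes
end
```

Reading back would mean wrapping `dset[:]` in an `IOBuffer` and calling `deserialize` repeatedly until the buffer is exhausted, since each serialized object is self-delimiting. But this is exactly the part that feels tedious and non-standard, so I’m not sure it’s better than rolling my own format.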
Another option is to literally implement what I have above from scratch, but if there’s a standard way and existing support, I’d much prefer to use that instead.
Thanks for any tips!