Hi,
Use case: large files with a binary structure and embedded “text” elements.
- Depending on the file size, we either read them fully into memory or mmap them. In both cases, the data are fully accessible in memory.
- Our files potentially have millions of records.
- Because we know the binary format, we know the offset and length of the embedded text element in each record.
- We use the embedded text elements as dictionary keys (an index) to quickly access records within the file.
We are looking for a way to access the embedded text in each record without copying it and without resetting the underlying Vector{UInt8} (please see here). We don’t need to modify the String (the embedded text).
Since we use the embedded text elements as dictionary keys, copying them consumes a lot of memory unnecessarily, and in our case memory is precious.
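To make the problem concrete, here is a minimal sketch of the copying approach we would like to avoid. The buffer contents, the `fields` offsets and the `index` dictionary are hypothetical toy stand-ins, not our real format:

```julia
# Toy stand-in for the file contents: two records, each with an embedded text field.
data = Vector{UInt8}("\x01\x02alpha\x03\x04beta")

# (offset, length) of the embedded text in each record, known from the binary format.
# Hypothetical values that only match the toy buffer above.
fields = [(3, 5), (10, 4)]

# Copying approach: the slice allocates a new vector for every key,
# and String(...) then wraps that copy.
index = Dict{String,Int}()
for (recno, (off, len)) in enumerate(fields)
    key = String(data[off:off+len-1])   # one extra allocation per record
    index[key] = recno
end

index["beta"]   # look up a record number by its embedded text
```

With millions of records, every key is a second copy of bytes that already sit in `data`.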
Is there an approach, maybe even an unsafe one, that allows us to create a String-like object (like SubString) on top of an underlying Vector{UInt8} without copying it and without resetting the underlying vector? Our input is an index or pointer plus a length (number of bytes), which should be sufficient to create a substring-like object.
Many thanks for your help.