I am developing some code in BEDFiles.jl that may be used with very large binary data files. The data are accessed as a read-only memory-mapped Matrix{UInt8}, using column-oriented algorithms whenever possible.
As I understand it, there shouldn’t be a problem with having a very large file if I am only accessing a small set of adjacent columns. Suppose I have 100,000 rows and 10 million columns but only ever access the first 10,000 columns. I believe the columns beyond the first 10,000 will never need to appear in memory; they are essentially held as a kind of promise by the operating system (which would be Linux here; I don’t care if Windows does dumb things with memory-mapped files). Is this correct?
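For what it's worth, the demand-paging behavior in question is easy to demonstrate at the OS level. Below is a small Python sketch (Python's `mmap` module and Julia's `Mmap.mmap` both sit directly on `mmap(2)`, so the kernel mechanism is the same): a 1 GiB file is mapped, but only the first 4 KiB page is ever touched, so only that page is faulted into RAM. The sparse temp file is just a stand-in for a large .bed file.

```python
import mmap
import os
import tempfile

# Stand-in for a huge data file: a 1 GiB sparse file (no blocks
# are actually allocated on disk until something is written).
size = 1 << 30
fd, path = tempfile.mkstemp()
os.ftruncate(fd, size)  # extend the file without writing any data

with open(path, "rb") as f:
    # Read-only mapping of the whole 1 GiB file. This reserves
    # address space only; no pages are read from disk yet.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touching only the first page faults in only that page;
    # the remaining ~1 GiB is never read or brought into RAM.
    first_block = mm[:4096]
    mm.close()

os.close(fd)
os.remove(path)
print(len(first_block))
```

On Linux you can confirm this by watching the mapping's `Rss` line in `/proc/self/smaps` before and after the access: it grows only by the pages you actually touch, regardless of the mapped size.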