Read-only memory-mapped files

dmbates · July 2, 2018, 5:04pm

I am developing some code in BEDFiles.jl that may be used with very large binary data files. The data are accessed as a read-only memory-mapped Matrix{UInt8} using column-oriented algorithms whenever possible.

As I understand it, there shouldn’t be a problem with having a very large file if I am only accessing a small set of adjacent columns. Suppose that I have 100,000 rows and 10 million columns but I only access the first 10,000 columns. I believe that the columns beyond 10,000 will never need to appear in memory - that they are essentially held as a kind of a promise by the operating system (which would be Linux - I don’t care if Windows does dumb things with memory-mapped files). Is this correct?

ExpandingMan · July 2, 2018, 5:33pm

If you are opening a memory-mapped file, yes, that is correct.

Admittedly I am still rather hazy on some of the details, but you can get a partial description here.

It should go without saying that you still have to be careful about actually copying data out of the memory-mapped array.
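
To make the copying point concrete, here is a minimal sketch. A small in-memory matrix stands in for the mapped one (an assumption for illustration only); as far as I know, slicing an mmapped Matrix follows the same rules.

```julia
# Small in-memory stand-in for the memory-mapped matrix (illustrative only).
A = UInt8[1 2 3; 4 5 6]

c = A[:, 1:2]          # range indexing allocates a new Matrix and copies the data
v = @view A[:, 1:2]    # a view is a thin wrapper that reads through to A

c isa Matrix{UInt8}    # true: an independent copy held in ordinary memory
v isa SubArray         # true: no data copied
parent(v) === A        # true: the view still refers to the original array
```

With a huge mapped array, `A[:, cols]` on a wide column range would pull all of that data into regular memory, whereas a view only touches what you actually read.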

Tamas_Papp · July 3, 2018, 7:08am

Yes, your expectations are correct. I have mmapped 500 GB files on a 16 GB machine without any problems; the OS (in my case, Linux) takes care of the memory operations transparently, paging on demand.
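
A rough sketch of the pattern, assuming a headerless file of raw column-major bytes. The file name and dimensions here are made up for the demo and are much smaller than a real data file.

```julia
using Mmap

# Hypothetical demo file and dimensions; a real file would be far larger.
path = tempname()
nrow, ncol = 1_000, 200
write(path, rand(UInt8, nrow, ncol))   # raw column-major bytes, no header

# Opening the stream in "r" mode gives a read-only mapping.
A = open(path, "r") do io
    Mmap.mmap(io, Matrix{UInt8}, (nrow, ncol))
end

# Only the pages backing these columns should need to be read from disk.
firstcols = view(A, :, 1:10)
total = sum(Int, firstcols)
```

If the file had a header, the optional fourth positional argument of `Mmap.mmap` (a byte offset) could be used to skip it.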

For $10^5 \cdot 10^4 = 10^9$ UInt8s, that’s 1 GB, so chances are the whole section could just fit in memory, making access really fast.
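
The same arithmetic as a quick check, using the dimensions from the question and one byte per UInt8:

```julia
nrow = 10^5
accessed = nrow * 10^4    # bytes in the first 10,000 columns
whole    = nrow * 10^7    # bytes in the full 10-million-column file

accessed / 10^9           # 1.0, i.e. 1 GB actually touched
whole / 10^12             # 1.0, i.e. 1 TB mapped in total
```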
