I would like to do a piece-wise reduction (on sub-arrays) of a large array that is too big to fit into memory. Is there a ‘storage-backed’ array type that supports @views?
Are you looking for https://github.com/JuliaArrays/MappedArrays.jl?
Thanks @Oscar_Smith, I’m not really looking for implicit mapping but rather a data structure whose allocation size is limited by storage (not memory) that I can get sub-arrays of. Basically this would allow me to port a lot of existing code to handle some new data.
The Mmap package provides memory-mapped file-based arrays that sound like they should do what you want. Only the portion you are accessing is paged into memory.
(Every AbstractArray subtype supports @views and subarrays.)
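For concreteness, here is a minimal sketch of the kind of piece-wise reduction you could do on a memory-mapped array, assuming the file stores two Int64 dimensions followed by the Float64 data (the file name and layout here are just placeholders):
using Mmap
io = open("/tmp/mmap.bin", "r")
nrows = read(io, Int)                 # first dimension
ncols = read(io, Int)                 # second dimension
A = Mmap.mmap(io, Matrix{Float64}, (nrows, ncols))   # paged in lazily from disk
close(io)
blocksize = 10^6
partials = [sum(@view A[:, j:min(j + blocksize - 1, ncols)]) for j in 1:blocksize:ncols]
total = sum(partials)
Since Julia arrays are column-major, each @view over a column block only touches the pages of the file where those columns live, so the working set stays small.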
Thanks, @stevengj. Even better that it’s in the standard library.
Hi, @stevengj. For benchmarking the Mmap-based implementation, what is the best way to create some files with large random arrays?
The following (adapted from the docs) obviously doesn’t work because it has to create and write the whole array at once:
n = 10
s = open("/tmp/mmap.bin", "w+")
write(s, 10)                 # first dimension
write(s, 10^n)               # second dimension
write(s, rand(10, 10^n))     # allocates the whole 10×10^n array in memory
You could just call write in a loop, e.g.
open("mmap.bin", "w") do io
write(io, 10^10)
for i = 1:1000
write(io, rand(10^7))
end
end
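Once the file is written, a sketch of mapping it back for the benchmark might look like the following, assuming the layout above (an Int64 length header followed by the Float64 data):
using Mmap
io = open("mmap.bin", "r")
len = read(io, Int)                         # the 10^10 written as a header
v = Mmap.mmap(io, Vector{Float64}, (len,))  # file-backed vector, paged on demand
close(io)
sum(@view v[1:10^7])                        # reduce one 10^7-element chunk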
Thanks!
Is the do construct required here? I used almost the same for-loop strategy in this post but it fails.
No, but it’s good style when writing a file, since it automatically closes the file at the end of the do...end block even if an exception occurs. It’s equivalent to
io = open(...)
try
    ...write stuff...
finally
    close(io)
end
but is less verbose.
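A concrete instance of the same pattern (the file name and payload here are only illustrative):
io = open("/tmp/mmap.bin", "a")
try
    write(io, rand(10^7))   # append one more chunk
finally
    close(io)               # runs even if the write throws
end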
Thanks, @stevengj!
Also, I realized that the w+ was what was wrong with my loop; apparently allowing creation on open() also prevents write() from appending. When I changed it to w like yours, it worked…