I would like to do a piece-wise reduction (on sub-arrays) of a large array that is too big to fit into memory. Is there a ‘storage-backed’ array type that supports @views?
Are you looking for https://github.com/JuliaArrays/MappedArrays.jl?
Thanks @Oscar_Smith, I’m not really looking for implicit mapping but rather a data structure whose allocation size is limited by storage (not memory) that I can get sub-arrays of. Basically this would allow me to port a lot of existing code to handle some new data.
The Mmap package provides memory-mapped file-based arrays that sound like they should do what you want. Only the portion you are accessing is paged into memory.
(Every AbstractArray subtype supports @views and subarrays.)
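For concreteness, here is a minimal sketch of the kind of piece-wise reduction you could do on a memory-mapped array, assuming the file stores two Int64 dimensions followed by the Float64 data (the file name and layout here are just placeholders):
using Mmap
io = open("/tmp/mmap.bin", "r")
nrows = read(io, Int)                 # first dimension
ncols = read(io, Int)                 # second dimension
A = Mmap.mmap(io, Matrix{Float64}, (nrows, ncols))   # paged in lazily from disk
close(io)
blocksize = 10^6
partials = [sum(@view A[:, j:min(j + blocksize - 1, ncols)]) for j in 1:blocksize:ncols]
total = sum(partials)
Since Julia arrays are column-major, each @view over a column block only touches the pages of the file where those columns live, so the working set stays small.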
Thanks, @stevengj. Even better that it’s in the standard library.
Hi, @stevengj. For benchmarking the Mmap-based implementation, what is the best way to create some files with large random arrays?
The following (adapted from the docs) obviously doesn’t work because it has to create and write the whole array at once:
n = 10
s = open("/tmp/mmap.bin", "w+")
write(s, 10)                 # first dimension
write(s, 10^n)               # second dimension
write(s, rand(10, 10^n))     # allocates the whole 10×10^n array in memory
You could just call write in a loop, e.g.
open("mmap.bin", "w") do io
write(io, 10^10)
for i = 1:1000
write(io, rand(10^7))
end
end
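Once the file is written, a sketch of mapping it back for the benchmark might look like the following, assuming the layout above (an Int64 length header followed by the Float64 data):
using Mmap
io = open("mmap.bin", "r")
len = read(io, Int)                         # the 10^10 written as a header
v = Mmap.mmap(io, Vector{Float64}, (len,))  # file-backed vector, paged on demand
close(io)
sum(@view v[1:10^7])                        # reduce one 10^7-element chunk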
Thanks!
Is the do construct required here? I used almost the same for-loop strategy in this post but it fails.
No, but it’s good style when writing a file, since it automatically closes the file at the end of the do...end block even if an exception occurs. It’s equivalent to
io = open(...)
try
    ...write stuff...
finally
    close(io)
end
but is less verbose.
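A concrete instance of the same pattern (the file name and payload here are only illustrative):
io = open("/tmp/mmap.bin", "a")
try
    write(io, rand(10^7))   # append one more chunk
finally
    close(io)               # runs even if the write throws
end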
Thanks, @stevengj!
Also, I realized that the w+ was what was wrong with my loop; apparently allowing creation on open() also prevents write() from appending. When I changed it to w like yours, it worked…