Storage-based array that implements @views?

I would like to do a piece-wise reduction (on sub-arrays) of a large array that is too big to fit into memory. Is there a ‘storage-based array’ that implements @views?

Are you looking for https://github.com/JuliaArrays/MappedArrays.jl?


Thanks @Oscar_Smith, I’m not really looking for implicit mapping but rather a data structure whose allocation size is limited by storage (not memory) that I can get sub-arrays of. Basically this would allow me to port a lot of existing code to handle some new data.

The Mmap package provides memory-mapped file-based arrays that sound like they should do what you want. Only the portion you are accessing is paged into memory.

(Every AbstractArray subtype supports @views and subarrays.)
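As a rough illustration of the suggestion above, here is a minimal sketch of a piece-wise reduction over a memory-mapped matrix using views. The file path, the array sizes, and the names `chunk` and `chunk_sums` are made up for the example; the sizes are kept tiny so it runs quickly, whereas a real file could be larger than RAM.

```julia
using Mmap

# Hypothetical small sizes; a real data set could be far larger than memory.
nrows, ncols, chunk = 10, 1_000, 100
data = rand(nrows, ncols)

# Write the raw column-major Float64 values to a temporary file (no header).
path = tempname()
open(path, "w") do io
    write(io, data)
end

# Map the file as a matrix and reduce it one block of columns at a time.
# Each @view is a SubArray into the mapped array, so no block is copied.
chunk_sums = open(path, "r") do io
    A = Mmap.mmap(io, Matrix{Float64}, (nrows, ncols))
    [sum(@view A[:, j:min(j + chunk - 1, ncols)]) for j in 1:chunk:ncols]
end
```

Existing code that accepts an `AbstractMatrix` should be able to take `A` or a view of it unchanged, which is what makes porting straightforward.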


Thanks, @stevengj. Even better that it’s in the standard library.

Hi, @stevengj. For benchmarking the Mmap-based implementation, what is the best way to create some files with large random arrays?

The following (adapted from the docs) obviously doesn’t work because it has to create and write the whole array at once:

n=10
s = open("/tmp/mmap.bin", "w+")
write(s, 10)
write(s, 10^n)
write(s, rand(10, 10^n))

You could just call write in a loop, e.g.

open("mmap.bin", "w") do io
    write(io, 10^10)
    for i = 1:1000
        write(io, rand(10^7))
    end
end
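For completeness, a scaled-down sketch of the same loop idea that also reads the file back with Mmap. It assumes a two-integer header (rows, columns) like the example in the Mmap docs; the sizes and the names `nchunks`, `chunkcols`, and `dims` are hypothetical.

```julia
using Mmap

# Hypothetical small sizes; scale nchunks up to produce a file larger than RAM.
nrows, nchunks, chunkcols = 10, 5, 200
ncols = nchunks * chunkcols
path = tempname()

open(path, "w") do io
    write(io, nrows)                       # header: number of rows
    write(io, ncols)                       # header: number of columns
    for _ in 1:nchunks
        write(io, rand(nrows, chunkcols))  # only one chunk is in memory at a time
    end
end

# Read the header, then map the remainder of the file as a matrix.
# The offset argument of Mmap.mmap defaults to the current position of io.
dims = open(path, "r") do io
    m = read(io, Int)
    n = read(io, Int)
    A = Mmap.mmap(io, Matrix{Float64}, (m, n))
    size(A)
end
```

Because Julia arrays are column-major, writing blocks of whole columns one after another yields the same byte layout as writing the full matrix in one call.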

Thanks!
Is the do construct required here? I used almost the same for-loop strategy in this post but it fails.

No, but it’s good style when writing a file, since it automatically closes the file at the end of the do...end even if an exception occurs. It’s equivalent to

io = open(...)
try
    ...write stuff...
finally
    close(io)
end

but is less verbose.
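A small sketch to check this behavior: it raises an error inside the do block and then confirms the stream was closed anyway. The `streams` vector and the simulated error are only for illustration.

```julia
path = tempname()
streams = IOStream[]   # keeps a handle so the stream can be inspected afterwards

try
    open(path, "w") do io
        push!(streams, io)
        write(io, 1)
        error("simulated failure while writing")
    end
catch
    # The error propagates out of open(...) do ... end; it is swallowed here
    # only so the check below can run.
end

was_closed = !isopen(streams[1])
```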


Thanks, @stevengj!

Also, I realized that the w+ was what was wrong with my loop. Apparently, allowing creation on open() also prevents write() from appending. When I changed it to w like yours, it worked…
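For anyone who wants to compare the two modes directly, here is a minimal sketch that writes three integers in a loop through a single open stream and reports the resulting file size for each mode. The helper `bytes_written` is hypothetical. Per the documented modes, both "w" and "w+" create and truncate the file when it is opened ("w+" additionally allows reading), so calling open() again inside a loop would discard earlier data in either mode.

```julia
# Hypothetical helper: write three Ints in a loop with the given mode,
# then return the size of the file once the stream is closed.
function bytes_written(mode)
    path = tempname()
    open(path, mode) do io
        for i in 1:3
            write(io, i)   # successive writes advance the stream position
        end
    end
    return filesize(path)
end

sizes = Dict(mode => bytes_written(mode) for mode in ("w", "w+"))
```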