I need to generate a very large matrix (too big for memory), so I’m using memory mapped arrays.
I’m trying to benchmark the code and understand some of the variation in run times. Right now, I’m looking at the time it takes to write the files to disk, so I’m generating random numbers to test the write times.
I generate the random numbers, save them as a JLD2 file (timed), and also insert them into a memory-mapped array (timed). I know this isn’t an entirely fair comparison, but I need something to compare the memory-mapped times to.
To give you an idea of how I’m benchmarking, it looks something like:
for i = 1:101
    write = rand(200, 510, 61, 6, 11)
    @save string("test_",i,".jld2") write
    mmap_array[:,:,:,:,:,i] = write
end
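The loop above does not show how mmap_array is created or where the timers sit, so here is a scaled-down sketch of the memory-mapped side only. It is not the code that produced the numbers in this post: the small dimensions, the temporary file from tempname(), the mmap_times vector, and the call to Mmap.sync! are all assumptions made for illustration. The JLD2 timing would be collected the same way around the @save call.

```julia
using Mmap

# Hypothetical small sizes so the sketch runs in a moment;
# the post uses (200, 510, 61, 6, 11) and 101 iterations.
dims = (4, 5, 3, 2, 2)
n_iter = 5

path = tempname()                 # hypothetical backing file
io = open(path, "w+")             # read/write, created if missing
# Map a 6-dimensional Float64 array onto the file; the file is grown to fit.
mmap_array = Mmap.mmap(io, Array{Float64,6}, (dims..., n_iter))

mmap_times = Float64[]
for i in 1:n_iter
    data = rand(dims...)
    t = @elapsed begin
        mmap_array[:, :, :, :, :, i] = data
        # Assumption: force dirty pages out to disk so the timing covers
        # the actual write rather than only the copy into the page cache.
        Mmap.sync!(mmap_array)
    end
    push!(mmap_times, t)
end
close(io)
```

Whether Mmap.sync! belongs inside the timed region depends on what is being measured: without it, the assignment may return before the data reaches the file system.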
The results of the timing are:
average jld write time: 11.469541902542113
std dev of jld write times: 7.168669045766888
average mmap write time: 23.152635397911073
std dev of mmap write times: 15.751597801816262
(I drop the values from the first iteration due to the added compile times, so the sample size is 100.)
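As a small illustration of that step, this is roughly how the summary statistics can be computed once the first (compile-heavy) sample is discarded. The times vector holds made-up values, not my measurements, and the variable names are hypothetical.

```julia
using Statistics

# Made-up timings in seconds; the first entry stands in for the
# iteration that also pays the compilation cost.
times = [30.0, 10.0, 12.0, 14.0]

steady = times[2:end]   # drop the first iteration
avg = mean(steady)      # arithmetic mean of the remaining samples
sd = std(steady)        # sample standard deviation (n - 1 denominator)

println("average write time: ", avg)
println("std dev of write times: ", sd)
```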
I should also note that the mmap case had fewer allocations. Here is the @time output from the last iteration.
save JLD2 file
 11.887770 seconds (82 allocations: 7.891 KiB)
insert array into mmap array
 17.974852 seconds (49 allocations: 1.516 KiB)
Surprisingly (at least to me), the JLD2 write is about twice as fast on average. I could understand it being a little bit faster because there are fewer numbers it has to deal with; however, I still expected the memory-mapped array to be faster because the file is already there and it is memory mapped.
I’d also note that the JLD2 write time is much less variable.
Can anyone offer any insight on this? My supposition is that because the JLD2 file is being written to disk via a function it is happening faster.
Beyond this (possibly unfair) comparison, is there a way to speed up the write time of the memory-mapped array? At a minimum, this comparison demonstrates that my computer could get the values to disk faster. (I say my computer, but the tests were run on a Linux cluster, so this wouldn’t be due to varying usage of CPU power. The cluster also uses a Lustre file system.)