As a newcomer to Julia, I am writing code to work with fairly large oceanographic datasets. My test file is 138 MiB, but the instrument that creates the data can emit files up to about 4 GiB. Part of my work involves filling a 3D array based on calculations involving another 3D array and a set of vectors. There are, of course, several ways to frame such calculations; I've tried two and run @time on them. Their speeds are about equal, but the allocation reports are very different.
What I’ll call “method 1” has e.g.
0.255215 seconds (1.34 M allocations: 697.901 MiB, 56.25% gc time)
whereas what I’ll call “method 2” has e.g.
0.207432 seconds (195 allocations: 1.783 GiB, 65.91% gc time)
These are tests on the 138 MiB file. My concern is the allocated memory, which would be well over 10 GiB for a 4 GiB input file. Not everyone using the code will necessarily have that much memory.
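To make the contrast concrete, here is a minimal sketch of the shape of the two approaches. This is not my actual code, and the array names and sizes are invented; it just illustrates a slice-by-slice fill versus whole-array expressions that build full-size temporaries.

```julia
# Illustrative sketch only -- not my real code; names and sizes are invented.
A = rand(Float32, 200, 200, 50)   # stand-in for the input 3D array
v = rand(Float32, 50)             # stand-in for one of the vectors

# "Method 1" shape: fill a preallocated output slice by slice,
# creating and discarding many smaller temporaries along the way.
function method1(A, v)
    B = similar(A)
    for k in axes(A, 3)
        B[:, :, k] = A[:, :, k] .* v[k] .+ sqrt.(abs.(A[:, :, k]))
    end
    return B
end

# "Method 2" shape: whole-array expressions, creating a few temporaries
# that are each the full size of the dataset.
function method2(A, v)
    w = reshape(v, 1, 1, :)
    return A .* w + sqrt.(abs.(A))   # the un-fused `+` adds another full-size temporary
end

@time method1(A, v);
@time method2(A, v);
```

My real code differs, but the contrast in the two @time reports above follows roughly this pattern: many small allocations versus a few very large ones.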
However, I don’t know exactly what this number means. I can see two possible cases:
- It is the peak memory required at the most consumptive part of the calculation.
- It is the sum of all memory allocated during the process, even if some of it is immediately deallocated.
In case 2, method 2 would be okay even on large files, I think. But in case 1, I definitely want to go with method 1.
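I could, of course, probe this with a tiny artificial experiment (sizes here are arbitrary), but I would rather understand what the number is actually defined to mean:

```julia
# Each iteration allocates roughly 8 MiB and immediately lets it become garbage.
@time for _ in 1:100
    zeros(Float64, 1_000_000)
end
# If the reported figure is ~800 MiB, @time is summing all allocations (case 2);
# if it is ~8 MiB, it is reporting something like a peak (case 1).
```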
So, in a nutshell, my question is: what does the memory stated by @time actually mean?
Note that I haven’t asked yet about the number of allocations, which is huge for method 1 and small for method 2. Is that number something I ought to be concerned about, apart from its effect on run time?