Is the allocation from @time the total at the end, or the peak during the calculation?

As a newcomer to Julia, I am writing code to work with fairly large oceanographic datasets. A test file is 138 MiB, but the instrument that creates the data can emit files up to about 4 GiB. Part of my work involves filling a 3D array based on calculations involving another 3D array and a set of vectors. There are, of course, several ways to frame such calculations; I’ve tried two and run @time on them. Their speeds are about equal, but the allocation reports are very different.
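Schematically, the two framings look something like the sketch below. This is a simplified stand-in, not my actual code: the names `raw`, `a`, `b` and the arithmetic are made up, but the allocation patterns match (many small allocations vs. a few large ones).

```julia
# Hypothetical sketch: `raw` is the input 3D array, `a` and `b` are the
# vectors, and the arithmetic stands in for the real calculation.

# Method 1: loop over columns; the slice raw[:, j, k] and the broadcast
# result each allocate a small vector per iteration, hence many small
# allocations.
function method1(raw, a, b)
    out = similar(raw)
    for k in axes(raw, 3), j in axes(raw, 2)
        out[:, j, k] = (raw[:, j, k] .- a[k]) .* b[j]
    end
    return out
end

# Method 2: whole-array operations; each step allocates one full-size
# temporary, hence few but large allocations.
function method2(raw, a, b)
    shifted = raw .- reshape(a, 1, 1, length(a))
    return shifted .* reshape(b, 1, length(b), 1)
end
```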

What I’ll call “method 1” gives, e.g.:

0.255215 seconds (1.34 M allocations: 697.901 MiB, 56.25% gc time)

whereas what I’ll call “method 2” gives, e.g.:

0.207432 seconds (195 allocations: 1.783 GiB, 65.91% gc time)

These are tests on the 138 MiB file. My concern is the allocated memory, which would scale to well over 10 GiB for a 4 GiB input file. Not everyone using the code will necessarily have that much memory.

However, I don’t know exactly what this number means. I see two possible cases:

  1. It is the peak memory required at the most consumptive part of the calculation.
  2. It is the sum of all memory allocated during the process, even if some of it is immediately deallocated.

In case 2, method 2 would be okay even on large files, I think. But in case 1, I definitely want to go with method 1.

So, in a nutshell, my question is: what does the memory figure reported by @time actually mean?

Note that I haven’t yet asked about the number of allocations, which is huge for method 1 and small for method 2. Is that number something I ought to be concerned about, apart from its effect on timing?

It is the second case: the total volume of memory allocated over the course of the computation, not the peak. I have seen @time report totals of hundreds of gigabytes on systems that clearly didn’t have that much RAM.
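You can see this directly with a loop that repeatedly allocates and discards a buffer (a minimal demonstration; the exact timing and GC percentage will vary):

```julia
# Each iteration allocates a ~100 MiB vector and immediately drops it.
# @time reports the cumulative total (about 2 GiB for 20 iterations),
# even though peak live memory stays near a single 100 MiB buffer.
@time for _ in 1:20
    v = zeros(13_107_200)  # 13_107_200 Float64s ≈ 100 MiB
    sum(v)
end
```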


By the way, this looks like a type instability to me, because there are very many very small allocations happening. When Julia cannot infer a type statically, it emits code that checks the type at runtime to dispatch to the correct method. This takes a small amount of time every time it happens and also causes a couple of small allocations. That is not a problem per se, but in a tight inner loop it can completely drag down performance. If you fix that, method 1 may perform much better.
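For example (a generic illustration, not your actual code), reading a non-const global inside a hot loop produces exactly this pattern of many tiny allocations:

```julia
scale = 2.0  # non-const global: its type is opaque to the compiler

function slow_scaled_sum(xs)
    s = 0.0
    for x in xs
        s += scale * x  # runtime dispatch and boxing on every iteration
    end
    return s
end

# Fix: pass the value in (or declare `const scale = 2.0`) so the type
# is inferable and the per-iteration allocations disappear.
fast_scaled_sum(xs, c) = sum(x -> c * x, xs)
```

Running @code_warntype slow_scaled_sum(rand(10)) flags the uninferred Main.scale::Any, which is how you spot this kind of problem.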

Another tip: timing with @time is only good for very rough estimates. For more precise measurements, use @btime from the BenchmarkTools.jl package.
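Minimal usage; BenchmarkTools.jl is a registered package, not part of the standard library:

```julia
using BenchmarkTools  # install first with: import Pkg; Pkg.add("BenchmarkTools")

A = rand(1000, 1000)
@btime sum($A)  # $ interpolates A so the global lookup isn't part of the timing
```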

Thanks for those tips!

@profview_allocs is a nice tool in VS Code for checking where in your code allocations happen. You can adjust its sample_rate parameter: when there are many allocations it records only a fraction of them, and once you’ve optimized down to just a couple you can pinpoint them exactly by setting sample_rate=1, which would otherwise be much too slow.
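Usage looks roughly like this, where method1(data) stands in for your own call:

```julia
# Run from the integrated REPL in VS Code with the Julia extension active.
@profview_allocs method1(data)                  # sampled allocation profile
@profview_allocs method1(data) sample_rate=1.0  # record every allocation (slow)
```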
