Memory hogging loops

jrhalket · October 21, 2022, 4:24pm

Hi all, I’m still trying to figure out how to get around a familiar problem: repeated calls of a function causing nearly linear increases in memory demand. Here’s a minimal example. I’m not really interested in reducing the memory of a single call of this function; I’ve just written it to explicitly use a lot of memory. What I’m interested in is how to write repeated calls to a function without the memory demand scaling so much.

using Distributions
using Random


function costlyfn(_i)
    Random.seed!(1234+_i)    
    bigmat = rand(Normal(0.0,1.0), (1000,1000))
    return  inv(bigmat)
end

@time  sim_dat = [costlyfn(314159+i) for i in 1:1];
 0.122634 seconds (44.13 k allocations: 18.101 MiB, 21.55% compilation time)

@time  sim_dat = [costlyfn(314159+i) for i in 1:100];
 8.453335 seconds (45.51 k allocations: 1.541 GiB, 4.67% gc time, 0.39% compilation time)

I’m chiefly interested in substantially reducing that 1.5 GiB number without having to reduce the 18.1 MiB from a single call. Any help greatly appreciated.

rdeits · October 21, 2022, 4:44pm

Note that your code isn’t necessarily requiring 1.5GiB at any one time. The number reported there is just the total amount of memory allocated (even if that memory is immediately freed). That’s why it’s just ~100 times the memory of a single iteration, because that’s exactly what the number measures.

That said, the answer is to pre-allocate outside of the loop. You can create a 1000x1000 matrix outside of the loop and change your code to costlyfn!(mat, _i), using rand!(mat, ...) to write the random data directly into that existing matrix. That will result in only a single allocation of the matrix across all of your loops, rather than one allocation per iteration.
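
As a rough sketch of the difference, `@allocated` can show what each style costs per call. The helper names `fresh_matrix` and `refill!` are hypothetical, and `randn`/`randn!` from the Random standard library stand in for `rand(Normal(0.0,1.0), ...)` so that the example needs no extra packages:

```julia
using Random

# Allocating version: builds a new n-by-n matrix on every call.
fresh_matrix(n) = randn(n, n)

# In-place version: overwrites an existing matrix; no new array is created.
refill!(mat) = randn!(mat)

n = 500
mat = Matrix{Float64}(undef, n, n)

# Warm up both functions so compilation is not counted.
fresh_matrix(n)
refill!(mat)

bytes_fresh   = @allocated fresh_matrix(n)
bytes_inplace = @allocated refill!(mat)

println("allocating: ", bytes_fresh, " bytes, in-place: ", bytes_inplace, " bytes")
```

The allocating call should report at least the size of the matrix itself, while the in-place call should report little or nothing.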

rdeits · October 21, 2022, 4:46pm

A pattern that I find convenient is to define the “core” computation using some pre-allocated storage:

function foo!(mat, x)
  mat .= x
end

and then create a convenient version of that function which doesn’t require a pre-allocated storage, just for testing or demonstration purposes:

function foo(x)
  mat = Matrix{Float64}(undef, 10, 10)
  foo!(mat, x)  # Call the in-place version which does the actual work
end

In an expensive loop you’d use foo!(mat, x), but if you just wanted to test something or play around with the code you can use the more convenient foo(x).

jrhalket · October 21, 2022, 5:11pm

So if I edit to:

using Distributions
using Random

bigmat = Array{Float64,2}(undef,1000,1000);

function costlyfn!(bigmat,_i)
    Random.seed!(1234+_i)    
    rand!(Normal(0.0,1.0),bigmat)
    return  inv(bigmat)
end

and compare:

@time  sim_dat = [costlyfn!(bigmat,314159+i) for i in 1:1];
0.138958 seconds (48.85 k allocations: 10.788 MiB, 33.36% compilation time)

@time  sim_dat = [costlyfn!(bigmat,314159+i) for i in 1:100];
  7.821771 seconds (50.14 k allocations: 815.269 MiB, 2.81% gc time, 0.61% compilation time)

Pre-allocation reduces the memory demands of a single call but hardly reduces the scaling problem.

Regarding the total amount used vs allocated: the reason I’m so worried about this problem is that I am eventually going to run my actual code on an HPC. Asking for huge memory allocations can cause problems.

DNF · October 21, 2022, 5:35pm

rand is only half the allocations. inv also allocates.
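
One way to check this, sketched with a smaller matrix and a hypothetical wrapper named `inv_bytes`: even when the input matrix is reused, `inv` has to return a new matrix, so each call allocates at least the size of its input.

```julia
using LinearAlgebra

# Bytes allocated by one call to inv(A), measured after a warm-up call.
function inv_bytes(A)
    inv(A)                    # warm-up so compilation is excluded
    return @allocated inv(A)
end

n = 300
A = randn(n, n) + n * I       # shifted so the matrix is comfortably invertible
println("inv allocates ", inv_bytes(A), " bytes; the input occupies ", sizeof(A), " bytes")
```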

mikmoore · October 21, 2022, 6:23pm

julia> foo(n) = sum(_->sum(ones(10^3)),1:n)
foo (generic function with 2 methods)

julia> @time foo(10)
  0.000013 seconds (10 allocations: 79.375 KiB)
10000.0

julia> @time foo(50_000_000)
 42.659915 seconds (50.00 M allocations: 378.489 GiB, 13.02% gc time)
5.0e10

I assure you that my computer does not have 378GiB of memory. But clearly the program can run anyway. As others have said, this is not the peak usage but the cumulative. It will absolutely scale linearly with the number of times you call an allocating function and there isn’t any way around that (except to make the function not allocate). This is no different than repeatedly calling malloc and free in a loop in C. The cumulative number of bytes allocated can be huge while the instantaneous usage remains low. The 13.02% gc time in the report indicates that the garbage collector spent time reclaiming unused memory during this call, which is why my computer didn’t kill Julia long before it hit 378GiB.

jrhalket · October 21, 2022, 6:24pm

Very fair. Is there a way of seeing how much memory it actually uses?

stillyslalom · October 21, 2022, 6:26pm

There are very few cases for which inv is actually what you want in high-performance settings. It’s slower and more sensitive to conditioning than any specialized factorization. You’ll probably want to look at in-place factorizations (qr!, cholesky!, lu!, … depending on your matrix) and ldiv! if you want to minimize allocations in the loop.
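
A minimal sketch of that idea for a single linear solve, assuming a general square matrix (so `lu!` is the factorization used) and a hypothetical helper named `solve_inplace!`:

```julia
using LinearAlgebra

# Solve A*x = b, overwriting A with its LU factors and b with the solution.
# lu! still allocates a small pivot vector, but no n-by-n temporaries.
function solve_inplace!(A, b)
    F = lu!(A)        # A now holds the factorization
    ldiv!(F, b)       # b now holds the solution x
    return b
end

n = 200
A0 = randn(n, n) + n * I
b0 = randn(n)

x_inv = inv(A0) * b0                        # reference result via inv
x_lu  = solve_inplace!(copy(A0), copy(b0))  # copies keep A0 and b0 intact here

println("difference between the two solutions: ", norm(x_lu - x_inv))
```

In a real loop the copies would be dropped and the same `A` and `b` buffers refilled on each iteration.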

Oscar_Smith · October 21, 2022, 6:35pm

There isn’t going to be a consistent answer to this. It depends on if and when the GC decides to run.

mikmoore · October 21, 2022, 6:43pm

There isn’t a way I’m aware of (within Julia). There might be some way, but it won’t be ergonomic. Your operating system can probably provide the number in an activity monitor, but that doesn’t sound like what you’re after.

Julia is garbage collected, so it might take a little while to free up unused memory. The garbage collector can run any time you cause an allocation (but won’t run that often in practice unless you’re almost out of memory). You can also invoke it manually via GC.gc(), but this isn’t necessary.

That said, in general Julia won’t use that much more than is strictly necessary. Any allocated memory that is no longer reachable from a “live” variable will be reclaimed by the GC on the next full run (and possibly on the next partial run).

Most of us do not find that we need to pay any special heed to instantaneous memory use beyond the simple rules like not allocating an array with a zillion elements (or zillions of elements among many objects that we will keep using).
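
For a coarse, process-level figure, two calls may help. `Sys.maxrss()` reports the peak resident set size of the Julia process in bytes. `Base.gc_live_bytes()` reports the bytes the GC currently counts as live; it is unexported, so it may change between versions and is best treated as approximate. A minimal sketch:

```julia
# Hold one large array, then inspect process-level memory figures.
x = fill(1.0, 5_000_000)             # about 40 MB, every element written

GC.gc()                              # force a full collection first
live_bytes = Base.gc_live_bytes()    # unexported; approximate live heap size
peak_rss   = Sys.maxrss()            # peak resident set size so far, in bytes

println("live heap: ", live_bytes / 2^20, " MiB")
println("peak RSS:  ", peak_rss / 2^20, " MiB")
```

The peak figure is probably closer to what a cluster's memory limit is compared against than the cumulative number that `@time` prints, though that depends on how the scheduler accounts for memory.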

stillyslalom · October 21, 2022, 6:50pm

e.g.

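julia> using LinearAlgebra  # provides I, lu!, and ldiv!

julia> function lesscostlyfn!(out,bigmat,_i)
           Random.seed!(1234+_i)
           out .= I(size(bigmat, 1))         # reset out to the identity matrix
           rand!(Normal(0.0, 1.0), bigmat)   # refill bigmat in place
           ldiv!(lu!(bigmat), out)           # overwrite out with bigmat \ I, i.e. inv(bigmat)
       end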

julia> @btime map(i -> costlyfn(i), 1:100);
  6.401 s (1401 allocations: 1.54 GiB)

julia> @btime map(i -> lesscostlyfn!($out, $bigmat, 314159+i), 1:100);
  4.140 s (901 allocations: 947.75 KiB)

The majority of the allocations are from seed! (~7 allocs per call), and lu! still allocates a pivot vector, but that’s still a solid decrease in runtime by trimming allocations.
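
One thing to watch with this pattern: `ldiv!` returns the array it overwrote, so a function ending in `ldiv!(lu!(bigmat), out)` returns `out` itself, and `map` then collects references to the same matrix rather than distinct results. If every inverse must be kept, the memory for them has to be held regardless and a `copy` per iteration is needed. If only a summary of each result is needed, reducing inside the loop keeps the footprint small. A small sketch with a hypothetical function `fill_and_invert!`, using `randn!` as a stand-in for `rand!(Normal(0.0, 1.0), ...)`:

```julia
using LinearAlgebra, Random

# Overwrites `buf` with seeded random data and `out` with the inverse of that data.
function fill_and_invert!(out, buf, seed)
    Random.seed!(seed)
    out .= I(size(buf, 1))
    randn!(buf)
    return ldiv!(lu!(buf), out)   # returns `out` itself, not a new matrix
end

n = 50
out = Matrix{Float64}(undef, n, n)
buf = Matrix{Float64}(undef, n, n)

aliased  = map(s -> fill_and_invert!(out, buf, s), 1:3)        # three references to `out`
distinct = map(s -> copy(fill_and_invert!(out, buf, s)), 1:3)  # three separate matrices
traces   = map(s -> tr(fill_and_invert!(out, buf, s)), 1:3)    # scalars only, nothing retained

println("all aliased entries are `out`: ", all(r -> r === out, aliased))
println("distinct entries differ: ", distinct[1] != distinct[2])
```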

fins · October 21, 2022, 7:04pm

Very fair. Is there a way of seeing how much memory it actually uses?

I think the answer very much depends on what you mean by “how much memory ‘it’ uses”. In case you mean…

the memory that costlyfn(...) allocates temporarily per call, to be picked up by the GC as soon as it is done? => That is what the benchmark macros tell you if you call the function once with one of them. Or do you mean…
the total amount of memory used by the function in one way or another (read from and/or written to), regardless of whether it had been allocated beforehand (and might even stay allocated)? => That throws a lot of things into one big pot and is better dissected one by one, since it is already confusing what we’re talking about. Or…
the maximum amount of your computer’s memory that is allocated at any point in time and hasn’t been gc’d yet? => That’s a non-trivial question (i.e. it can vary a lot), but there is an option to log all Julia GC actions:

julia> GC.enable_logging(true)

julia> @time rand(100_000_000) # 100M Floats x 8 Byte = 800 MB ~ 763 MiB = 800.000.000 / (1024*1024)
GC: pause 16.97ms. collected 18.818564MB. incr 
  0.306319 seconds (2 allocations: 762.939 MiB, 5.54% gc time)
GC: pause 22.30ms. collected 43.343791MB. incr 
100000000-element Vector{Float64}:
 0.4974772179862581
 0.5568194506696545
 0.08167221276873848
...

jrhalket · October 21, 2022, 9:08pm

ok thanks. This is starting to make sense.
Alas for some reason when I call GC.enable_logging(true), it tells me that enable_logging is not defined.

fins · October 21, 2022, 9:27pm

Hmm, weird. I don’t remember doing anything in particular to access it, except calling it.

julia> @which GC.enable_logging()
enable_logging() in Base.GC at gcutils.jl:205

Oscar_Smith · October 21, 2022, 9:28pm

I believe GC.enable_logging was only added in 1.8
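
A script that has to run on several Julia versions can guard the call instead of assuming a version. This is a sketch; the variable name `logging_available` is hypothetical:

```julia
# Turn on GC logging only where the function exists.
logging_available = isdefined(GC, :enable_logging)

if logging_available
    GC.enable_logging(true)
    ones(10^6)                 # some allocation work; GC events, if any, are printed
    GC.gc()                    # a forced collection, which is logged as well
    GC.enable_logging(false)
else
    @info "GC.enable_logging is unavailable on Julia $(VERSION)"
end
```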

jrhalket · October 21, 2022, 9:44pm

ah ok. that version is not yet compatible with vscode, correct?

jar1 · October 21, 2022, 9:47pm

1.8 works in vscode

fins · October 21, 2022, 9:54pm

ah ok. that version is not yet compatible with vscode, correct?

Everything I’ve copied here is from the same vs-code - REPL - session, including this…

julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_EDITOR = code

nilshg · October 22, 2022, 6:32am

At this point the Julia VSCode extension is the closest thing you get to an “official” IDE for the Julia language so you can pretty much count on it working with the latest stable release at all times.
