Memory Allocation Increases with Iterations

Hello,

I have a function that generates a number of graphs in a for loop and extracts some features from each. Once a graph is created and its features are extracted, its information is not needed in the next iteration of the loop. However, it seems that the adjacency matrix’s memory is not freed, resulting in an OOM error when creating multiple large graphs.

using Graphs

function test(n::Int64, e::Int64, samples::Int64)
    for i in 1:samples
        g = erdos_renyi(n, e)      # new random graph each iteration
        A = adjacency_matrix(g)    # sparse adjacency matrix, only needed this iteration
        d = degree(g)              # degree vector, only needed this iteration
    end
end

This is the basis of my code. When benchmarking this function, the reported memory allocation doubles when the number of iterations (samples) is doubled.

Any help with allocating only the memory needed for one iteration would be much appreciated!

Thank you in advance.

Is the OOM only happening in a shared cluster environment? If that is the case, then the thread "Garbage collection not aggressive enough on Slurm Cluster" might be relevant.

Did you check that 1 sample does not cause OOM? In your code there is nothing that should keep the garbage collector from freeing the memory from the last iteration.

Hey, thank you for the response.

The code is run locally on my laptop.

Running the following:

using Graphs, BenchmarkTools

function test(n::Int64, e::Int64, samples::Int64)
    for i in 1:samples
        g = erdos_renyi(n, e)
        A = adjacency_matrix(g)
        d = degree(g)
    end
end
n = 1000
m = 10000
s1 = 10
s2 = 20

test(n, m, s1)
@btime test($n, $m, $s1)
@btime test($n, $m, $s2)

yields the following results:

  11.513 ms (50135 allocations: 12.07 MiB)
  23.079 ms (100281 allocations: 24.14 MiB)

It makes sense that there are some allocations given the way the Graphs.jl package works, but I would expect the reported allocations to correspond to a single iteration of the loop only…

This is expected behaviour. In general, pre-allocating outputs helps (see Performance Tips · The Julia Language). Unfortunately, I’m not aware of how to do that with Graphs.jl.
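
For reference, here is a generic sketch of that pattern (the names below are illustrative, not Graphs.jl API): allocate the output buffer once outside the loop, then refill it with an in-place "bang" function on every iteration.

function fill_squares!(out::Vector{Int})
    for i in eachindex(out)
        out[i] = i^2            # write into the existing buffer, no new allocation
    end
    return out
end

buf = Vector{Int}(undef, 1000)  # allocated once, outside the loop
for _ in 1:20
    fill_squares!(buf)          # reuses buf on every iteration
end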

Regarding the OOM error, the increase in the reported allocations does not imply that something is not freed between iterations. The number is the total memory allocated during the call, not the amount of memory required for the computation.
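
To see the distinction with a toy example (the function f below is hypothetical, not from the code above): the reported number is cumulative over the whole call, while the live memory at any moment stays at roughly one iteration’s worth.

# f allocates a fresh 1000-element vector (~8 KB) per iteration and immediately discards it
f(k) = sum(sum(rand(1000)) for _ in 1:k)

@allocated f(10)   # reports roughly 10 × the single-iteration allocation
@allocated f(20)   # roughly doubles again, yet peak live memory stays around ~8 KB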

In your MWE the variable i isn’t used in the for loop. Looks to me like you should be able to simply eliminate the loop?

It seems kind of far-fetched to eliminate the loop from an MWE which is intended to show a problem with multiple iterations, doesn’t it?

Ignoring that, the fact that i isn’t used is not a problem, since erdos_renyi produces a different random graph in each iteration. A sufficiently smart compiler could potentially decide that there is no need to compute A and d and eliminate them, but I don’t think Julia can determine that the called functions are free from side effects.

Ouch, so that is the problem as far as we can tell for now?

It seems that two things are being mixed up here:

The OOM error: this may be an issue in the package. You can try to minimize the chances of getting an OOM error by calling GC.gc() after each iteration. Yet, if for some reason a reference is not freed inside the package function, this won’t work.

The growth of the reported allocations is another thing. If the function you are calling within the loop allocates, that is expected. But that should not result in an OOM error, because the GC should be triggered. In principle this is not a sign of a problem, nor is it necessarily related to the problem above.

Hi there @GeoKou, Graphs.jl maintainer speaking. Indeed, Graphs.jl doesn’t offer many opportunities for pre-allocation. But in your simple example, if all you really need is the adjacency matrix of an ER graph, I strongly suggest you generate it by hand, and update it in-place.
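
To illustrate, here is a minimal hand-rolled sketch (it uses simple rejection sampling, not the sampler Graphs.jl uses, and the function name is made up): the adjacency matrix and the degree buffer are allocated once and refilled in place on every iteration.

using Random

# Hypothetical in-place G(n, m)-style sampler writing into a pre-allocated matrix
function fill_er_adjacency!(A::Matrix{Bool}, e::Int; rng = Random.default_rng())
    n = size(A, 1)
    fill!(A, false)                     # reuse the buffer instead of reallocating
    added = 0
    while added < e
        i, j = rand(rng, 1:n), rand(rng, 1:n)
        if i != j && !A[i, j]           # skip self-loops and duplicate edges
            A[i, j] = true
            A[j, i] = true
            added += 1
        end
    end
    return A
end

A = Matrix{Bool}(undef, 1000, 1000)     # allocated once, outside the loop
d = Vector{Int}(undef, 1000)            # degree buffer, also allocated once
for _ in 1:20
    fill_er_adjacency!(A, 10_000)
    sum!(d, A)                          # in-place row sums give the degrees
end

Whether a dense Bool matrix is acceptable depends on n; for large sparse graphs the same idea would apply to a pre-allocated sparse structure instead.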

@lmiq can you elaborate on what you mean by “reference freed inside the package function”? Not sure how we can improve this on our side

For example, if g is mutable and growing at each call to the function, or if some memory is leaking through improper use of low-level unsafe memory management. Yet I think that’s unlikely. I cannot test right now, but we would have to confirm whether memory builds up from iteration to iteration there.

Memory being allocated is normal, but that shouldn’t normally result in OOM errors in the loop if it does not happen in a single call of the function.


I also think that’s unlikely, but then the cause of the OOM remains elusive

Here I don’t see any scaling up of the memory, and I wonder if there isn’t a misconception. One iteration of that loop allocates ~1.2 MiB. If one performs 10 iterations, it allocates ~12 MiB in total, as expected; that seems perfectly consistent.

What may be causing confusion is that the OP perhaps thinks that those 12 MiB (or more) are allocated simultaneously, which they are not.

In my tests there is no build-up of memory usage. For instance, with the n and m provided, the function uses 3% of my laptop’s memory, independently of the number of iterations.

Maybe the OP has seen an OOM error for systems so large that the computer’s memory can barely handle a single iteration (or very few iterations), such that garbage collection does not run often enough. That is probably solved by adding GC.gc() at the end of each iteration, provided the computer runs one iteration fine.
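
For completeness, this is roughly what that suggestion looks like applied to the function from the original post (test_gc is just an illustrative name):

using Graphs

function test_gc(n::Int, e::Int, samples::Int)
    for _ in 1:samples
        g = erdos_renyi(n, e)
        A = adjacency_matrix(g)
        d = degree(g)
        GC.gc()        # force a full collection at the end of each iteration
    end
end

GC.gc(false) requests only an incremental collection, which is cheaper than a full one and may already be enough.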


Hello all,

Thank you for the replies and the insight on this issue.

By calling the garbage collector manually (GC.gc()), the problem was fixed; after many tests, there was no OOM error.

This really shouldn’t happen. It sounds like a bug where we aren’t triggering GC when we should.


This was my first thought; from what I’ve read, I was expecting the memory to be freed before an OOM occurs. However, I tried many times without the GC.gc() call and the same OOM error occurred, whereas everything is fine when GC.gc() is called.


This should not be the required solution. If one needs to do this, something is going wrong inside Julia. Would you mind posting an issue on Julia’s GitHub issue tracker, mentioning the code in detail so that the devs can reproduce the issue?
