I have a function that generates a number of graphs in a for loop and extracts some features from each. Once a graph has been created and its features extracted, its information is not needed in the next iteration of the loop. However, it seems that the memory allocated for the adjacency matrix is not freed, resulting in an OOM error when creating multiple large graphs.
using Graphs

function test(n::Int64, e::Int64, samples::Int64)
    for i in 1:samples
        g = erdos_renyi(n, e)     # random graph with n vertices and e edges
        A = adjacency_matrix(g)   # extracted features; not needed after this iteration
        d = degree(g)
    end
end
This is the basis of my code. When benchmarking this function, the memory allocation doubles when the number of iterations (samples) is doubled.
Any help to allocate only the memory needed for one iteration would be much appreciated!
Did you check that 1 sample does not cause an OOM error? In your code there is nothing that should keep the garbage collector from freeing the memory from the previous iteration.
using Graphs, BenchmarkTools

function test(n::Int64, e::Int64, samples::Int64)
    for i in 1:samples
        g = erdos_renyi(n, e)
        A = adjacency_matrix(g)
        d = degree(g)
    end
end

n = 1000
m = 10000
s1 = 10
s2 = 20

test(n, m, s1)            # warm-up call to trigger compilation
@btime test($n, $m, $s1)
@btime test($n, $m, $s2)
yields the following results:
11.513 ms (50135 allocations: 12.07 MiB)
23.079 ms (100281 allocations: 24.14 MiB)
It makes sense that there are some allocations, given the way the Graphs.jl package works, but I would expect the allocations to correspond to one iteration of the loop only…
This is expected behaviour. In general, pre-allocating outputs helps (see Performance Tips · The Julia Language). Unfortunately, I’m not aware of how to do that with Graphs.jl.
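For reference, the general pattern from the Performance Tips looks like this. It is only a sketch with a hypothetical compute/compute! pair, not a Graphs.jl API: the caller owns the output buffer and reuses it across iterations.

using LinearAlgebra

# Allocating version: a fresh output vector is created on every call.
compute(A, x) = A * x

# Pre-allocated version: writes into a caller-owned buffer via mul!.
compute!(out, A, x) = mul!(out, A, x)

A = rand(100, 100)
x = rand(100)
out = similar(x)            # allocated once
for _ in 1:1_000
    compute!(out, A, x)     # no per-iteration allocation
end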
Regarding the OOM error, the increase of the reported allocations does not imply that something is not freed between iterations. The number is the total memory allocated during the call, not the amount of memory required for the computation.
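A quick way to see this (a toy sketch; exact numbers vary by machine): each iteration's vector is garbage before the next one is created, yet the reported total grows with the iteration count.

using BenchmarkTools

function alloc_loop(k)
    for _ in 1:k
        v = zeros(1_000_000)   # ~8 MB Float64 vector, dead as soon as the iteration ends
    end
end

@btime alloc_loop(1)    # reports ~8 MB allocated in total
@btime alloc_loop(10)   # reports ~10x that total, but peak live memory is still ~8 MB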
It seems kind of far-fetched to eliminate the loop from an MWE which is intended to show a problem with multiple iterations, doesn’t it?
Ignoring that, the fact that i isn't used is not a problem, since erdos_renyi produces a different random graph in each iteration. In principle, a sufficiently smart compiler could decide that there is no need to compute A and d and eliminate them itself, but I don't think Julia can determine that the called functions are free of side effects.
This may be an issue in the package. You can try to minimize the chances of getting an OOM error by calling GC.gc() after each iteration, as sketched below. Yet, if for some reason the reference is not freed inside the package function, this won't work.
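Concretely, applied to the loop from the original post:

using Graphs

function test(n::Int64, e::Int64, samples::Int64)
    for i in 1:samples
        g = erdos_renyi(n, e)
        A = adjacency_matrix(g)
        d = degree(g)
        GC.gc()   # force a full collection between samples
    end
end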
This is another thing. If the function you are calling within the loop allocates, that is expected. But that should not result in a OOM error because GC should be called. In principle this is not a sign of a problem, nor is necessarily related to the problem above.
Hi there @GeoKou, Graphs.jl maintainer speaking. Indeed, Graphs.jl doesn’t offer many opportunities for pre-allocation. But in your simple example, if all you really need is the adjacency matrix of an ER graph, I strongly suggest you generate it by hand, and update it in-place.
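A minimal sketch of that suggestion (fill_er_adjacency! is an illustrative helper, not a Graphs.jl function; it rejection-samples e distinct edges into a Bool buffer that is allocated once and reused, assuming e ≤ n*(n-1)/2):

function fill_er_adjacency!(A::Matrix{Bool}, e::Int)
    fill!(A, false)                  # reset the buffer from the previous sample
    n = size(A, 1)
    added = 0
    while added < e                  # rejection-sample e distinct undirected edges
        u, v = rand(1:n), rand(1:n)
        if u != v && !A[u, v]
            A[u, v] = true
            A[v, u] = true           # keep the matrix symmetric
            added += 1
        end
    end
    return A
end

function test_inplace(n::Int, e::Int, samples::Int)
    A = Matrix{Bool}(undef, n, n)    # adjacency buffer, allocated once
    d = Vector{Int}(undef, n)        # degree buffer, allocated once
    for _ in 1:samples
        fill_er_adjacency!(A, e)
        sum!(d, A)                   # row sums of the adjacency matrix are the degrees
    end
end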
@lmiq can you elaborate on what you mean by “reference freed inside the package function”? Not sure how we can improve this on our side
For example, if g were mutable and growing at each call to the function, or if some memory were leaking through improper use of low-level unsafe memory management. Yet I think that's unlikely. I cannot test now, but we would have to confirm that memory is actually being filled up from iteration to iteration there.
Memory being allocated is normal, but that shouldn't normally result in OOM errors in the loop if it doesn't happen in a single call of the function.
Here I don’t see any scale up of the memory. I’m not sure if there isn’t a misconception here. One iteration of that loop allocates ~1.2Mb. If one performs 10 iterations, it allocates ~12Mb, as expected, that seems perfectly consistent.
What may be causing confusion is that maybe the OP thinks that those 12Mb (or greater) are simultaneously allocated, which they are not.
In my tests there is no build-up of memory usage. For instance, with the n and m provided, the function uses 3% of my laptop's memory, independently of the number of iterations.
Maybe the OP has seen an OOM error for systems that are too large for the computer's memory to handle even a single iteration (or very few iterations), such that garbage collection does not run often enough. That is probably solved by adding GC.gc() at the end of each iteration, if the computer runs one iteration fine.
That was my first thought: from what I've read, I was expecting the memory to be freed before an OOM occurs. However, I tried many times without the GC.gc() call and the same OOM error occurs, whereas everything is OK when GC.gc() is called.
This should not be the required solution. If one needs to do this, something is going wrong inside Julia. Would you mind posting an issue on Julia's GitHub issue tracker, including the code in enough detail that the devs can reproduce the problem?