How could I force the release of the memory allocated by X here?
(dimensions chosen to bomb if the memory allocated by X does not get freed)
    function testme()
        X = rand( 14_000_000_000 )
        Y = sum( X )
        X = nothing
        GC.gc() # why does the memory not get freed here?
        Z = rand( 14_000_000_000 )
        Y += sum( Z )
        return Y
    end

    function tester()
        Y = testme()
        return Y
    end

    println( tester() )
The following seems to release memory for the GC.gc() call at the toplevel
    function testme()
        X = rand( 2_000_000_000 )
        Y = sum( X )
        X = nothing
        GC.gc() # why does the memory not get freed here?
        Z = rand( 2_000_000_000 )
        Y += sum( Z )
        return Y
    end

    function tester()
        Y = testme()
        return Y
    end

    println( tester() )
    GC.gc()
as visible in Task Manager. So is it a scope problem?
Yes, there are a number of changes that one could make that would make the memory get released, but I would like to understand more fundamentally what would work, what wouldn’t, and why.
(Other examples include sticking @sync in front of GC.gc() and calling garbage collection in the calling function)
Thanks @Ronis_BR . Unfortunately, in my actual program there are a lot of other objects that would be defined within your inner scope that I would want to survive. Returning all that stuff is ugly.
Looking at the result of @code_typed testme(), it doesn’t appear that X = nothing actually does anything. It seems that it is removed by the optimizer because (aside from GC) it has no observable effect.
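For anyone who wants to reproduce that check, here is a small sketch. It uses a scaled-down copy of the function (testme_small is a name I made up, and the array size is reduced so it runs anywhere) and the function form code_typed from InteractiveUtils, which is what the @code_typed macro calls into. What the optimized code looks like may differ between Julia versions, so treat the printout as something to inspect rather than a fixed result.

```julia
using InteractiveUtils  # provides code_typed / @code_typed outside the REPL

# Scaled-down stand-in for testme(); the name and sizes are illustrative only.
function testme_small()
    X = rand(1_000)
    Y = sum(X)
    X = nothing   # the assignment whose effect we want to look for
    GC.gc()
    Z = rand(1_000)
    Y += sum(Z)
    return Y
end

# code_typed returns one (CodeInfo => return type) pair per matching method.
ci, rettype = only(code_typed(testme_small, Tuple{}))

println(ci)       # read through this to see whether a store of `nothing` survives
println(rettype)
```

If the printed code contains no statement corresponding to X = nothing, that matches the observation above that the optimizer dropped it.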
@aviatesk is this something that EA could help with? Or is this more an issue of optimization not recognizing (intentionally or accidentally) that X = nothing has an effect w.r.t GC?
    function test_me1()
        X = rand(100_000_000)
        Y = sum(X)
        X = nothing
        return Y
    end

    function test_me2(Y)
        Z = rand(100_000_000)
        Y += sum(Z)
        return Y
    end

    function testme()
        Y = test_me1()
        GC.gc() # memory freed
        Y = test_me2(Y)
        GC.gc() # memory freed
        return Y
    end

    function tester()
        Y = testme()
        return Y
    end

    tester()
My question is: can we document something about these behaviours inside the docstring of GC.gc() or is everything an implementation detail?
If you need to do something like this, i.e. allocate a lot of memory that gets thrown away within your function or loop, then you should use Bumper.jl. It’s made for handling exactly that for you (Mojo does something similar automatically, and I hope Julia could at some point perform a similar optimization).
You can’t in general force the GC to free objects, i.e. with:
    X = nothing
    GC.gc() # why does the memory not get freed here?
because in general something else might point to that object. Given all the context you have, that is known not to be the case here, so Julia could in principle do it automatically, or you could do it manually with the tool I proposed. Doing it with the GC would be inefficient.
You can do GC.gc(true) for a full collection, but I would very much advise against it, since it is very slow. I think it’s mostly for benchmarking and some exceptional hard-real-time situations. Without true there, you do an incremental GC, and with the best possible GC implementation (it keeps changing) it should get rid of the garbage you just generated. But it would still look at more than just that last allocation you made, and it would be wasteful compared to the other means I proposed.
[The X you want to get rid of was heap allocated, and a pointer to it was on the stack. In some ideal world you could have a GC that only looked at the stack (very quick), or even just at your stack frame, and only dropped from the heap what’s pointed to from there. But the GC isn’t allowed to assume that nothing else pointed to your object. Besides, with the X = nothing I believe you destroyed the pointer in your stack frame, making it hard for the GC to know what you wanted. But ironically that was your hint to the GC… In languages without GC, that would have been Libc.free, implicitly or explicitly. Bumper.jl does something similar, except it doesn’t even need to drop from the heap, or use it at all, since it has its own stack that replaces the heap in this situation.]
What you could try, if you insist on using the GC and not just true for a full collection, is the following (I’m not sure why BenchmarkTools does it; it seems excessive and relies on the implementation of the GC, and I’m not even sure this is still the best or correct way):
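The snippet this sentence introduced did not survive in the thread as shown here. Going by the replies below (several GC.gc() calls, "4 in a row", the trick used in BenchmarkTools), it was presumably something along these lines. The function name scrub_gc is made up for this sketch, and nothing here guarantees that a given object is actually released.

```julia
# Hypothetical helper: call the collector four times in a row,
# as described in the replies below. The name is illustrative only.
function scrub_gc()
    GC.gc()
    GC.gc()
    GC.gc()
    GC.gc()
    return nothing
end

scrub_gc()
```

As reported further down, this did not help with the OP’s example, so it is at best something to try, not a fix.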
Thanks for pointing out Bumper.jl, it seems very cool.
But actually:
    help?> GC.gc()
      GC.gc([full=true])

      Perform garbage collection. The argument full determines the kind of collection: A full collection (default) sweeps
      all objects, which makes the next GC scan much slower, while an incremental collection may only sweep so-called
      young objects.

      │ Warning
      │
      │ Excessive use will likely lead to poor performance.
So GC.gc() is equivalent to GC.gc(true).
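To make the two call forms concrete, here is a minimal sketch that prints the runtime’s live-byte estimate around an incremental and a full collection. It allocates inside a helper function, in the spirit of the test_me1 example above, so that no reference survives the call. alloc_and_sum and live_mib are names invented for this sketch, and Base.gc_live_bytes is an unexported function whose exact accounting I would not rely on, so the numbers are only indicative.

```julia
# Allocate in a helper so that no reference to the array outlives the call.
function alloc_and_sum(n)
    X = rand(n)
    return sum(X)
end

# Unexported Base function; reports the GC's estimate of live heap bytes.
live_mib() = Base.gc_live_bytes() / 2^20

total = alloc_and_sum(10_000_000)   # about 76 MiB of Float64 values
before = live_mib()

GC.gc(false)                        # incremental: may only sweep young objects
after_incremental = live_mib()

GC.gc()                             # same as GC.gc(true): full collection
after_full = live_mib()

println((before, after_incremental, after_full))
```

Comparing the three printed values on your own machine shows how much each kind of collection reclaimed; Task Manager may still lag behind, since memory returned to the allocator is not necessarily returned to the OS right away.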
Also, even though I have seen the tip many times about using several calls to GC.gc() (the trick used in BenchmarkTools) to force the GC to free the memory, it doesn’t actually work in the case the OP proposed. I tried it and it didn’t work.
You’re right, true is actually the default, so it’s puzzling to me why BenchmarkTools calls it 4 times in a row. I.e. does “full” imply not full? The docs imply that the next GC is needed, the one that will be “much slower”; maybe 4, maybe more? Possibly some fixed number, 4 (or lower), always works if allocations are sufficiently small, but no number works if they are too large.
I believe Julia (still) uses mark-and-sweep GC (but it’s also generational). I forget what needs to happen to actually get rid of garbage; mark then sweep, but do you only trigger sweep manually?
I don’t recall exactly, but I think your problem might be the large object, i.e. small and large objects are handled differently. Small ones take up cache space, and it’s good to deallocate them quickly, incrementally, so that you do not accumulate garbage quickly.
It’s assumed that large objects stay around for longer, which is why they are handled differently by the GC. They do take up cache and VM space too (note that freeing from the heap doesn’t free from the cache directly, nor does it make the VM smaller, I believe), but I’m curious: why do you worry at all about whether the GC works on them? It might not be an actual problem.
The point of GC is that you do not have to call free, which simplifies programming, but also that freeing can be deferred, so that it can be done in bulk. It may not be optimal to free as early as possible. That is the reason GC languages can in some cases be faster than non-GC ones.
If you do it with Bumper.jl though I believe there’s no trade-off, it should just work. It should also work on Windows. It’s less tested there, but the only worry is if you run out of memory, and that would be very obvious. Linux might handle overcommit better…
I hope, and have high confidence, that this hint option will become redundant soon, though. Users shouldn’t have to, and in most cases don’t need to, worry about max. memory. They may need to worry about finalizers in some cases, and about memory-mapped files in relation to that, which I believe was your original problem (not shown in this example), though you didn’t use such files directly, only through a package (and it’s only a problem on Windows…).
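Since finalizers came up: a minimal sketch, using only Base, of how finalize runs an object’s finalizers right away instead of waiting for a collection. The Resource type and the closed log are made up for illustration. Whether this helps in the memory-mapped-file case depends on how the package in question registers its finalizers, which I have not checked.

```julia
# Hypothetical resource type; finalizers can only be attached to mutable objects.
mutable struct Resource
    name::String
end

closed = String[]   # records which resources have been finalized

r = Resource("scratch")

# Register a finalizer. It normally runs at some point after `r` becomes unreachable.
finalizer(x -> push!(closed, x.name), r)

# Run the registered finalizers for `r` immediately, without waiting for the GC.
finalize(r)

println(closed)
```

The design point is that finalize lets you release an external resource at a known place in the code, while the memory of the object itself is still reclaimed by the GC whenever it gets to it.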
It’s helpful to know of. Possibly in that case. I would try to see if it helps with or without that explicit GC.gc(). Or GC.gc(false).
But again, I would try Bumper.jl that solves this if you’re ok with rather trivial (local) code changes.