I’ll first describe how I understand julia’s garbage collection from a high level. Then, my first question is whether I got this right. The next questions are about some manual control to improve performance.
So, how does garbage collection work? There are two kinds of collection runs: First, sweeps. Here, the garbage collector recursively marks all objects that are reachable from local variables on the call-stack and from globals. These are retained, all other are free.
Second, quicksweeps. Here the “generational” part of “generational gc” comes in. Each object can be “old” or “new”, and there is an additional “remset” which is a subset of old objects. We basically run the same as before, with the following change: We don’t mark old objects, nor do we recurse into objects reachable from old objects. We then free all young objects that are not marked. This of course has a problem: We would prematurely free young objects that are kept alive by an old object. This is where the remset comes in: Whenever we attach a young object to an old object (e.g. setindex!
a young mutable object into an old Vector
) then the old object is added to the remset. During each quicksweep, we also mark all young objects reachable from old objects in the remset.
Did I understand this right or is this nonsense?
Now, I am in the situation where I sometimes need to add tiny newly allocated objects to some gigantic old array. Instead of adding the old array to the remset (which makes quicksweeps slow: often one cache-miss for every pointer in the old giant), I’d like to manually age my newly allocated object before adding it. If it has any outgoing pointers to young objects, then I want my forcibly aged object to be added to the remset as well.
Second, I’d like to control the moments when gc runs. This is because I know pretty well the moments when no temporary objects are alive anymore, making for quick quicksweeps with good harvest. How do I best do this? E.g.: disable gc in between; re-enable; trigger tiny alloc to check whether the pool is full; disable again? In the case where typically not even a quicksweep is necessary upon re-enabling? Can I forcibly grow the two pools? (poolsize_A: how often do we (quick)sweep, poolsize_B: How often do we full-sweep)
A toy-problem for my issue:
@noinline function inner_loop(big_old)
r = rand(rand(1:3000))
s = sum(r)
push!(big_old, s)
end
@noinline function outer_loop(big_old, N)
for i=1:N
inner_loop(big_old)
end
nothing
end
With this, I get (first value pays for compilation)
julia> for N in [10, 100_000, 500_000, 1_000_000, 2_000_000]
for T in [Float64, Any]
big_old = Set{T}()
sizehint!(big_old, N)
GC.gc()
GC.gc()
@show N, T
@time outer_loop(big_old, N)
end
end
(N, T) = (10, Float64)
0.108529 seconds (111.84 k allocations: 5.889 MiB)
(N, T) = (10, Any)
0.067782 seconds (104.57 k allocations: 5.493 MiB)
(N, T) = (100000, Float64)
0.640925 seconds (131.60 k allocations: 1.130 GiB, 17.21% gc time)
(N, T) = (100000, Any)
0.568093 seconds (503.64 k allocations: 1.137 GiB, 27.96% gc time)
(N, T) = (500000, Float64)
2.388110 seconds (658.97 k allocations: 5.669 GiB, 9.55% gc time)
(N, T) = (500000, Any)
3.728374 seconds (2.35 M allocations: 5.688 GiB, 45.11% gc time)
(N, T) = (1000000, Float64)
4.199122 seconds (1.32 M allocations: 11.332 GiB, 8.41% gc time)
(N, T) = (1000000, Any)
10.181596 seconds (4.69 M allocations: 11.380 GiB, 59.62% gc time)
(N, T) = (2000000, Float64)
8.398727 seconds (2.63 M allocations: 22.648 GiB, 7.43% gc time)
(N, T) = (2000000, Any)
31.929801 seconds (9.39 M allocations: 22.751 GiB, 73.88% gc time)
Note the superlinear scaling. The memory consumption of both is roughly the same. In this example, I would like to manually age the object I attach to big_old
(a boxed float is silly, but you get the idea), in order to keep quicksweeps quick.