In the following comment https://github.com/JuliaData/DataFrames.jl/pull/2612#issuecomment-776914558 I have code, and results of running it, that I do not understand. Some calls to the innerjoin function spend 90% of their time in GC, although I know they should not. I tried to avoid this by calling GC.gc() beforehand, but it does not help.
@nalimilan - who do you think might help diagnose this?
So the question is: how should one properly perform a massive number of benchmarks and get correct results when the benchmarks allocate a lot (but stay within the limit of available memory with a safe margin)?
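For reference, a minimal sketch of the kind of benchmark I have in mind (the actual code and results are in the linked comment; the table sizes, column names, and the make_table helper here are only illustrative):

```julia
using DataFrames, BenchmarkTools

# illustrative helper: a table with n unique string keys
make_table(n) = DataFrame(id = string.("key_", 1:n), x = rand(n))

df1 = make_table(10^6)
df2 = make_table(10^6)

GC.gc()  # try to start the measurement from a clean state

# this is the kind of call that reports ~90% GC time for me
@btime innerjoin($df1, $df2, on = :id);
```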
Slightly tangential: I believe that in Python the %timeit magic turns off GC entirely while the line runs. I don't think that is a good idea, but maybe @btime could at least try a similar, milder approach. Then again, I think BenchmarkTools already performs a gcscrub between runs, so I'm puzzled by the 10x difference.
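For what it is worth, a %timeit-style approach could be emulated by hand with GC.enable - just a hedged sketch, and (as noted in the next reply) this is risky when the benchmark allocates a lot:

```julia
using BenchmarkTools

# rough equivalent of %timeit's behaviour: disable GC around the timed
# region, then re-enable it and collect; nothing is freed while GC is off,
# so this can blow up memory for allocation-heavy benchmarks
GC.enable(false)
try
    @btime sum(rand(10^6))
finally
    GC.enable(true)
    GC.gc()
end
```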
I agree that GC cannot simply be disabled: in particular, sometimes GC must be triggered, as otherwise you would run out of memory.
I suspect (I will do more tests tomorrow) that, for some reason, as @jeff.bezanson commented in When does the garbage collector runs?, there is no 100% sure way to force GC inside a function at a given point (and it then gets triggered later, when Julia runs out of RAM inside @btime).
However, this would mean that benchmarking should be performed at top-level scope only. Is this a correct conclusion?
I have investigated the issue. The problem is that when I allocate collections holding millions of strings, then even after these strings become unreachable (a sketch of the experiment follows the list):

- GC.gc() becomes very slow;
- memory usage does not go down (as if GC thought there were still references to these strings);
- this behavior is “sticky”, i.e. it persists after running many different unrelated operations.
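Roughly, the experiment looks like this (the exact collection type and sizes are only illustrative):

```julia
using Random

# allocate a collection holding millions of unique strings
v = [randstring(8) for _ in 1:10^7]

@time GC.gc()   # still reasonably fast at this point

v = nothing     # drop the only reference to the strings
@time GC.gc()   # this (and later calls) is where I see GC.gc() become very
                # slow, and resident memory does not go back down
```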
I have a vague recollection that interpolating variables into the benchmarking macro causes those variables to become unfreeable, due to the way the macro is implemented, but I might be wrong.
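If that is what is happening, one way to rule it out might be to avoid interpolating the large tables into the benchmark expression and instead build them in the setup phase - a hedged sketch (maketab is a hypothetical helper, and the sizes are illustrative):

```julia
using DataFrames, BenchmarkTools

# hypothetical helper used only for this sketch
maketab() = DataFrame(id = string.(1:10^5), x = rand(10^5))

# interpolated style: the tables are spliced into the benchmark expression,
# which is what I suspect may keep them referenced by the benchmarking machinery
df1 = maketab()
df2 = maketab()
@btime innerjoin($df1, $df2, on = :id);

# setup-based style: the tables are created by the setup expression instead of
# being interpolated, so the macro should not hold long-lived references
@btime innerjoin(df1, df2, on = :id) setup=(df1 = maketab(); df2 = maketab());
```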
I am trying to get a minimal reproducer, but it is very volatile. It also matters whether I call GC.gc(), GC.gc(true), or do not call it at all. For sure I hit this issue in large test suites like the one linked above.
Yes, that's the worst case for CategoricalArray, as it's not designed to be used with all-unique values. The GC cost comes from allocating one CategoricalValue for each unique level. This PR should improve things a lot, but it's slower in some cases on Julia 1.5 and below (not sure whether that's really an issue).
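For concreteness, a small sketch of that worst case (every value unique, so the pool ends up with one level per element; the size is only illustrative):

```julia
using CategoricalArrays

# pathological input for CategoricalArray: all values are unique, so the
# pool ends up with 10^6 levels, and (as described above) the GC cost comes
# from allocating one CategoricalValue per unique level
v = string.("id_", 1:10^6)
cv = categorical(v)
length(levels(cv))  # 10^6 levels -- one per element
```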