Understanding how GC is triggered

In the following comment https://github.com/JuliaData/DataFrames.jl/pull/2612#issuecomment-776914558 I have code, and results of running it, that I do not understand. Some calls to the innerjoin function spend around 90% of their time in GC, although I know they should not. I tried to avoid this by calling GC.gc() beforehand, but it does not help.
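For context, the pattern is roughly as follows (a minimal sketch; the real tables are in the linked comment, here df1 and df2 are just made-up large frames joined on :id):

using DataFrames

df1 = DataFrame(id = rand(1:10^6, 10^7), x = rand(10^7))  # stand-in data
df2 = DataFrame(id = rand(1:10^6, 10^7), y = rand(10^7))

GC.gc()                             # collect garbage up front
@time innerjoin(df1, df2, on=:id);  # per the linked results, this can still show ~90% GC time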

@nalimilan - who do you think might help diagnose this?


I have changed @time to @btime and I get results that are way off, like:

[ Info: sorted int duplicates many
  64.713 s (167 allocations: 2.45 GiB)
  70.287 s (167 allocations: 2.45 GiB)

whereas the actual timing in global scope is:

julia> @time innerjoin(df1, df2, on=:id);
  4.639125 seconds (171 allocations: 2.455 GiB, 24.28% gc time)

julia> @time innerjoin(df2, df1, on=:id);
  3.960956 seconds (171 allocations: 2.455 GiB, 15.27% gc time)

So the question is: how should one properly run a large number of benchmarks and get correct results when the benchmarks allocate a lot (while staying within available memory with a safe margin)?
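For reference, BenchmarkTools has a gcsample parameter that runs gc() before every sample; I am not sure whether that is enough here, but the kind of setup I have in mind is:

using BenchmarkTools, DataFrames

# run a full gc() before each sample so every timing starts from as clean
# a heap as possible; samples kept small because each join is slow
@benchmark innerjoin($df1, $df2, on=:id) gcsample=true samples=10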

Slightly tangential: I believe that in Python the %timeit magic turns off GC entirely while the line runs. I don’t think that is a good idea, but maybe @btime could at least try a similar but milder approach. Then again, I think BenchmarkTools performs a gcscrub between runs, so I’m puzzled by the 10x difference.
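For completeness, Julia can do what %timeit does, just manually; a minimal sketch (not a recommendation for allocation-heavy code like this, since collections are only postponed):

GC.enable(false)                      # stop the collector, like %timeit does
try
    @time innerjoin(df1, df2, on=:id)
finally
    GC.enable(true)                   # always turn it back on
end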

I agree that GC cannot simply be disabled; in particular, GC sometimes must be triggered, as otherwise you would run out of memory.

I suspect (I will do more tests tomorrow) that, as @jeff.bezanson commented in “When does the garbage collector runs?”, there is no 100% sure way to force GC inside a function at a given point, and so it gets triggered later, when Julia runs low on RAM inside @btime.
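Concretely, I mean something like this hypothetical helper, where even an explicit full collection at the start does not guarantee that the timed region stays free of GC pauses:

function bench_join(df1, df2)
    GC.gc(true)                        # request a full collection (a hint only)
    @time innerjoin(df1, df2, on=:id)  # can still show large GC percentages
end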

However, this would mean that benchmarking should be performed at top-level scope only. Is this a correct conclusion?

I have investigated the issue. The problem is that when I allocate collections holding millions of strings, even after these strings become unreachable:

  1. GC.gc() becomes very slow
  2. Memory usage does not go down (as if the GC thought there were still references to these strings)

This behavior is “sticky”, i.e. it persists even after running many different operations.
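A rough way to observe this (assuming Base.gc_live_bytes() is available in your Julia version; I also watch RSS in the OS):

v = [string(i) for i in 1:10^7]   # millions of heap-allocated strings
before = Base.gc_live_bytes()
v = nothing                       # drop the only reference
GC.gc(true)                       # request a full collection
after = Base.gc_live_bytes()
@show before after                # compare how much the GC actually reclaimed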


Sounds like a memory leak of some sort. What’s the minimally complex “collection” that triggers this?

I have a vague recollection that interpolating variables into the benchmark macro makes those variables unfreeable, due to the way the macro is implemented, but I might be wrong.
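Concretely, I mean the difference between these two forms; whether the interpolated one really keeps the arrays alive is the part I am unsure about:

using BenchmarkTools

# `$` interpolation splices the current values into the benchmark closure,
# which is what might keep the large arrays reachable
@btime innerjoin($df1, $df2, on=:id);

# without interpolation the globals are looked up at run time instead
@btime innerjoin(df1, df2, on=:id);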

I am trying to get a minimal reproducer, but the behavior is very volatile. It also matters whether I call GC.gc(), GC.gc(true), or do not call it at all. In large test suites like the one linked above I get this issue for sure.

OK - I have pinpointed one of the “guilty” operations (it seems there are more):

Here is an MWE:

julia> using CategoricalArrays

julia> z = categorical(string.(1:10^7));

julia> @time z[:];
  5.576754 seconds (10.09 M allocations: 708.608 MiB, 93.54% gc time)

julia> @time z[:];
  2.744316 seconds (10.00 M allocations: 703.911 MiB, 88.53% gc time)

julia> @time z[:];
 11.659653 seconds (10.00 M allocations: 703.911 MiB, 96.96% gc time)

julia> @time z[:];
  1.854164 seconds (10.07 M allocations: 707.786 MiB, 81.41% gc time)

julia> @time z[:];
  1.797977 seconds (10.00 M allocations: 703.911 MiB, 82.61% gc time)

julia> @time z[:];
  5.040256 seconds (10.00 M allocations: 703.911 MiB, 93.30% gc time)

CC @nalimilan

Yes, that’s the worst case for CategoricalArray, as it’s not supposed to be used with unique values. The GC cost comes from allocating one CategoricalValue for each unique level. This PR should improve things a lot, but it’s slower in some cases on Julia 1.5 and below (not sure whether that’s really an issue).
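For contrast, here is a quick sketch (my addition, not from the PR): copying a plain Vector{String} of the same length does not allocate a wrapper per element, so the copy itself is cheap compared to the categorical case above:

julia> s = string.(1:10^7);   # plain Vector{String}, no per-element wrapper

julia> @time s[:];            # allocates essentially just the new array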