In the following comment https://github.com/JuliaData/DataFrames.jl/pull/2612#issuecomment-776914558 I have code, and results of running it, that I do not understand. Some calls to the innerjoin function spend 90% of their time in GC, although I know they should not. I tried to avoid this by calling GC.gc() beforehand, but it does not help.
@nalimilan - who do you think might help diagnose this?
So the question is: how should one properly perform a massive number of benchmarks and get correct results when the benchmarks allocate a lot (but stay within the limit of available memory with a safe margin)?
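For reference, a minimal sketch of the kind of benchmark I have in mind (the actual code and results are in the linked comment; the table sizes, column names, and the make_table helper here are only illustrative):

```julia
using DataFrames, BenchmarkTools

# illustrative helper: a table with n unique string keys
make_table(n) = DataFrame(id = string.("key_", 1:n), x = rand(n))

df1 = make_table(10^6)
df2 = make_table(10^6)

GC.gc()  # try to start the measurement from a clean state

# this is the kind of call that reports ~90% GC time for me
@btime innerjoin($df1, $df2, on = :id);
```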
Slightly tangential: I believe that in Python the %timeit magic turns off GC entirely while the line runs. I don't think that is a good idea, but maybe @btime could at least try a similar, milder approach. Then again, I think BenchmarkTools already performs a gcscrub between runs, so I'm puzzled by the 10x difference.
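For what it is worth, a %timeit-style approach could be emulated by hand with GC.enable - just a hedged sketch, and (as noted in the next reply) this is risky when the benchmark allocates a lot:

```julia
using BenchmarkTools

# rough equivalent of %timeit's behaviour: disable GC around the timed
# region, then re-enable it and collect; nothing is freed while GC is off,
# so this can blow up memory for allocation-heavy benchmarks
GC.enable(false)
try
    @btime sum(rand(10^6))
finally
    GC.enable(true)
    GC.gc()
end
```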
I agree that GC cannot simply be disabled: in particular, sometimes GC must be triggered, as otherwise you would run out of memory.
I suspect (I will do more tests tomorrow) that, for some reason, as @jeff.bezanson commented in When does the garbage collector runs?, there is no 100% sure way to force GC inside a function at a given point (and it then gets triggered later, when Julia runs out of RAM inside @btime).
However, this would mean that benchmarking should be performed at top-level scope only. Is this a correct conclusion?
I have investigated the issue. The problem is that when I allocate collections holding millions of strings, then even after these strings become unreachable (a sketch of the experiment follows the list):

- GC.gc() becomes very slow;
- memory usage does not go down (as if GC thought there were still references to these strings);
- this behavior is “sticky”, i.e. it persists after running many different unrelated operations.
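Roughly, the experiment looks like this (the exact collection type and sizes are only illustrative):

```julia
using Random

# allocate a collection holding millions of unique strings
v = [randstring(8) for _ in 1:10^7]

@time GC.gc()   # still reasonably fast at this point

v = nothing     # drop the only reference to the strings
@time GC.gc()   # this (and later calls) is where I see GC.gc() become very
                # slow, and resident memory does not go back down
```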
I have a vague recollection that interpolating variables into the benchmarking macro causes those variables to become unfreeable, due to the way the macro is implemented, but I might be wrong.
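If that is what is happening, one way to rule it out might be to avoid interpolating the large tables into the benchmark expression and instead build them in the setup phase - a hedged sketch (maketab is a hypothetical helper, and the sizes are illustrative):

```julia
using DataFrames, BenchmarkTools

# hypothetical helper used only for this sketch
maketab() = DataFrame(id = string.(1:10^5), x = rand(10^5))

# interpolated style: the tables are spliced into the benchmark expression,
# which is what I suspect may keep them referenced by the benchmarking machinery
df1 = maketab()
df2 = maketab()
@btime innerjoin($df1, $df2, on = :id);

# setup-based style: the tables are created by the setup expression instead of
# being interpolated, so the macro should not hold long-lived references
@btime innerjoin(df1, df2, on = :id) setup=(df1 = maketab(); df2 = maketab());
```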
I am trying to get a minimal reproducer, but it is very volatile. It also matters whether I call GC.gc(), GC.gc(true), or do not call it at all. For sure I hit this issue in large test suites like the one linked above.
Yes, that's the worst case for CategoricalArray, as it's not designed to be used with all-unique values. The GC cost comes from allocating one CategoricalValue for each unique level. This PR should improve things a lot, but it's slower in some cases on Julia 1.5 and below (not sure whether that's really an issue).
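For concreteness, a small sketch of that worst case (every value unique, so the pool ends up with one level per element; the size is only illustrative):

```julia
using CategoricalArrays

# pathological input for CategoricalArray: all values are unique, so the
# pool ends up with 10^6 levels, and (as described above) the GC cost comes
# from allocating one CategoricalValue per unique level
v = string.("id_", 1:10^6)
cv = categorical(v)
length(levels(cv))  # 10^6 levels -- one per element
```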