Repeated @benchmark causes RAM creep

I am currently trying to quantify the performance/scaling of the VML functions provided by VML.jl,
and ran into a curious problem.

When I run this script, RAM usage steadily increases until Julia consumes 12-14 GB by the end.

This is especially curious because medianandtest only returns a single number, and computesizedep consequently returns only a 15-element array. Run over the 31 functions contained in base_unary_real, this results in a Dict with ~500 Float64s (plus Dict overhead).
Everything else, including the large arrays that get generated in medianandtest and interpolated into @benchmark, should regularly go out of scope and be freed.
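
Schematically, the script does something like the following (a simplified sketch with made-up sizes, not the actual code; the real version loops over all 31 functions in base_unary_real and benchmarks both the Base and the VML implementations):

using BenchmarkTools

# Rough sketch of the pattern: benchmark one function for one array size
# and keep only a single number from the result.
function medianandtest(f, n)
    x = rand(n)                        # large input array, interpolated below
    baseBench = @benchmark map($f, $x)
    t = median(baseBench).time         # only this scalar is returned
    baseBench = nothing                # "manual" freeing (cf. lines 25/26 below)
    return t
end

# One function over a range of sizes -> a 15-element array of medians.
computesizedep(f) = [medianandtest(f, 2^k) for k in 6:20]

# Over all functions this should only amount to a Dict of a few hundred Float64s.
results = Dict(name => computesizedep(getfield(Base, name))
               for name in (:sqrt, :exp, :log))   # base_unary_real in the real script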

However, here is what I found when printing the free system memory in MB before and after benchmarking each function over its range of array sizes:

function   free MB before    free MB after
cbrt         9632.1953125    11747.6328125
sqrt        11747.6328125     8581.375
exp          8581.375         8248.140625
expm1        8248.140625     11711.32421875
log         11711.32421875    8172.6640625
log10        8172.6640625     7615.99609375
log1p        7615.99609375    7231.12890625
abs          7231.12890625    8724.1796875
abs2         8724.1796875     8916.37109375
ceil         8916.37109375    4671.90625
floor        4671.90625       4040.67578125
round        4040.67578125    3286.375
trunc        3286.375         2255.203125
cis          2255.203125      2547.5234375
erf          2547.5234375     1600.7578125
erfc         1600.7578125     1575.23828125
erfinv       1575.23828125    1358.9375
erfcinv      1358.9375        1330.16015625
gamma        1330.16015625    1333.8125
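
(These readings were produced along the following lines; a sketch of the measurement, not the exact code from the script:)

# Free system memory in MiB, printed before and after benchmarking a function.
freemb() = Sys.free_memory() / 2^20

for fname in (:cbrt, :sqrt, :exp)       # base_unary_real in the real script
    println(fname)
    println(freemb())
    # benchmarks for `fname` run here in the real script
    println(freemb())
end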

Clearly something does get freed, but RAM usage still increases overall. In addition, even though the same array sizes are created for each function, and each function is virtually identical apart from the symbol of the MKL shared-library routine being called, the change in free RAM varies wildly from function to function.

Lastly, the memory does not get freed even after the top-level function saves the results and returns; only restarting the REPL releases it.

“Manually” freeing the results of @benchmark as in lines 25/26 helped a little; previously Julia got killed by the OS (Manjaro Linux) for filling every last bit of RAM (and swap).

baseBench = nothing
vmlBench = nothing

But even with this we get the results shown above.
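
For completeness, one could also request a full collection after nulling the bindings (a sketch; this is not in the original script):

baseBench = nothing
vmlBench  = nothing
GC.gc()          # explicitly ask for a full garbage-collection sweep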

If anyone knows why this happens or what to do about it I would be very grateful.

Every time @benchmark (in particular, @benchmarkable) is executed, new functions are defined:

julia> using BenchmarkTools

julia> x = rand(5); f(x) = sum(x)
f (generic function with 1 method)

julia> b = BenchmarkTools.@benchmarkable f($x)
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> names(Main; all=true)
16-element Array{Symbol,1}:
 Symbol("###core#404")
 Symbol("###sample#405")
 Symbol("##_run#3")
 Symbol("##core#404")
 Symbol("##sample#405")
 Symbol("#1#2")
 Symbol("#_run#3")
 Symbol("#f")
 :Base
 :Core
 :InteractiveUtils
 :Main
 :ans
 :b
 :f
 :x

julia> b = BenchmarkTools.@benchmarkable f($x)
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> names(Main; all=true)
22-element Array{Symbol,1}:
 Symbol("###core#404")
 Symbol("###core#409")
 Symbol("###sample#405")
 Symbol("###sample#410")
 Symbol("##_run#3")
 Symbol("##_run#4")
 Symbol("##core#404")
 Symbol("##core#409")
 Symbol("##sample#405")
 Symbol("##sample#410")
 Symbol("#1#2")
 Symbol("#_run#3")
 Symbol("#_run#4")
 Symbol("#f")
 :Base
 :Core
 :InteractiveUtils
 :Main
 :ans
 :b
 :f
 :x

Also, even if we rebind x to something that you cannot take the sum of, we can still run the benchmark b:

julia> x = "foo"
"foo"

julia> run(b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.000 ns (0.00% GC)
  median time:      19.000 ns (0.00% GC)
  mean time:        19.180 ns (0.00% GC)
  maximum time:     126.000 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

This means that the original object must be cached somewhere inside one of these generated functions, which never go out of scope. So, to me, it seems that @benchmarkable caches every interpolated object, and since these never go out of scope they are never garbage collected.

This is not the case when not using interpolation:

julia> x = rand(10);

julia> b = BenchmarkTools.@benchmarkable f(x)
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> x = "foo"
"foo"

julia> run(b)
ERROR: MethodError: no method matching +(::Char, ::Char)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:529
  +(::Integer, ::AbstractChar) at char.jl:224
  +(::T, ::Integer) where T<:AbstractChar at char.jl:223
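
If the goal is to benchmark with freshly generated inputs without keeping a large interpolated global alive, the setup keyword of @benchmarkable/@benchmark may also help (a sketch, assuming a unary function like exp; note that the generated helper functions themselves still accumulate as shown above):

using BenchmarkTools

# The input is created in the benchmark's setup phase instead of being
# interpolated, so no large global array ends up cached in the generated code.
b = @benchmarkable exp.(x) setup=(x = rand(10^6))
run(b)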

Also see https://github.com/JuliaGPU/CuArrays.jl/issues/210 and https://github.com/JuliaCI/BenchmarkTools.jl/issues/127

I saw issue 127, but was not fully sure it was related to this.

Also, there is some level of garbage collection going on, as the amount of available RAM occasionally jumps back up; in one case I saw it suddenly free 8 GB.
I suppose some swapping could be happening, but I believe my swap partition is only about 3 GB.

Also, given that the RAM remains full even after the script terminates, it appears all these objects are cached globally somewhere?

Is the remedy then to not interpolate the input into the benchmarked functions? I have started reading up on scope and benchmarking, but all I have gathered so far is that it's complicated.


Which OS are you running?
If I assume Linux, then it is common practice to drop the filesystem caches (as root) before a benchmark run:

echo 3 > /proc/sys/vm/drop_caches