@benchmark reporting (probably) wrong memory usage (BenchmarkTools.jl)

(I’m asking here rather than on the JuliaCI/BenchmarkTools.jl GitHub repository because it might just be my lack of understanding of what’s happening, and not an actual issue. If this doesn’t fit here, let me know and I’ll open an issue there instead.)

I wrote some code (to solve a Project Euler problem) that seemed fairly straightforward and without any obvious big memory allocations:

function sum_digit_factorial_nums()
    result = 0
    for n in 10:9999999
        if digit_factorial_sum(n) == n
            result += n
        end
    end
    result
end

function digit_factorial_sum(n::Integer)
    nd = digits(n)
    return sum((factorial(d) for d in nd))
end

It runs reasonably fast (3-4 s), but the memory estimate by @benchmark seems way off:

BenchmarkTools.Trial:
  memory estimate:  1.68 GiB
  allocs estimate:  33721367
  --------------
  minimum time:     4.421 s (2.97% GC)
  median time:      4.427 s (2.91% GC)
  mean time:        4.427 s (2.91% GC)
  maximum time:     4.434 s (2.86% GC)
  --------------
  samples:          2
  evals/sample:     1

(This is from Julia v0.6.2; the output is similar on v0.7, except for slightly lower times and 1.48 GiB allocated instead of 1.68 GiB. Inlining the code of digit_factorial_sum made no real difference.)

The Julia process’s memory consumption did not rise anywhere near those levels during execution, and there was no other indication of high memory usage (and believe me, I’d have felt something on my Jurassic-era laptop!).

How should I interpret the memory estimate output here, and is this business as usual or is it a big enough discrepancy in the estimate to need an issue filed?

This is the total amount allocated, not the peak memory usage.

Thanks. That was one explanation that came to mind, but I thought that information was (kinda) given by the allocs estimate, so this must be a different piece of information.

So if A and B are allocated and then go out of scope (and are ready to be gc’d), and then C and D are allocated, the memory estimate would be the sum of the memory usage of all four: A, B, C, and D? Is it fair to say the memory estimate shown is basically the sum of all mallocs done during the execution of the code?

You can look at the source: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/src/execution.jl#L317-L336

Before execution

__gc_start = Base.gc_num()

then after execution

__gcdiff = Base.GC_Diff(Base.gc_num(), __gc_start)

and the memory estimate is computed as

__memory = Int(fld(__gcdiff.allocd, __evals))

So it is the sum of all allocations made during the execution of the code, divided by the number of evaluations.
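To see the cumulative behaviour on a small scale, here is a minimal sketch using the built-in @allocated macro instead of BenchmarkTools. The helper name count_digits_upto is made up for illustration; it calls digits in a loop, much like the code in the question, and every vector it creates is garbage by the next iteration, so peak memory stays small while the reported total keeps growing.

```julia
# Hypothetical helper: allocates a fresh digits vector on every iteration.
# Each vector is unreachable as soon as its iteration ends.
function count_digits_upto(limit::Integer)
    total = 0
    for n in 1:limit
        total += length(digits(n))
    end
    return total
end

count_digits_upto(10)                          # warm-up call so compilation is not measured
bytes = @allocated count_digits_upto(100_000)  # total bytes allocated, not peak usage
println("bytes allocated in total: ", bytes)
```

The exact number depends on the Julia version, but it should grow roughly linearly with the loop bound even though only one small vector is live at any time.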

I believe that digits allocates: each call returns a new array of digits, and it is called once per loop iteration.
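If the goal is to avoid those allocations, one option is to peel the digits off with divrem instead of building an array. This is only a sketch, assuming n is non-negative; the names DIGIT_FACTORIALS and digit_factorial_sum_noalloc are invented here and are not from the original code.

```julia
# Factorials of the digits 0 through 9, computed once and stored in a tuple.
const DIGIT_FACTORIALS = ntuple(d -> factorial(d - 1), 10)

# Hypothetical replacement for digit_factorial_sum that never builds a digits array.
# Assumes n >= 0; returns 0 for n == 0 because the loop body never runs.
function digit_factorial_sum_noalloc(n::Integer)
    total = 0
    while n > 0
        n, d = divrem(n, 10)              # d is the last decimal digit of n
        total += DIGIT_FACTORIALS[d + 1]  # tuple indices start at 1, so digit 0 maps to slot 1
    end
    return total
end
```

For example, digit_factorial_sum_noalloc(145) gives 1! + 4! + 5! = 145. Swapping this into sum_digit_factorial_nums should bring the memory estimate down considerably, though I have not benchmarked it.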