I agree that GC cannot be disabled, since in particular it must sometimes be triggered; otherwise you would run out of memory.
I suspect (I will do more tests tomorrow) that, as @jeff.bezanson commented in "When does the garbage collector runs?", there is for some reason no 100% sure way to force GC at a given point inside a function, so it gets triggered later, when Julia runs out of RAM inside @btime.
If that is the case, it would mean that benchmarking should only be performed at top-level scope. Is this a correct conclusion?
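To make the question concrete, here is a minimal sketch of what I mean by forcing GC at a given point inside a function. It uses only Base (`GC.gc()` and `@elapsed`); the names `work` and `time_with_gc` are made up for illustration. `GC.gc()` only requests a collection before each sample, and a collection can still start inside the timed call if that call allocates enough, so this is a test to try rather than a guaranteed fix.

```julia
# Hypothetical workload that allocates a fresh array on every call.
work(n) = sum(rand(n))

# Time `f()` `samples` times, requesting a full collection before each sample
# so that garbage left over from earlier samples is less likely to be
# collected inside the timed region.
function time_with_gc(f; samples = 5)
    times = Float64[]
    for _ in 1:samples
        GC.gc()                      # request a full collection now
        push!(times, @elapsed f())   # seconds taken by one call
    end
    # The first sample includes compilation; taking the minimum filters it out.
    return minimum(times)
end

t = time_with_gc(() -> work(10^6))
println("fastest sample: ", t, " s")
```

Comparing the result when `time_with_gc` is called at top level and when it is called from inside another function should show whether the scope makes a difference.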