The problem might be that the computer (where exactly, I don’t know) is caching values (the result, intermediate values, I don’t know), in which case you are not really measuring the time the function actually takes to run. That is why restarting the function at every sample with a new set of input values is safer.
I am not sure exactly when this kind of problem arises in benchmarking, but it can occur even if you benchmark independent runs of compiled binaries on a machine, one after the other: a lot of the work of loading things into memory, etc., may be saved by the OS.
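For example (just a sketch, with ./mybinary standing in for whatever compiled program you are testing; the numbers depend entirely on the machine and OS), timing two back-to-back runs from Julia often shows the second one faster simply because the OS has already cached the executable and the files it reads:

# Hypothetical binary: the second run often appears faster because the OS
# page cache already holds the executable and its data.
@time run(`./mybinary`)
@time run(`./mybinary`)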
This is very common, actually:
julia> @btime sin(5.0) # this is wrong
1.537 ns (0 allocations: 0 bytes)
-0.9589242746631385
julia> x = 5.0
julia> @btime sin($x) # this is correct, I think
7.776 ns (0 allocations: 0 bytes)
-0.9589242746631385
julia> @btime sin(x) setup=(x=rand()) evals=1 # this will vary the value of the input
31.000 ns (0 allocations: 0 bytes)
0.6563352918810222
Also, there is the fact that computing the sin of one number has a different cost than computing the sin of another number, so one needs to know exactly what one wants to benchmark, considering the inputs the function will actually receive.
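For instance (a rough sketch; the exact timings are machine-dependent), you can benchmark sin with inputs drawn from different ranges, since very large arguments typically need a more expensive argument reduction:

using BenchmarkTools

# Small arguments: cheap argument reduction.
@btime sin(x) setup=(x = rand()) evals=1

# Very large arguments: typically slower, since reducing the argument
# modulo 2π accurately takes extra work.
@btime sin(x) setup=(x = 1e10 * rand()) evals=1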
edit:
Actually:
julia> x = 5.0
5.0
julia> @btime sin($x)
7.777 ns (0 allocations: 0 bytes)
-0.9589242746631385
julia> @btime sin($x) evals=1
38.000 ns (0 allocations: 0 bytes)
-0.9589242746631385
I don’t know. Maybe we should just file an issue. Probably evals=1
should be the default.
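For what it’s worth, one way to dig into the discrepancy (a sketch, assuming BenchmarkTools is loaded) is to look at the full @benchmark output, which also reports how many evaluations per sample were used:

using BenchmarkTools

x = 5.0

# Shows the number of samples, the evaluations per sample, and the whole
# timing distribution, not just the minimum that @btime prints.
@benchmark sin($x)           # evals chosen automatically by tuning
@benchmark sin($x) evals=1   # exactly one evaluation per sample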
I share the concerns… I started a new thread here: How to benchmark properly? Should defaults change?.