This post is an offspring of the following discussion, where I reported inconsistencies in the benchmarked performance of `Base.sum` between machines.
To summarize the issue (which is illustrated a bit more clearly below): while benchmarking summation algorithms against `Base.sum`, I collected results from various colleagues' machines and noticed large variations in the benchmarked performance of `Base.sum` across machines.
I suggested that this might have to do with vectorization, and @mbauman provided ways to check whether the SSE/AVX/AVX2/AVX512 capabilities of the CPU explained these differences.
It turns out that instead, these variations had to do with how these benchmarks were run: from a standalone call to the `julia` compiler, or via `Pkg.test()`.
Here is a very simple example of a mostly empty package, in which the
test/runtests.jl file has the following contents:
```julia
using BenchmarkTools
@btime sum($(rand(1_000)))
```
Then, when running the test file from the command line, we get:

```
> julia --project -O3 test/runtests.jl
  84.877 ns (0 allocations: 0 bytes)
```
but when running from the REPL via `Pkg.test()`:

```
shell> julia --project --quiet -O3

julia> using Pkg; Pkg.test()
   Testing BenchSum
 Resolving package versions...
  795.624 ns (0 allocations: 0 bytes)
   Testing BenchSum tests passed
```
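One way to investigate such a discrepancy is to check whether the two invocations actually hand the same compiler options to the test process. The sketch below is only a diagnostic I would try, not something from the original discussion; it relies on `Base.JLOptions()`, which exposes the options the current `julia` process was started with:

```julia
# Temporarily add this at the top of test/runtests.jl to see which
# compiler options the test process actually received in each setup.
opts = Base.JLOptions()
println("optimization level: ", opts.opt_level)     # expect 3 when started with -O3
println("bounds checking:    ", opts.check_bounds)  # 0 = library default, 1 = forced on
```

If the two numbers differ between the command-line run and the `Pkg.test()` run, the benchmark is comparing differently-compiled code rather than different machines.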
In case anyone is wondering, the situation is almost exactly the same without the `-O3` flag.
That’s nearly a 10x slow-down! Is this expected? Does anyone have an idea why it happens?