Sum of doubles

I guess that the time of a linear sum over an array's elements (maybe not Julia's Base.sum, which is based on reduce) is mostly dominated by read bandwidth for arrays larger than the L3 cache. This should give you an upper bound on your algorithm's performance.
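To put a rough number on that bound, here is a minimal sketch that times a plain loop and converts it to an effective read bandwidth. The names `linear_sum` and `read_bandwidth_gbs` are made up for illustration, and the array size is only assumed to be larger than your L3 cache, so adjust it for your machine.

```julia
# Plain linear sum: one pass over the array, one read per element.
function linear_sum(x::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Effective read bandwidth in GB/s for one pass over `x` (hypothetical helper).
# Keeps the best of `reps` runs; the first run also absorbs compilation.
# The sums are accumulated and returned so the work cannot be optimized away.
function read_bandwidth_gbs(x::Vector{Float64}; reps::Int = 5)
    best = Inf
    checksum = 0.0
    for _ in 1:reps
        t0 = time_ns()
        checksum += linear_sum(x)
        best = min(best, (time_ns() - t0) / 1e9)
    end
    return (gbs = sizeof(x) / best / 1e9, checksum = checksum)
end

n = 2^24                      # 16 Mi doubles = 128 MiB, assumed larger than L3
x = rand(n)
r = read_bandwidth_gbs(x)
println("effective read bandwidth ≈ ", round(r.gbs, digits = 1), " GB/s")
```

If the number you get is close to what your memory can deliver, the sum is bandwidth bound and a smarter summation loop will not go much faster on arrays of this size.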

Another way to see it is to use a smaller array (< 1e6 elements): you should see a large per-element performance bump, because BenchmarkTools runs the benchmark repeatedly and the data stays hot in cache.
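A quick way to check this, sketched with Base timing only so it runs without extra packages (BenchmarkTools would do the repeated runs for you). The helper `ns_per_element` and the list of sizes are my own choices, and I use the built-in `sum` here for brevity; where the jump appears depends on your cache sizes.

```julia
# Nanoseconds per element for summing a Vector{Float64} of length `n`
# (hypothetical helper). Back-to-back runs keep a small array in cache;
# the best run is reported and the sums are returned so they stay live.
function ns_per_element(n::Int; reps::Int = 20)
    x = rand(n)
    best = typemax(UInt64)
    checksum = 0.0
    for _ in 1:reps
        t0 = time_ns()
        checksum += sum(x)
        best = min(best, time_ns() - t0)
    end
    return (ns = best / n, checksum = checksum)
end

for n in (10^4, 10^5, 10^6, 10^7)
    r = ns_per_element(n)
    println(rpad("n = $n", 16), round(r.ns, digits = 3), " ns/element")
end
```

I would expect the per-element time to go up once the array no longer fits in cache, but I have not listed numbers because they are machine dependent.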
