Once again, awesome input. I have to re-study the BenchmarkTools docs.
You are so right. The ranking of the fastest variants changes a bit. None of them allocates at all, except the original sum of the element-wise multiplication, and execution times are less than half of those from the erroneously constructed benchmarks.
The comprehension with zip wins (3.6 ns median), with mapreduce right behind it (3.8 ns); dot comes in at 5.7 ns. The absolute differences are so small that, among the fastest variants, the choice might come down to style even in a hot loop. But the comprehension is brief and clear.
Indexing the arrays inside the comprehension is more costly than zip (5.2 ns median), and the sum of the element-wise multiplication looks even worse than before (16.5 ns median, with 2 allocations).
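As a quick sanity check alongside the timings, here is a minimal sketch (not part of the benchmarks themselves) that wraps the five variants in functions and confirms they compute the same value. The function names and the `allocs` helper are made up for illustration; `splat` is assumed to be available as an exported name, which is the case from Julia 1.9 on.

```julia
using LinearAlgebra  # standard library; provides dot

# Same inputs as in the setup clauses of the benchmarks
local_patch = fill(0.5, 3, 3)
err = fill(0.4, 3, 3)

# The five variants, wrapped in (hypothetically named) functions
zip_gen(a, b)       = sum(l * e for (l, e) in zip(a, b))
zip_mapreduce(a, b) = mapreduce(splat(*), +, zip(a, b))
dot_prod(a, b)      = dot(a, b)
index_gen(a, b)     = sum(a[i] * b[i] for i in eachindex(a, b))
broadcast_sum(a, b) = sum(a .* b)  # materializes a temporary 3x3 array first

variants = (zip_gen, zip_mapreduce, dot_prod, index_gen, broadcast_sum)
results = [f(local_patch, err) for f in variants]

# All variants should agree: 9 elements * (0.5 * 0.4) ≈ 1.8
all_agree = all(r -> isapprox(r, 1.8), results)

# Hypothetical helper: bytes allocated by one call, measured after a warm-up
# call so that compilation is not counted
function allocs(f, a, b)
    f(a, b)
    return @allocated f(a, b)
end

broadcast_bytes = allocs(broadcast_sum, local_patch, err)
```

Here `broadcast_bytes` should come out greater than zero because `a .* b` builds the product array before `sum` runs, which is consistent with the allocations reported for that variant.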
OK. This exercise was very instructive and we can call it done.
@benchmark sum(l * e for (l, e) in zip(local_patch, err)) (setup = (local_patch=fill(0.5,3,3); err=(fill(0.4,3,3))))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 3.541 ns … 13.708 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.625 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.740 ns ± 0.261 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █ █ ▅ ▂ ▃ ▇ ▅ ▂ ▂
█▁█▁▁█▁█▁▁█▁▇▁▁▆▁▆▁▁▆▁█▁▁█▁█▁▁█▁█▁▁▆▁▅▁▁▃▁▁▁▁▄▁▁▁▁▃▁▄▁▁▅▁▃ █
3.54 ns Histogram: log(frequency) by time 4.5 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark mapreduce(splat(*), +, zip(err, local_patch)) (setup = (local_patch=fill(0.5,3,3); err=(fill(0.4,3,3))))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 3.500 ns … 18.583 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.792 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.749 ns ± 0.268 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃ █ ▇ ▃ ▁ █ █ ▆ ▁▂ ▂
▄▁█▁▁█▁▁█▁▁█▁▁▇▁▁█▁▁█▁▁█▁▁█▁██▁▇▁▁▆▁▁▅▁▁▅▁▁▃▁▁▃▁▁▄▁▁▄▁▁▄▁▃ █
3.5 ns Histogram: log(frequency) by time 4.33 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
using LinearAlgebra
@benchmark dot(local_patch, err) (setup = (local_patch=fill(0.5,3,3); err=(fill(0.4,3,3))))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 5.541 ns … 20.833 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.666 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.838 ns ± 0.418 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃██ █▄ ▁ ▂ ▇▇ ▃▂ ▂
███▁██▁██▁█▇▁▇▆▁▇▇▁▇█▁██▁██▁▇▇▁▅▅▁▄▅▁▃▁▁▅▅▁▁▁▁▅▄▁▇▅▁▆▆▁▅▃▄ █
5.54 ns Histogram: log(frequency) by time 7.17 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark sum(local_patch[i] * err[i] for i in eachindex(local_patch, err)) (setup = (local_patch=fill(0.5,3,3); err=(fill(0.4,3,3))))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.625 ns … 21.292 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.166 ns ┊ GC (median): 0.00%
Time (mean ± σ): 4.988 ns ± 0.362 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄ █ ▇ ▅ ▂ ▁ ▂ ▇ █ ▄ ▂ ▂
▆▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁█▁▁▇▁▁▇▁▁▇▁▁▁▇▁▁▇▁▁█▁▁█▁▁▁█▁▁█▁▁█▁▁▇▁▁▅ █
4.62 ns Histogram: log(frequency) by time 5.38 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark sum(local_patch .* err) (setup = (local_patch=fill(0.5,3,3); err=(fill(0.4,3,3))))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
Range (min … max): 14.988 ns … 719.857 ns ┊ GC (min … max): 0.00% … 95.53%
Time (median): 16.492 ns ┊ GC (median): 0.00%
Time (mean ± σ): 19.495 ns ± 32.341 ns ┊ GC (mean ± σ): 13.73% ± 8.06%
▆▅▂ █▁
▂▂▃███▇▅▄▄▃▄███▆▃▃▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂ ▃
15 ns Histogram: frequency by time 25.2 ns <
Memory estimate: 144 bytes, allocs estimate: 2.