following the Parallel Computing course in JuliaAcademy I wanted to try the hardware effects of branch prediction (https://juliaacademy.com/courses/parallel-computing/lectures/10036462, time about 33:00).
The idea is that in the second case the data is sorted, and then the hardware should be able to easily predict which way the branch instruction will go (for the first half of the array being true and for the second half being false), thus being able to show a better performance in the second case.
In the JuliaAcademy recorded lesson, since he is using a cloud service where this sort of prediction is disabled, the performance is basically the same in both versions, but in my computer the second case is actually worse (as you can see below). Any idea why this could be? (I’m running this on a 6-core Intel W3690 chip).
using BenchmarkTools data = rand(5000) function findclosest2(data, point) bestval = first(data) bestdist = abs(bestval - point) for elt in data dist = abs(elt - point) if dist < bestdist bestval = elt bestdist = dist end end return bestval end @benchmark findclosest2($data, $0.5) BenchmarkTools.Trial: memory estimate: 0 bytes allocs estimate: 0 -------------- minimum time: 6.812 μs (0.00% GC) median time: 6.814 μs (0.00% GC) mean time: 6.875 μs (0.00% GC) maximum time: 33.603 μs (0.00% GC) -------------- samples: 10000 evals/sample: 5 sorteddata = sort(data) @benchmark findclosest2($sorteddata, $0.5) BenchmarkTools.Trial: memory estimate: 0 bytes allocs estimate: 0 -------------- minimum time: 8.060 μs (0.00% GC) median time: 8.062 μs (0.00% GC) mean time: 8.154 μs (0.00% GC) maximum time: 44.750 μs (0.00% GC) -------------- samples: 10000 evals/sample: 3