Hi,
following the Parallel Computing course on JuliaAcademy, I wanted to try out the hardware effects of branch prediction (Serial Performance | JuliaAcademy, around 33:00).
The idea is that in the second case the data is sorted, so the hardware should be able to easily predict which way the branch instruction will go (true for the first half of the array, false for the second half), and therefore the second case should show better performance.
In the JuliaAcademy recorded lesson the performance is basically the same in both versions, since he is running on a cloud service where this sort of prediction is disabled; but on my computer the second case is actually worse (as you can see below). Any idea why this could be? (I'm running this on a 6-core Intel W3690 chip.)
Thanks
```julia
using BenchmarkTools

data = rand(5000)

function findclosest2(data, point)
    bestval = first(data)
    bestdist = abs(bestval - point)
    for elt in data
        dist = abs(elt - point)
        if dist < bestdist
            bestval = elt
            bestdist = dist
        end
    end
    return bestval
end
```
```julia
@benchmark findclosest2($data, $0.5)

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.812 μs (0.00% GC)
  median time:      6.814 μs (0.00% GC)
  mean time:        6.875 μs (0.00% GC)
  maximum time:     33.603 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5
```
```julia
sorteddata = sort(data)

@benchmark findclosest2($sorteddata, $0.5)

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.060 μs (0.00% GC)
  median time:      8.062 μs (0.00% GC)
  mean time:        8.154 μs (0.00% GC)
  maximum time:     44.750 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3
```
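In case it helps with the investigation: one way to check whether the branch inside the loop is even responsible for the difference is to compare against a branchless variant. This is just a sketch of that idea (the function name `findclosest_branchless` is mine, not from the course): `ifelse` evaluates both of its value arguments and selects one, so the compiler can emit a conditional move instead of a branch, which makes the timing insensitive to whether the data is sorted.

```julia
# Branchless variant of findclosest2: same result, but no `if` inside
# the loop, so branch prediction should not matter for its performance.
function findclosest_branchless(data, point)
    bestval = first(data)
    bestdist = abs(bestval - point)
    for elt in data
        dist = abs(elt - point)
        # ifelse evaluates both alternatives and picks one; no branch needed
        bestval = ifelse(dist < bestdist, elt, bestval)
        # min does the same update as the original `bestdist = dist` branch
        bestdist = min(dist, bestdist)
    end
    return bestval
end
```

If the sorted and unsorted inputs time the same with this version, the gap you are seeing in `findclosest2` is indeed coming from the branch and not, say, from memory effects.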