I have always been fascinated by the effect of branch prediction demonstrated in this StackOverflow question, and I wanted to include this example in my course to illustrate how hard it is to predict the performance of modern hardware.
    using BenchmarkTools  # provides @btime

    "Sum elements if ≥ 128."
    function sumabove_if(x)
        s = zero(eltype(x))
        for elt in x
            if elt ≥ 128
                s += elt
            end
        end
        s
    end

    x_rand = rand(1:256, 32768)  # original example on stackoverflow, except using Int
    x_sorted = sort(x_rand)

    @btime sumabove_if($x_rand)    # 4.139 μs (0 allocations: 0 bytes)
    @btime sumabove_if($x_sorted)  # 3.857 μs (0 allocations: 0 bytes)
As you can see, sorting the array improves performance only slightly (roughly 7% here), nowhere near the factor of 5x reported by @Tamas_Papp.
I assume what is happening here is that Julia / LLVM has become clever enough to recognise that the branch is unnecessary and eliminates it for me. So my questions are:
- Is my assumption correct?
- Can I work around the smartness of Julia / LLVM somehow?
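One way to probe the first question is to look at the optimized LLVM IR that Julia generates for the loop. The snippet below is only a sketch of such a check: it uses `code_llvm` from `InteractiveUtils` to capture the IR as a string and then searches it for two hints, vector types such as `<4 x i64>` and a `select` instruction. The names `ir`, `has_vector_types` and `has_select` are introduced here purely for illustration, and the string matching is a rough heuristic. Whether these patterns show up may depend on the Julia version, the LLVM version and the CPU.

```julia
using InteractiveUtils  # provides code_llvm (loaded automatically in the REPL)

"Sum elements if ≥ 128."
function sumabove_if(x)
    s = zero(eltype(x))
    for elt in x
        if elt ≥ 128
            s += elt
        end
    end
    s
end

# Capture the optimized LLVM IR for a Vector{Int} argument as a string.
# debuginfo=:none drops the source-location comments to reduce noise.
ir = sprint(io -> code_llvm(io, sumabove_if, Tuple{Vector{Int}}; debuginfo=:none))

# Heuristic checks (assumptions, not a definitive test):
# vector types like `<4 x i64>` hint that the loop was vectorized,
# and `select` hints that the conditional was turned into a masked operation.
has_vector_types = occursin(r"<\d+ x i64>", ir)
has_select = occursin("select", ir)

println("vector types in IR: ", has_vector_types)
println("select in IR:       ", has_select)
```

If both checks print `true`, that would be consistent with the branch having been replaced by a vectorized, branch-free loop, which could explain why sorting makes so little difference. Reading the full output of `@code_llvm sumabove_if(x_rand)` by eye is the more reliable way to confirm it.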
    julia> versioninfo()
    Julia Version 1.4.2
    Commit 44fa15b150* (2020-05-23 18:35 UTC)
    Platform Info:
      OS: Linux (x86_64-pc-linux-gnu)
      CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
      WORD_SIZE: 64
      LIBM: libopenlibm
      LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
    Environment:
      JULIA_EDITOR = "/usr/share/code/code"
      JULIA_NUM_THREADS =