I thought I’d share something that I’ve run into before, and that I’ve seen a couple of times now on this forum (most recently in this topic).
If you want to shift the integer n
to the left or right by k
bits, the natural way to do it is simply n << k
or n >> k
. However, this is not the most efficient way, since Julia’s shift operator differs from the native shift operator. On 64-bit CPUs, the shift count is masked with 63, so trying to shift to the left with 65 bits results in shifting to the left with 1 bit. In Julia however, the actual count is used, so shifting to the left with 65 bits will zero the result. This means that Julia’s shift operator doesn’t translate very well to native code:
julia> code_native((n,k) -> n << k, (Int,Int); syntax=:intel)
xor eax, eax
cmp rsi, 63
shlx rcx, rdi, rsi
cmovbe rax, rcx
mov rcx, rsi
neg rcx
cmp rcx, 63
jb L29
mov cl, 63
L29:
test rsi, rsi
sarx rcx, rdi, rcx
cmovs rax, rcx
ret
Luckily, there’s an easy way to improve this; masking the count to 6 bits (n << (k&63)
) generates efficient native code:
julia> code_native((n,k) -> n << (k&63), (Int,Int); syntax=:intel)
shlx rax, rdi, rsi
ret
A (very artificial) benchmark:
julia> A = rand(0:63, 100_000);
julia> @btime (s = 0; @inbounds @simd for k = 1:length($A); s += k << $A[k]; end; s)
42.788 μs (0 allocations: 0 bytes)
-2271119849451809947
julia> @btime (s = 0; @inbounds @simd for k = 1:length($A); s += k << ($A[k] & 63); end; s)
15.195 μs (0 allocations: 0 bytes)
-2271119849451809947
This is unlikely to lead to any major performance improvements in your code (probably not even noticeable), but for anyone micro-optimizing and studying native code, this might be of interest.