I thought I’d share something that I’ve run into before, and that I’ve seen a couple of times now on this forum (most recently in this topic).
If you want to shift the integer
n to the left or right by
k bits, the natural way to do it is simply
n << k or
n >> k. However, this is not the most efficient way, since Julia’s shift operator differs from the native shift operator. On 64-bit CPUs, the shift count is masked with 63, so trying to shift to the left with 65 bits results in shifting to the left with 1 bit. In Julia however, the actual count is used, so shifting to the left with 65 bits will zero the result. This means that Julia’s shift operator doesn’t translate very well to native code:
julia> code_native((n,k) -> n << k, (Int,Int); syntax=:intel) xor eax, eax cmp rsi, 63 shlx rcx, rdi, rsi cmovbe rax, rcx mov rcx, rsi neg rcx cmp rcx, 63 jb L29 mov cl, 63 L29: test rsi, rsi sarx rcx, rdi, rcx cmovs rax, rcx ret
Luckily, there’s an easy way to improve this; masking the count to 6 bits (
n << (k&63)) generates efficient native code:
julia> code_native((n,k) -> n << (k&63), (Int,Int); syntax=:intel) shlx rax, rdi, rsi ret
A (very artificial) benchmark:
julia> A = rand(0:63, 100_000); julia> @btime (s = 0; @inbounds @simd for k = 1:length($A); s += k << $A[k]; end; s) 42.788 μs (0 allocations: 0 bytes) -2271119849451809947 julia> @btime (s = 0; @inbounds @simd for k = 1:length($A); s += k << ($A[k] & 63); end; s) 15.195 μs (0 allocations: 0 bytes) -2271119849451809947
This is unlikely to lead to any major performance improvements in your code (probably not even noticeable), but for anyone micro-optimizing and studying native code, this might be of interest.