What's the most performant way to return some bit of an Int in Bool?

Shifting left and right takes two operations and the result needs to be converted. Is there a method with one operation or that can skip the conversion?

1 Like

If the bit position is known beforehand:

 f(x) = isodd(x >>> 61)

, which compiles to a shr and a and instruction.

If the bit position is not know, but it’s known that it’s in 0:N-1, where N is the number of bits in the integer, then

f(x, y) = isodd(x >>> (y & 63))

(where 63 is N-1), which compiles to a bt and a setb instruction

If you don’t know the bit position, and it might be lower than 0 or higher than N-1:

f(x, y) = isodd(x >>> y)

, which compiles to significantly more complicated assembly.

1 Like

Though, whether that ends up as a conversion in the generated assembly depends on a lot of factors. It might just end up as bitmask and a nonzero check. Do you have a particular example you’re interested in?

For example, say I have a vector v of length 64. Then I have an Int n which works like a filter: the entries of v map to the bits of n and the entries that map to the zero bits are deleted or ignored. Then a reduce operation is applied to what’s left.

The two implementations I would consider would be things like

checkbit_1(x::Base.BitInteger, b::Integer) = x & (one(x) << b) != zero(x)
checkbit_2(x::Base.BitInteger, b::Integer) = (x >>> b) & one(x) != zero(x)

(you might want to use (b-1) in place of b, depending on how you want to index bits). An inspection of the standalone @code_native suggests checkbit_2 (which should be basically identical to the suggested isodd(x >>> y) above) may be slightly better.

A primitive function like this will vary considerably in performance based on its context, so a straight answer as to “what is best?” is difficult. For example, if b is compile-time known then this compiles down to 2 or 3 instructions while if variable it’s more like 12-15. If it’s unknown but has a known range then it might land anywhere between. What you do with the Bool afterward can also have a big impact. For example, whether you assign the value to a memory location, add it to an accumulator, or use it as a condition to an if statement can dramatically change the compiled code.

All that said, it’s difficult to imagine that any of the options I gave, or anything anyone else suggested, actually give a significant performance difference in a real use case. Compilers are quite good at the basic bit operations involved here and it will probably get pretty close to optimal from most sane starting points. Further, whatever actual work you do based on this outcome could easily dwarf any possible performance difference.

You’ll likely need to assemble a few candidates and then benchmark in your actual (or a truly representative) context to find what is best. Even then, the differences may be negligible or within the noise.

Ok, but that’s still not enough to deduce what ultimately happens once it’s compiled - for one, you didn’t specify the element type of the vector, which is crucial. If it’s a Vector{UInt8}, it’s possible a SIMD reduction happens (including preamble & postamble). Or just a regular loop in the assembly, if the reduction isn’t SIMDable or the element type isn’t SIMDable.

There’s no blanket statement that covers all possibilities without a concrete example. What you are describing sounds a lot like you’re asking about SIMD intrinsics, in which case SIMD.jl will be of interest to you.

Thanks for that helpful information. SIMD seems to be quite useful for my project. However, in terms of the original question, @jakobnissen 's answer is sufficient so I’m giving that the credit. @mikmoore 's answer is also helpful. I should have phrased the question better.