I need a fast way to repeatedly access bits in the representation of a floating point value in Julia
julia> rawbit(x::Float64, bit::I) where {I <: Integer} = reinterpret(UInt64, x) & (1 << (bit-1))
but the native code generated for this is quite lengthy. I toyed around with pre-creating the masks in an array but then it is hard to guarantee that the bit argument is between 1 and 64, and thus hard to optimize heavily by the compiler. Any ideas on how to make this very performant?
For context: This is for fast, dynamic, lookup-trees for vectors of Floats which are structured based on bit values (www.phtree.org)
The checks we do to make bit shift operations have predictable, well-defined behavior unfortunately cause quite a bit of unwanted code bloat here. This leads to a paradoxical situation where doing what seems to be more work at the high level ends up generating less machine code. Here’s what I came up with:
rawbit(x::Float64, bit::UInt64) = reinterpret(UInt64, x) & (1 << (63 & (bit-1)))
rawbit(x::Float64, bit::Integer) = rawbit(x, bit % UInt64)
julia> @code_native rawbit(1.5, 3)
vmovq %xmm0, %rcx
addl $63, %edi
movl $1, %eax
shlxq %rdi, %rax, %rax
andq %rcx, %rax
retq
I’ve omitted all the verbose comments that @code_native
emits these days. The key tricks here are:
-
bits % UInt64
: modding the bit
index into the UInt64
type which is strictly non-negative allows the code to skip checking for negative shift values.
-
63 & (bit-1)
: anding the bit-1
shift value with 63 guarantees that the bit shift value cannot be too large which allows the code to skip checking for and handling overly large shift values.
The flip side is that the function has the explicit behavior that it mods all integer values it’s given into a shift value between 0 and 63. As long as you’re ok with that, this definition will work.
3 Likes
Oh, also, note that if you have code that uses constant values for bit
a lot of that code bloat goes away. For my definition for example:
bit12(x::Float64) = rawbit(x, 12)
julia> @code_native bit12(1.5)
vmovq %xmm0, %rax
andl $2048, %eax ## imm = 0x800
retq
That’s the minimal code you can hope for since you have to move float values to integer registers before you can do integer operations on them.
2 Likes
with this macro I copied from here rewritten so it works in a loop/iteration 1:64
macro make_rawbit(idx)
fn = Symbol(string("rawbit",idx))
quote
function $(esc(fn))(x::Float64)
rawbit(x, $idx)
end
end
end
or just written 64 times
@make_rawbit(i) where i=1:64
rawbits = (rawbit1, rawbit2, .., rawbit64)
rawbits[kthbit](targetfloat64)
rawbits[kthbit](targetfloat64)
compiles as rawbitk(targetfloat64)
I don’t think this is any more efficient than calling rawbit(x, bit)
with a constant bit
value (at least not on master, and I suspect on older versions as well).
It is not faster than that. It is exactly as fast as that (the native code appears identical).
I mentioned the approach to allow someone to get the performance you isolated without needing to know which bit they may need during the time of problem solution.
a celebration of your code
2 Likes
Thanks to both of you, this is very helpful.
This seems a case where the switch statement might be useful: A case/switch statement for Julia · Issue #18285 · JuliaLang/julia · GitHub
but I can also work with the code you’ve given me to pre-calc which rawbit function to use at which level (in my tree structure) and thus get good performance anyway.
Oh, for others that might have similar situation I should point out that Stefan’s solution require 0.7, on 0.6 the generated code is not as optimal.