Fast way to access bits of Float64?


#1

I need a fast way to repeatedly access bits in the representation of a floating point value in Julia

julia> rawbit(x::Float64, bit::I) where {I <: Integer} = reinterpret(UInt64, x) & (1 << (bit-1))

but the native code generated for this is quite lengthy. I toyed around with pre-creating the masks in an array but then it is hard to guarantee that the bit argument is between 1 and 64, and thus hard to optimize heavily by the compiler. Any ideas on how to make this very performant?

For context: This is for fast, dynamic, lookup-trees for vectors of Floats which are structured based on bit values (www.phtree.org)


#2

The checks we do to make bit shift operations have predictable, well-defined behavior unfortunately cause quite a bit of unwanted code bloat here. This leads to a paradoxical situation where doing what seems to be more work at the high level ends up generating less machine code. Here’s what I came up with:

rawbit(x::Float64, bit::UInt64) = reinterpret(UInt64, x) & (1 << (63 & (bit-1)))
rawbit(x::Float64, bit::Integer) = rawbit(x, bit % UInt64)

julia> @code_native rawbit(1.5, 3)
	vmovq	%xmm0, %rcx
	addl	$63, %edi
	movl	$1, %eax
	shlxq	%rdi, %rax, %rax
	andq	%rcx, %rax
	retq

I’ve omitted all the verbose comments that @code_native emits these days. The key tricks here are:

  1. bits % UInt64: modding the bit index into the UInt64 type which is strictly non-negative allows the code to skip checking for negative shift values.

  2. 63 & (bit-1): anding the bit-1 shift value with 63 guarantees that the bit shift value cannot be too large which allows the code to skip checking for and handling overly large shift values.

The flip side is that the function has the explicit behavior that it mods all integer values it’s given into a shift value between 0 and 63. As long as you’re ok with that, this definition will work.


#3

Oh, also, note that if you have code that uses constant values for bit a lot of that code bloat goes away. For my definition for example:

bit12(x::Float64) = rawbit(x, 12)

julia> @code_native bit12(1.5)
	vmovq	%xmm0, %rax
	andl	$2048, %eax             ## imm = 0x800
	retq

That’s the minimal code you can hope for since you have to move float values to integer registers before you can do integer operations on them.


#4

with this macro I copied from here rewritten so it works in a loop/iteration 1:64

macro make_rawbit(idx)
    fn = Symbol(string("rawbit",idx))
    quote
        function $(esc(fn))(x::Float64)
            rawbit(x, $idx)
        end
    end
 end

or just written 64 times

@make_rawbit(i) where i=1:64
rawbits = (rawbit1, rawbit2, .., rawbit64)

rawbits[kthbit](targetfloat64)

rawbits[kthbit](targetfloat64) compiles as rawbitk(targetfloat64)


#5

I don’t think this is any more efficient than calling rawbit(x, bit) with a constant bit value (at least not on master, and I suspect on older versions as well).


#6

It is not faster than that. It is exactly as fast as that (the native code appears identical).
I mentioned the approach to allow someone to get the performance you isolated without needing to know which bit they may need during the time of problem solution.

:goldstar: a celebration of your code


#7

Thanks to both of you, this is very helpful.

This seems a case where the switch statement might be useful: https://github.com/JuliaLang/julia/issues/18285
but I can also work with the code you’ve given me to pre-calc which rawbit function to use at which level (in my tree structure) and thus get good performance anyway.


#8

Oh, for others that might have similar situation I should point out that Stefan’s solution require 0.7, on 0.6 the generated code is not as optimal.