Bit Manipulation Instruction Set

question

#1

I saw on Stack Overflow that there are “Parallel bit deposit and extract” instructions, which are part of BMI2 (Bit Manipulation Instruction Set 2).

Are these instructions (PDEP and PEXT) available to Julia / LLVM?

https://stackoverflow.com/questions/7669057/find-nth-set-bit-in-an-int

https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#Parallel_bit_deposit_and_extract


#2

There is an old issue, “Support AVX2 and BMI2 instructions”, which is now closed. Does that mean that BMI2 instructions are now supported?

@StefanKarpinski, I saw this old Google Groups post, “LLVM: trying to use x86 pext intrinsic”.
Did you manage to get it working?


#3

LLVM has supported these instructions for a long time, and now that https://github.com/JuliaLang/julia/pull/21849 is merged, nightly builds that include it should be able to use them in JIT-compiled code. After that, it’s a matter of which code patterns LLVM recognizes.
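One way to check what LLVM actually recognized is to look at the native code generated for a method. The following is only a sketch: it relies on `code_native` from the standard `InteractiveUtils` module, while the helper name `emits_instruction` and the sample function `clear_lowest_bit` are made up for illustration. The result depends on the CPU and on the Julia/LLVM version, so no particular output is implied.

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Hypothetical helper (not part of Julia): true if the native code generated
# for `f` with argument types `types` contains an instruction whose mnemonic
# starts with `mnemonic`.
function emits_instruction(f, types, mnemonic::AbstractString)
    asm = sprint(io -> code_native(io, f, types; debuginfo=:none))
    # Match the mnemonic at the start of an instruction line, so that symbol
    # names or comments containing the same text are not counted.
    return occursin(Regex("^\\s*" * mnemonic * "\\w*\\s", "m"), asm)
end

# Illustrative function only; any method can be inspected the same way.
clear_lowest_bit(x::UInt64) = x & (x - one(x))

println(emits_instruction(clear_lowest_bit, (UInt64,), "blsr"))
println(emits_instruction(clear_lowest_bit, (UInt64,), "pdep"))
```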


#4

OK thanks for that.

I think I found how to call PDEP and PEXT explicitly:

pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)

pext(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pext.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pext(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pext.64", llvmcall, UInt64, (UInt64, UInt64), x, y)

Firstly, is this the right way?

I modelled this on pow_fast found in fastmath.jl:

pow_fast(x::Float32, y::Integer) = ccall("llvm.powi.f32", llvmcall, Float32, (Float32, Int32), x, y)
pow_fast(x::Float64, y::Integer) = ccall("llvm.powi.f64", llvmcall, Float64, (Float64, Int32), x, y)

Secondly, how can I determine whether a Julia build / hardware supports these instructions?
Presumably attempting to execute these instructions on a non-supporting system will crash.
So I’d like to define some functions using these instructions only if supported, and use an alternative implementation otherwise.

if `PDEP` instruction supported
   pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
   pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
   f(x,y) = pdep(x,y) ...
else
   f(x,y) = # some other implementation
end
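For the "some other implementation" branch, a portable bit-by-bit version could look like the sketch below. The names `pdep_soft` and `pext_soft` are hypothetical, and the argument order (source first, mask second) is assumed to match the intrinsics above. This is a plain loop over the set bits of the mask, so it is expected to be much slower than the hardware instruction.

```julia
# Software PDEP: scatter the low-order bits of `x` into the positions of the
# set bits of `mask`, from least to most significant.
function pdep_soft(x::T, mask::T) where {T<:Unsigned}
    result = zero(T)
    bit = one(T)                 # selects the next low-order bit of x
    while mask != 0
        lowest = mask & -mask    # isolate the lowest set bit of the mask
        if (x & bit) != 0
            result |= lowest
        end
        mask &= mask - one(T)    # clear the lowest set bit of the mask
        bit <<= 1
    end
    return result
end

# Software PEXT: gather the bits of `x` at the positions of the set bits of
# `mask` and pack them into the low-order bits of the result.
function pext_soft(x::T, mask::T) where {T<:Unsigned}
    result = zero(T)
    bit = one(T)                 # next low-order bit of the result to fill
    while mask != 0
        lowest = mask & -mask
        if (x & lowest) != 0
            result |= bit
        end
        mask &= mask - one(T)
        bit <<= 1
    end
    return result
end

pdep_soft(0x00000005, 0x0000001a)  # 0x00000012
pext_soft(0x00000012, 0x0000001a)  # 0x00000005
```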

#5

A rough search through the LLVM code (for pdep or PDEP) suggests that LLVM doesn’t have any code that generates these instructions automatically, so this is likely the right/only way right now.

That’ll be the next step for my PR.


#6

I did, but only on a machine that supported that instruction set, of course.


#7

Thanks Yichao.
In the meantime I can use an environment variable (say "JULIA_USEBMI2") to opt-in to using these instructions.
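A rough sketch of that opt-in, assuming the flag is read once when the code is loaded. Only the variable name "JULIA_USEBMI2" comes from this thread; the accepted values ("1", "true", "yes") and the fallback loop are assumptions made for illustration. Nothing here checks the hardware, so setting the variable on a CPU without BMI2 would still fail at run time.

```julia
# Opt-in flag, read once at load time. Accepting "1"/"true"/"yes" is an
# assumption of this sketch.
const USE_BMI2 = lowercase(get(ENV, "JULIA_USEBMI2", "0")) in ("1", "true", "yes")

if USE_BMI2
    # Hardware path: only safe on CPUs that implement BMI2.
    pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
    pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
else
    # Portable path: deposit the low bits of x into the set bits of the mask.
    function pdep(x::T, mask::T) where {T<:Union{UInt32,UInt64}}
        result = zero(T)
        bit = one(T)
        while mask != 0
            if (x & bit) != 0
                result |= mask & -mask   # lowest set bit of the mask
            end
            mask &= mask - one(T)        # clear that bit
            bit <<= 1
        end
        return result
    end
end

# Callers use the same function either way.
f(x, y) = pdep(x, y)
```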