Bit Manipulation Instruction Set

I saw on Stack Overflow that there are “Parallel bit deposit and extract” instructions, which are part of BMI2 (Bit Manipulation Instruction Set 2).

Are these instructions (PDEP and PEXT) available to Julia / LLVM?

algorithm - Find nth SET bit in an int - Stack Overflow

x86 Bit manipulation instruction set - Wikipedia


There is an old issue, “Support AVX2 and BMI2 instructions”, which is now closed. Does that mean that BMI2 instructions are now supported?

@StefanKarpinski, I saw this old post on Google Groups, “LLVM: trying to use x86 pext intrinsic”.
Did you manage to get it working?

LLVM has had support for it for a long time, and now that https://github.com/JuliaLang/julia/pull/21849 is merged, nightlies that include it should be able to use those instructions in JIT-compiled code. It’s then a matter of which code patterns LLVM recognizes.

OK thanks for that.

I think I found how to call PDEP and PEXT explicitly:

pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)

pext(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pext.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pext(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pext.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
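
For reference, here is a minimal sketch of what the two instructions compute, written in plain Julia so it runs without BMI2. The names `pdep_soft` and `pext_soft` are hypothetical, and the argument order (source first, mask second) is assumed to match the wrappers above. This is a bit-by-bit loop meant to show the semantics, not a tuned implementation.

```julia
# pdep: scatter the low-order bits of x into the positions of the set bits of mask.
function pdep_soft(x::T, mask::T) where {T<:Union{UInt32,UInt64}}
    result = zero(T)
    bit = one(T)                  # selects the next low-order bit of x
    while mask != 0
        lowest = mask & -mask     # isolate the lowest set bit of mask
        if x & bit != 0
            result |= lowest      # deposit the source bit at that position
        end
        mask &= mask - one(T)     # clear the mask bit just handled
        bit <<= 1
    end
    return result
end

# pext: gather the bits of x selected by mask into the low-order bits of the result.
function pext_soft(x::T, mask::T) where {T<:Union{UInt32,UInt64}}
    result = zero(T)
    bit = one(T)                  # next low-order position in the result
    while mask != 0
        lowest = mask & -mask     # isolate the lowest set bit of mask
        if x & lowest != 0
            result |= bit         # pack the selected bit into the result
        end
        mask &= mask - one(T)
        bit <<= 1
    end
    return result
end
```

With mask `0x000000f0`, depositing `0x00000005` places its bits 0 and 2 at positions 4 and 6, and extracting from that result recovers the original value.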

Firstly, is this the right way?

I modelled this on pow_fast found in fastmath.jl:

pow_fast(x::Float32, y::Integer) = ccall("llvm.powi.f32", llvmcall, Float32, (Float32, Int32), x, y)
pow_fast(x::Float64, y::Integer) = ccall("llvm.powi.f64", llvmcall, Float64, (Float64, Int32), x, y)

Secondly, how can I determine whether a Julia build / hardware supports this instruction?
Presumably attempting to execute these instructions on a non-supporting system will crash.
So I’d like to define some functions using these instructions only if supported, and use an alternative implementation otherwise.

if `PDEP` instruction supported
   pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
   pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
   f(x,y) = pdep(x,y) ...
else
   f(x,y) = # some other implementation
end

A rough search through the LLVM code (searching for pdep or PDEP) suggests that LLVM doesn’t have any code that generates these instructions automatically, so this is likely the right/only way right now.

That’ll be the next step for my PR.


I did, but only on a machine that supported that instruction set, of course.

Thanks Yichao.
In the meantime I can use an environment variable (say "JULIA_USEBMI2") to opt-in to using these instructions.
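
A minimal sketch of that opt-in, assuming the variable has to be set to "1" to enable the instructions. The helper name `bmi2_opted_in` and the constant `USE_BMI2` are hypothetical.

```julia
# Hypothetical helper: true only when the user has opted in explicitly.
# Taking the environment as an argument keeps the check easy to test.
bmi2_opted_in(env = ENV) = get(env, "JULIA_USEBMI2", "0") == "1"

# Evaluate once at load time so the choice stays fixed for the session.
const USE_BMI2 = bmi2_opted_in()
```

Code that defines the functions can then branch on `USE_BMI2`. Note that this trusts the user: setting the variable on a machine without BMI2 would still fail.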

Any update on this?

Is there a way to programmatically determine if an intrinsic instruction (e.g. PDEP and PEXT) is supported?

I’ve used this:

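function has_bmi2()
    # Output buffer for the four CPUID registers: EAX, EBX, ECX, EDX
    CPUInfo = zeros(Int32, 4)
    # Query CPUID leaf 7, subleaf 0 (structured extended feature flags)
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), CPUInfo, 7, 0)
    # BMI2 is reported in bit 8 of EBX
    return CPUInfo[2] & 0x100 != 0
end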

Cf. this post. There’s also CpuId.jl.
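
Putting the pieces of this thread together, here is a sketch of choosing the implementation once, when the functions are defined. The names `bmi2_available`, `pdep_fallback` and `my_pdep` are hypothetical. It assumes `jl_cpuidex` is only usable on x86 builds (hence the `Sys.ARCH` guard) and that Julia is generating code for the host CPU, so the intrinsic can be selected whenever CPUID reports BMI2.

```julia
# Hypothetical variant of the check above that returns false on non-x86 builds.
function bmi2_available()
    Sys.ARCH in (:x86_64, :i686) || return false   # CPUID exists only on x86
    info = zeros(Int32, 4)                          # EAX, EBX, ECX, EDX
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), info, 7, 0)
    return info[2] & 0x100 != 0                     # leaf 7, EBX bit 8 = BMI2
end

# Portable fallback: deposit the low bits of x at the set bits of mask.
function pdep_fallback(x::UInt64, mask::UInt64)
    result = zero(UInt64)
    bit = one(UInt64)
    while mask != 0
        if x & bit != 0
            result |= mask & -mask   # lowest set bit of mask
        end
        mask &= mask - one(UInt64)   # clear that bit
        bit <<= 1
    end
    return result
end

# Decide once, at definition time, which method body my_pdep gets.
if bmi2_available()
    my_pdep(x::UInt64, mask::UInt64) =
        ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, mask)
else
    my_pdep(x::UInt64, mask::UInt64) = pdep_fallback(x, mask)
end
```

One caveat with this design: inside a package the check would run at precompile time, so a cache built on one machine and used on another could pick the wrong branch.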


Thanks!
This is exactly what I was looking for.
