I saw on Stack Overflow that there are “Parallel bit deposit and extract” instructions, which are part of BMI2 (Bit Manipulation Instruction Set 2).
Are these instructions (`PDEP` and `PEXT`) available to Julia / LLVM?
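For reference, the two instructions can be written as plain Julia bit loops. This is only a minimal sketch: `pext_soft` and `pdep_soft` are hypothetical names, and these loops are much slower than the hardware instructions.

```julia
# Portable reference versions of BMI2 PEXT / PDEP (hypothetical names).
# Both walk the set bits of `mask` from least to most significant.

# PEXT: gather the bits of `x` selected by `mask` into the low bits of the result.
function pext_soft(x::T, mask::T) where {T<:Union{UInt32,UInt64}}
    result = zero(T)
    k = 0                          # next output bit position
    m = mask
    while m != 0
        lowest = m & -m            # isolate the lowest set bit of the mask
        if x & lowest != 0
            result |= one(T) << k
        end
        k += 1
        m &= m - one(T)            # clear that mask bit and move on
    end
    return result
end

# PDEP: scatter the low bits of `x` into the positions of the set bits of `mask`.
function pdep_soft(x::T, mask::T) where {T<:Union{UInt32,UInt64}}
    result = zero(T)
    k = 0                          # next input bit position
    m = mask
    while m != 0
        lowest = m & -m
        if (x >> k) & one(T) != 0
            result |= lowest
        end
        k += 1
        m &= m - one(T)
    end
    return result
end

pext_soft(0x000000b6, 0x000000f0)  # 0x0000000b: bits 4..7 of x, moved down
pdep_soft(0x0000000b, 0x000000f0)  # 0x000000b0: the same bits, moved back up
```

For any `x` and `m`, `pdep(pext(x, m), m) == x & m`, which is a handy property for testing a hardware-backed version against a portable one.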
There is an old issue, “Support AVX2 and BMI2 instructions”, which is now closed. Does that mean that BMI2 instructions are now supported?
@StefanKarpinski, I saw this old post on Google Groups, “LLVM: trying to use x86 pext intrinsic”.
Did you manage to get it working?
LLVM has had support for it for a long time, and now that https://github.com/JuliaLang/julia/pull/21849 is merged, the nightly that includes it should be able to use those instructions in the JIT. It’s then a matter of what code patterns LLVM recognizes.
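Whether LLVM actually turns a given Julia function into `pext`/`pdep` can be checked empirically by scanning the generated assembly. A minimal sketch: `emits_instruction` is a hypothetical helper, and a whole-word match on the disassembly is only a rough heuristic.

```julia
using InteractiveUtils: code_native

# Hypothetical helper: does the native code generated for `f` with argument
# types `types` contain the instruction mnemonic `mnemonic` (e.g. "pext")?
function emits_instruction(f, types, mnemonic::AbstractString)
    io = IOBuffer()
    code_native(io, f, types; syntax=:intel, debuginfo=:none)
    asm = lowercase(String(take!(io)))
    # Whole-word match so that "pext" does not match inside a longer token.
    return occursin(Regex("\\b" * lowercase(mnemonic) * "\\b"), asm)
end
```

For example, `emits_instruction(f, (UInt64, UInt64), "pext")` reports whether some candidate bit-twiddling function `f` was compiled to PEXT. If the JIT target does not include BMI2, the answer will always be `false`.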
OK, thanks for that.
I think I found how to call `PDEP` and `PEXT` explicitly:
```julia
pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
pext(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pext.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pext(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pext.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
```
Firstly, is this the right way?
I modelled this on `pow_fast` found in `fastmath.jl`:
```julia
pow_fast(x::Float32, y::Integer) = ccall("llvm.powi.f32", llvmcall, Float32, (Float32, Int32), x, y)
pow_fast(x::Float64, y::Integer) = ccall("llvm.powi.f64", llvmcall, Float64, (Float64, Int32), x, y)
```
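The `ccall(..., llvmcall, ...)` pattern itself can be sanity-checked on any machine with a target-independent intrinsic such as `llvm.ctpop.*` (population count), whose result can be compared against Julia's own `count_ones`. A minimal sketch; `ctpop_llvm` is a hypothetical name.

```julia
# Call the target-independent LLVM intrinsic `llvm.ctpop.*` directly,
# using the same llvmcall pattern as `pow_fast`.
ctpop_llvm(x::UInt32) = ccall("llvm.ctpop.i32", llvmcall, UInt32, (UInt32,), x)
ctpop_llvm(x::UInt64) = ccall("llvm.ctpop.i64", llvmcall, UInt64, (UInt64,), x)

# Cross-check against Base.count_ones on a few values.
for x in (0x00000000, 0x000000ff, 0xffffffff, 0x8000000000000001)
    @assert ctpop_llvm(x) == count_ones(x)
end
```

Unlike `llvm.x86.bmi.*`, this intrinsic is lowered on every architecture, so it only confirms the calling mechanism, not BMI2 availability.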
Secondly, how can I determine if a Julia build / hardware supports this instruction?
Presumably attempting to execute these instructions on a non-supporting system will crash.
So I’d like to define some functions using these instructions only if supported, and use an alternative implementation otherwise.
```julia
if pdep_supported  # pseudocode: "if the PDEP instruction is supported"
    pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
    pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
    f(x, y) = pdep(x, y) ...
else
    # f(x, y) = some other implementation
end
```
A rough search through the LLVM code (for `pdep` or `PDEP`) suggests that LLVM doesn’t have any code that does this automatically, so this is likely the right (or only) way right now.
That’ll be the next step for my PR.
I did, but only on a machine that supported that instruction set, of course.
Thanks, Yichao.
In the meantime, I can use an environment variable (say "JULIA_USEBMI2") to opt in to using these instructions.
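A minimal sketch of that opt-in, assuming the hypothetical `JULIA_USEBMI2` variable name from the post and a portable bit-loop fallback (`pdep_soft`, also hypothetical) for when the variable is not set. The variable is read once when this code is loaded, so the choice is fixed for the rest of the session.

```julia
# Opt in to the BMI2 instruction only when JULIA_USEBMI2=1 is set; otherwise
# fall back to a portable loop. JULIA_USEBMI2 is a name chosen in this thread,
# not a variable Julia itself knows about.
const USE_BMI2 = get(ENV, "JULIA_USEBMI2", "0") == "1"

# Portable PDEP: deposit the low bits of `x` at the set-bit positions of `mask`.
function pdep_soft(x::T, mask::T) where {T<:Union{UInt32,UInt64}}
    result = zero(T)
    m = mask
    while m != 0
        lowest = m & -m              # lowest remaining mask bit
        if x & one(T) != 0
            result |= lowest
        end
        x >>= 1                      # consume one input bit
        m &= m - one(T)              # clear the mask bit just used
    end
    return result
end

if USE_BMI2
    pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
    pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
else
    pdep(x::T, y::T) where {T<:Union{UInt32,UInt64}} = pdep_soft(x, y)
end
```

Setting the variable on a machine without BMI2 would still hit the hardware instruction, so this only moves the responsibility to the user.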
Any update on this?
Is there a way to programmatically determine if an intrinsic instruction (e.g. `PDEP` and `PEXT`) is supported?
I’ve used this:
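```julia
function has_bmi2()
    CPUInfo = zeros(Int32, 4)   # receives EAX, EBX, ECX, EDX
    # CPUID leaf 7, subleaf 0: structured extended feature flags
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), CPUInfo, 7, 0)
    CPUInfo[2] & 0x100 != 0     # EBX bit 8 is the BMI2 flag
end
```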
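A slightly more defensive variant of that check, as a sketch: it returns `false` on non-x86 machines and first confirms via CPUID leaf 0 that leaf 7 exists before reading EBX bit 8 of leaf 7. The name `has_bmi2_checked` is hypothetical, and `jl_cpuidex` is an undocumented Julia runtime function that only exists on x86 builds, so it may change between versions.

```julia
# Defensive BMI2 detection via CPUID (sketch; relies on Julia's undocumented
# runtime function `jl_cpuidex`, which only exists on x86 builds).
function has_bmi2_checked()
    Sys.ARCH in (:x86_64, :i686) || return false
    info = zeros(Int32, 4)                     # EAX, EBX, ECX, EDX
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), info, 0, 0)
    info[1] >= 7 || return false               # leaf 0: EAX = highest basic leaf
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), info, 7, 0)
    return info[2] & (Int32(1) << 8) != 0      # leaf 7/subleaf 0: EBX bit 8 = BMI2
end
```

In a package, calling such a check from `__init__` rather than at top level makes it run on the machine that loads the package, not the one that precompiled it.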
Thanks!
This is exactly what I was looking for.