Bit Manipulation Instruction Set

greg_plowman · August 14, 2017, 2:49am

I saw on Stack Overflow that there are “Parallel bit deposit and extract” instructions, which are part of BMI2 (Bit Manipulation Instruction Set 2).

Are these instructions (PDEP and PEXT) available to Julia / LLVM?

algorithm - Find nth SET bit in an int - Stack Overflow

x86 Bit manipulation instruction set - Wikipedia

greg_plowman · October 20, 2017, 1:34am

There is an old issue, “Support AVX2 and BMI2 instructions”, which is now closed. Does that mean that BMI2 instructions are now supported?

@StefanKarpinski, I saw this old post from Google Groups, “LLVM: trying to use x86 pext intrinsic”.
Did you manage to get it working?

yuyichao · October 20, 2017, 2:16am

LLVM has supported these instructions for a long time, and now that https://github.com/JuliaLang/julia/pull/21849 is merged, a nightly that includes it should be able to use them in the JIT. It’s then a matter of which code patterns LLVM recognizes.

greg_plowman · October 20, 2017, 10:47am

OK thanks for that.

I think I found how to call PDEP and PEXT explicitly:

pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)

pext(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pext.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
pext(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pext.64", llvmcall, UInt64, (UInt64, UInt64), x, y)

Firstly, is this the right way?

I modelled this on pow_fast found in fastmath.jl:

pow_fast(x::Float32, y::Integer) = ccall("llvm.powi.f32", llvmcall, Float32, (Float32, Int32), x, y)
pow_fast(x::Float64, y::Integer) = ccall("llvm.powi.f64", llvmcall, Float64, (Float64, Int32), x, y)

Secondly, how can I determine whether a Julia build / hardware supports this instruction?
Presumably attempting to execute these instructions on a non-supporting system will crash.
So I’d like to define some functions using these instructions only if supported, and use an alternative implementation otherwise.

if `PDEP` instruction supported
   pdep(x::UInt32, y::UInt32) = ccall("llvm.x86.bmi.pdep.32", llvmcall, UInt32, (UInt32, UInt32), x, y)
   pdep(x::UInt64, y::UInt64) = ccall("llvm.x86.bmi.pdep.64", llvmcall, UInt64, (UInt64, UInt64), x, y)
   f(x,y) = pdep(x,y) ...
else
   f(x,y) = # some other implementation
end
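
For the “some other implementation” branch, one option is a plain bit loop. The sketch below is only a portable reference for what PDEP and PEXT compute; the names `pdep_soft`, `pext_soft` and `nth_set_bit` are hypothetical, and the loop makes no attempt to match the speed of the hardware instructions.

```julia
# Portable reference versions of PDEP and PEXT (hypothetical helper names).
# Both loops visit the set bits of `mask` from lowest to highest.

# Deposit: scatter the low bits of `src` into the positions set in `mask`.
function pdep_soft(src::T, mask::T) where {T<:Unsigned}
    result = zero(T)
    bit = one(T)                    # walks the low bits of `src`
    while mask != 0
        lowest = mask & -mask       # isolate the lowest set bit of `mask`
        if src & bit != 0
            result |= lowest
        end
        mask &= mask - one(T)       # clear that bit
        bit <<= 1
    end
    return result
end

# Extract: gather the bits of `src` at the positions set in `mask`
# and pack them into the low bits of the result.
function pext_soft(src::T, mask::T) where {T<:Unsigned}
    result = zero(T)
    bit = one(T)                    # next free low bit of the result
    while mask != 0
        lowest = mask & -mask
        if src & lowest != 0
            result |= bit
        end
        mask &= mask - one(T)
        bit <<= 1
    end
    return result
end

# Index (zero-based) of the n-th set bit of `x`, also with zero-based `n`,
# following the idea in the Stack Overflow question linked above.
nth_set_bit(x::T, n::Integer) where {T<:Unsigned} =
    trailing_zeros(pdep_soft(one(T) << n, x))
```

With this, `pext_soft(UInt32(0xb6), UInt32(0xf0))` picks out bits 4 to 7 of `0xb6`, and `pdep_soft` with the same mask puts them back in place.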

yuyichao · October 20, 2017, 12:05pm

A rough search through the LLVM code (searching for pdep or PDEP) suggests that LLVM doesn’t have any code that does it automatically, so this is likely the right (and only) way right now.

As for detecting whether the hardware supports it: that’ll be the next step for my PR.

StefanKarpinski · October 20, 2017, 11:04pm

I did get it working, but only on a machine that supported that instruction set, of course.

greg_plowman · October 23, 2017, 3:56am

Thanks Yichao.
In the meantime I can use an environment variable (say "JULIA_USEBMI2") to opt-in to using these instructions.
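
A minimal sketch of such an opt-in switch follows. The variable name "JULIA_USEBMI2" comes from the post above; the helper name `use_bmi2_requested` and the set of accepted values are assumptions made for illustration.

```julia
# Opt-in switch: returns true only when the user explicitly asks for BMI2.
# Accepting "1", "true" and "yes" (case-insensitive) is an assumption of
# this sketch; an unset variable means "do not use BMI2".
use_bmi2_requested() =
    lowercase(get(ENV, "JULIA_USEBMI2", "")) in ("1", "true", "yes")
```

The result can then pick between the `llvmcall` definitions and a fallback when the code is loaded. Note that this only records the user's intent; it does not check that the CPU actually has the instructions.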

greg_plowman · January 12, 2019, 6:32am

Any update on this?

Is there a way to programmatically determine if an intrinsic instruction (e.g. PDEP and PEXT) is supported?

bennedich · January 12, 2019, 8:46am

I’ve used this:

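function has_bmi2()
    # CPUID output registers, in order: EAX, EBX, ECX, EDX
    CPUInfo = zeros(Int32, 4)
    # Query leaf 7, subleaf 0 (structured extended feature flags)
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), CPUInfo, 7, 0)
    # BMI2 is reported in bit 8 of EBX
    CPUInfo[2] & 0x100 != 0
end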

Cf. this post. There’s also CpuId.jl.
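
Putting the pieces of this thread together, the check can gate the definitions at load time. The following is a sketch under a few assumptions: the architecture guard and the maximum-leaf check are additions not present in the snippet above, the constant `HAS_BMI2` and the loop fallback are hypothetical, and it assumes Julia's JIT targets the native CPU (the default), since otherwise LLVM may refuse to emit the instruction even on capable hardware. Only `pext` is shown; `pdep` would follow the same pattern.

```julia
# Detect BMI2 via CPUID. The x86 guard and the max-leaf check are
# assumptions added in this sketch, not part of the original snippet.
function has_bmi2()
    Sys.ARCH in (:x86_64, :i686) || return false
    regs = zeros(Int32, 4)          # EAX, EBX, ECX, EDX
    # Leaf 0 returns the highest supported basic leaf in EAX.
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), regs, 0, 0)
    regs[1] >= 7 || return false
    # Leaf 7, subleaf 0: BMI2 is bit 8 of EBX.
    ccall(:jl_cpuidex, Cvoid, (Ptr{Cint}, Cint, Cint), regs, 7, 0)
    return regs[2] & 0x100 != 0
end

const HAS_BMI2 = has_bmi2()

if HAS_BMI2
    # Hardware path: the LLVM intrinsics used earlier in this thread.
    pext(x::UInt32, m::UInt32) =
        ccall("llvm.x86.bmi.pext.32", llvmcall, UInt32, (UInt32, UInt32), x, m)
    pext(x::UInt64, m::UInt64) =
        ccall("llvm.x86.bmi.pext.64", llvmcall, UInt64, (UInt64, UInt64), x, m)
else
    # Fallback path: gather the bits of `x` selected by `m`, one at a time.
    function pext(x::T, m::T) where {T<:Union{UInt32,UInt64}}
        result = zero(T)
        bit = one(T)
        while m != 0
            if x & (m & -m) != 0    # test `x` at the lowest set bit of `m`
                result |= bit
            end
            m &= m - one(T)         # clear that bit of the mask
            bit <<= 1
        end
        return result
    end
end
```

Callers use `pext` the same way on either path, so the choice is made once when the file is loaded rather than on every call.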

greg_plowman · January 12, 2019, 12:05pm

Thanks!
This is exactly what I was looking for.
