Is there an easy way to use the carry-less multiplication instruction on platforms where it is available? I’m implementing this operation in pure Julia at the moment:
but this might benefit greatly from this instruction on recent Intel platforms. I’m having a hard time finding a good LLVM reference, but this Rust pull request gives a starting point:
Thanks for the pointer! I’m not 100% sure how to use that function; my naive attempt gives this result:
julia> llvmcall("llvm.x86.pclmulqdq", Int128, (Int64, Int64), 3, 3)
ERROR: this intrinsic must be compiled to be called
[1] top-level scope at REPL[4]:1
I don’t know if “must be compiled” means something about needing to re-compile Julia itself.
You have to put llvmcall in a function.
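As a minimal sketch of what "in a function" looks like, here is the plain integer-add pattern with the string form of Base.llvmcall. The name add_i64 is only illustrative and has nothing to do with pclmulqdq itself; arguments show up in the IR as %0, %1, and so on.

```julia
# Minimal sketch: llvmcall wrapped in a function so that it gets compiled.
# `add_i64` is a hypothetical name used only for illustration.
add_i64(x::Int64, y::Int64) = Base.llvmcall("""
    %3 = add i64 %1, %0
    ret i64 %3
    """, Int64, Tuple{Int64, Int64}, x, y)
```

Calling add_i64(2, 3) from the REPL then works, because the llvmcall sits inside a compiled method instead of being evaluated at top level.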
Thanks! Getting much closer:
julia> f(x, y) = llvmcall(("""
declare i128 @llvm.x86.pclmulqdq(i64, i64)""",
"""%3 = call i128 @llvm.x86.pclmulqdq(i64 %1, i64 %0)
ret i128 %3
"""), Int128, Tuple{Int64, Int64}, x, y)
f (generic function with 1 method)
julia> f(3,3)
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
My guess is now that it’s related to the precise signature of this function. I’m trying to find reference documentation for that.
If you use an asserts build you get a more informative error:
julia> f(x, y) = Base.llvmcall(("""
declare i128 @llvm.x86.pclmulqdq(i64, i64)""",
"""%3 = call i128 @llvm.x86.pclmulqdq(i64 %1, i64 %0)
ret i128 %3
"""), Int128, Tuple{Int64, Int64}, x, y)
f (generic function with 1 method)
julia> f(3,3)
julia: /home/tbesard/Julia/julia-dev/deps/srccache/llvm-6.0.1/include/llvm/IR/Instructions.h:1572: llvm::Value *llvm::CallInst::getArgOperand(unsigned int) const: Assertion `i < getNumArgOperands() && "Out of bounds!"' failed.
A quick grep of the LLVM tests that use this intrinsic shows that it needs three arguments (the first two of which are vectors).
I have some code that calls this; check it out here: That definition takes SIMD.Vec input arguments, but they could just be plain Base.VecElements as well (and you’d remove the a.elts
Thanks both! That’s super helpful.
Follow up question – any best practice for feature sniffing?
I don’t have much experience with this, so I was wondering: why is the last argument an Int32? Looking at PCLMULQDQ — Carry-Less Multiplication Quadword, I would have expected an 8-bit immediate, and a UInt8 does seem to work as well:
const m128 = NTuple{2,VecElement{Int64}}
function carrylessmul(a::m128, b::m128)
    # The third argument is the immediate byte that selects which 64-bit halves are multiplied.
    ccall("llvm.x86.pclmulqdq", llvmcall, m128, (m128, m128, UInt8), a, b, 0)
end
julia> @code_native carrylessmul(m128((1, 2)), m128((3, 4)))
.section __TEXT,__text,regular,pure_instructions
; ┌ @ REPL[8]:2 within `carrylessmul'
vpclmulqdq $0, %xmm1, %xmm0, %xmm0
nopw (%eax,%eax)
; └
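For checking results from something like carrylessmul above, a portable reference can help. The following is a rough sketch of a shift-and-XOR carry-less multiply in pure Julia; the name clmul_reference is made up for illustration and this is not the code from the original post. With an immediate of 0, the instruction multiplies the low 64-bit halves of its two operands, which is the case this sketch models.

```julia
# Portable reference sketch: carry-less multiplication of two 64-bit values,
# i.e. multiplication of polynomials over GF(2). The product needs up to 127 bits.
# `clmul_reference` is a hypothetical helper name.
function clmul_reference(a::UInt64, b::UInt64)
    acc = zero(UInt128)
    wide = UInt128(a)
    for i in 0:63
        # If bit i of b is set, XOR in a shifted left by i (XOR instead of add: no carries).
        if ((b >> i) & 1) == 1
            acc ⊻= wide << i
        end
    end
    return acc
end
```

For example, clmul_reference(UInt64(3), UInt64(3)) gives 5, since (x + 1)^2 = x^2 + 1 over GF(2). The low 64 bits of that result should match the first element of what carrylessmul returns for inputs whose low halves are both 3.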
Thanks @tkoolen ! I used your snippet in this commit – I hope that’s okay.
Not sure about best practice, but the CpuId package might be useful.
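If an extra dependency is not wanted, one rough alternative on Linux is to look for the pclmulqdq flag in /proc/cpuinfo. This is only a sketch under that assumption (x86 Linux, where the flags line exists); has_cpu_flag and has_pclmulqdq are hypothetical helper names, and this is a different technique from what CpuId does.

```julia
# Sketch: return true if /proc/cpuinfo-style text lists `flag` on its first "flags" line.
# Assumes the x86 Linux layout "flags : fpu vme ...".
function has_cpu_flag(cpuinfo::AbstractString, flag::AbstractString)
    for line in split(cpuinfo, '\n')
        if startswith(line, "flags")
            parts = split(line, ':'; limit=2)
            length(parts) == 2 || continue
            # Compare whole whitespace-separated tokens, so "pclmul" does not match "pclmulqdq".
            return flag in split(parts[2])
        end
    end
    return false
end

# Linux-only convenience wrapper; returns false where /proc/cpuinfo does not exist.
has_pclmulqdq() = isfile("/proc/cpuinfo") &&
    has_cpu_flag(read("/proc/cpuinfo", String), "pclmulqdq")
```

The result of has_pclmulqdq() could then be used to choose between the intrinsic and a pure-Julia fallback when the code is loaded.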