Using CLMUL instruction

Is there an easy way to use the carry-less multiplication instruction on platforms where it is available? I’m implementing this operation in pure Julia at the moment:

https://github.com/tkluck/GaloisFields.jl/blob/d9f0d22960db2f96cbaf2286dc41b3bb7546835b/src/BinaryFields.jl#L66

but this might benefit greatly from this instruction on recent Intel platforms. I’m having a hard time finding a good LLVM reference, but this Rust stdsimd issue gives a starting point: https://github.com/rust-lang-nursery/stdsimd/issues/318

Thanks!

Can’t you llvmcall llvm.x86.pclmulqdq? That looks like what Rust does: https://github.com/rust-lang-nursery/stdsimd/pull/320/files#diff-ff2d02d9b8929beed7ecddd85afd0b4eR16

Thanks for the pointer! I’m not 100% sure how to use that function; my naive attempt gives this result:

julia> llvmcall("llvm.x86.pclmulqdq", Int128, (Int64, Int64), 3, 3)
ERROR: this intrinsic must be compiled to be called
Stacktrace:
 [1] top-level scope at REPL[4]:1

I don’t know if “must be compiled” means something about needing to re-compile Julia itself.

You have to put llvmcall in a function.

Thanks! Getting much closer:

julia> f(x, y) = llvmcall(("""
       declare i128 @llvm.x86.pclmulqdq(i64, i64)""",
       """%3 = call i128 @llvm.x86.pclmulqdq(i64 %1, i64 %0)
       ret i128 %3
       """), Int128, Tuple{Int64, Int64}, x, y)
f (generic function with 1 method)

julia> f(3,3)
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

My guess is now that it’s related to the precise signature of this function; I’m trying to find reference documentation for that.

If you use an asserts build you get a more informative error:


julia> f(x, y) = Base.llvmcall(("""
       declare i128 @llvm.x86.pclmulqdq(i64, i64)""",
       """%3 = call i128 @llvm.x86.pclmulqdq(i64 %1, i64 %0)
       ret i128 %3
       """), Int128, Tuple{Int64, Int64}, x, y)
f (generic function with 1 method)

julia> f(3,3)
julia: /home/tbesard/Julia/julia-dev/deps/srccache/llvm-6.0.1/include/llvm/IR/Instructions.h:1572: llvm::Value *llvm::CallInst::getArgOperand(unsigned int) const: Assertion `i < getNumArgOperands() && "Out of bounds!"' failed.

A quick grep of the LLVM tests that use this intrinsic shows that it needs three arguments (the first two of which are vectors).


I have some code that calls this; check it out here: https://github.com/quinnj/JJ.jl/blob/master/src/JJ.jl#L25. That definition takes SIMD.Vec input arguments, but they could just as well be tuples of plain Base.VecElement (in which case you’d remove the a.elts).


Thanks both! That’s super helpful.

Follow up question – any best practice for feature sniffing?

I don’t have much experience with this, so I was wondering: why is the last argument an Int32? Looking at the PCLMULQDQ (Carry-Less Multiplication Quadword) reference, I would have expected an 8-bit immediate, and a UInt8 does seem to work as well:

const m128 = NTuple{2,VecElement{Int64}}

function carrylessmul(a::m128, b::m128)
    ccall("llvm.x86.pclmulqdq", llvmcall, m128, (m128, m128, UInt8), a, b, 0)
end

julia> @code_native carrylessmul(m128((1, 2)), m128((3, 4)))
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[8]:2 within `carrylessmul'
	vpclmulqdq	$0, %xmm1, %xmm0, %xmm0
	retl
	nopw	(%eax,%eax)
; └
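
For what it’s worth, the immediate only carries two meaningful bits: per the Intel instruction reference, bit 0 selects which 64-bit half of the first operand is multiplied and bit 4 selects the half of the second, which is why a single byte is enough. Below is a rough pure-Julia model of that behaviour, which could be used to cross-check results from the intrinsic. The names `clmul_ref` and `pclmul_model` are made up for this sketch and are not part of any package.

```julia
# Portable bit-by-bit carry-less multiplication of two 64-bit values.
# `clmul_ref` is a hypothetical name used for illustration only.
function clmul_ref(a::UInt64, b::UInt64)
    acc = zero(UInt128)
    wide = UInt128(a)
    for i in 0:63
        # XOR in a shifted copy of `a` for every set bit of `b`
        if (b >> i) & 1 == 1
            acc ⊻= wide << i
        end
    end
    return acc
end

# Model of the imm8 selector (hypothetical helper):
# bit 0 picks the half of `a`, bit 4 picks the half of `b`.
function pclmul_model(a::NTuple{2,UInt64}, b::NTuple{2,UInt64}, imm::UInt8)
    x = a[(imm & 0x01) + 1]
    y = b[((imm >> 4) & 0x01) + 1]
    return clmul_ref(x, y)
end
```

For example, `clmul_ref(UInt64(3), UInt64(3))` gives 5, since (x + 1)² = x² + 1 over GF(2). With an immediate of 0, as in the snippet above, only the low halves of both inputs take part.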

Thanks @tkoolen ! I used your snippet in this commit – I hope that’s okay.


Not sure about best practice, but the CpuId package might be useful.

CpuId.cpufeature(:PCLMUL)
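
One way this check might be wired up is to pick an implementation once, at load time, and fall back to a portable loop when the instruction is missing. The following is only a sketch under assumptions: `clmul_hw`, `clmul_sw`, and `choose_clmul` are hypothetical names, the hardware path is the `ccall` from earlier in the thread, and the CpuId call is left in a comment so the block runs without that package installed.

```julia
const m128 = NTuple{2,VecElement{Int64}}

# Hardware path, copied from the snippet earlier in the thread.
# It is only compiled when actually called, so defining it is harmless elsewhere.
clmul_hw(a::m128, b::m128) =
    ccall("llvm.x86.pclmulqdq", llvmcall, m128, (m128, m128, UInt8), a, b, 0)

# Portable fallback (hypothetical): multiplies the low halves, like imm8 = 0.
function clmul_sw(a::m128, b::m128)
    x = a[1].value % UInt64   # reinterpret the low half of `a` as raw bits
    y = b[1].value % UInt64
    acc = zero(UInt128)
    for i in 0:63
        if (y >> i) & 1 == 1
            acc ⊻= UInt128(x) << i
        end
    end
    lo = (acc % UInt64) % Int64
    hi = ((acc >> 64) % UInt64) % Int64
    return (VecElement(lo), VecElement(hi))
end

# Select an implementation from a feature flag.
choose_clmul(has_pclmul::Bool) = has_pclmul ? clmul_hw : clmul_sw

# Assumed usage, with CpuId installed:
# const clmul = choose_clmul(CpuId.cpufeature(:PCLMUL))
```

Doing the selection once keeps the feature test out of the hot path; whether that is the best practice here is a judgment call.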