I am having trouble calling some AVX-512 intrinsics from Julia, following on directly from this post.
The example in the blog post compiles and runs fine on my CPU, since it has the avx512_bitalg CPU flag.
The problematic instruction in question is _mm512_bitshuffle_epi64_mask. Using godbolt, I extract the corresponding LLVM name from the line
%7 = tail call <64 x i1> @llvm.x86.avx512.vpshufbitqmb.512(<64 x i8> %6, <64 x i8> %4), !dbg !377
but the following fails:
__m512i = NTuple{64, VecElement{Int8}}
x = __m512i(ntuple(_ -> rand(Int8), 64))
p = __m512i(ntuple(_ -> rand(Int8), 64))
ccall("llvm.x86.avx512.vpshufbitqmb.512", llvmcall, Int64, (__m512i, __m512i), x, p)
with ERROR: llvmcall only supports intrinsic calls.
Notice that the intrinsic returns a <64 x i1>, and I am hoping there is some casting to Int64 happening implicitly.
Trying to write it down explicitly as
using SIMD
function _test(x, p)
    __m512i = SIMD.LVec{64, Int8}
    return Base.llvmcall("""
%3 = call <64 x i1> @llvm.x86.avx512.vpshufbitqmb.512(<64 x i8> %0, <64 x i8> %1)
%4 = bitcast <64 x i1> %3 to i64
ret i64 %4
""", Int64, Tuple{__m512i,__m512i}, x, p)
end
also fails, with a different error:
ERROR: Failed to parse LLVM assembly:
<string>:3:21: error: use of undefined value '@llvm.x86.avx512.vpshufbitqmb.512'
%3 = call <64 x i1> @llvm.x86.avx512.vpshufbitqmb.512(<64 x i8> %0, <64 x i8> %1)
                    ^
It seems that the intrinsic is not recognized by LLVM. Is there a way to check if the intrinsic is available?
I am running Julia 1.9.0 with LLVM: libLLVM-14.0.6 (ORCJIT, icelake-server).
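One thing that may be worth trying: the "use of undefined value" error is what the LLVM assembly parser reports when a function is called without a declaration in scope, so it does not necessarily mean the intrinsic is missing from this LLVM build. Base.llvmcall also accepts a (module IR, entry-point name) tuple, which leaves room for a declare line. The following is an untested sketch, assuming the intrinsic exists in LLVM 14 with the same signature as in the godbolt output; the names M512i, bitshuffle_mask and @entry are made up for illustration.

```julia
# Hypothetical alias: 64 x Int8, which Julia lowers to the LLVM type <64 x i8>.
const M512i = NTuple{64, VecElement{Int8}}

# Sketch only: the module-level form of llvmcall, so the intrinsic can be declared.
# Assumption: the signature below matches what LLVM 14 expects for this intrinsic.
function bitshuffle_mask(x::M512i, p::M512i)
    return Base.llvmcall(("""
        declare <64 x i1> @llvm.x86.avx512.vpshufbitqmb.512(<64 x i8>, <64 x i8>)

        define i64 @entry(<64 x i8> %x, <64 x i8> %p) {
        top:
            %m = call <64 x i1> @llvm.x86.avx512.vpshufbitqmb.512(<64 x i8> %x, <64 x i8> %p)
            ; reinterpret the 64 mask bits as one 64-bit integer
            %r = bitcast <64 x i1> %m to i64
            ret i64 %r
        }
        """, "entry"), Int64, Tuple{M512i, M512i}, x, p)
end
```

Calling bitshuffle_mask(x, p) with the x and p defined earlier would then exercise it. The IR is only compiled on the first call, so any remaining problem (for example a CPU without avx512_bitalg) would show up there rather than at definition time.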