Faster array indexing with a logical array

To check which instruction sets your CPU supports, there is CpuId.jl.

julia> using CpuId

julia> cpufeaturetable()
  Cpu Feature Description
  ––––––––––– ––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  ACPI        Thermal monitor and software controlled clock facilities (MSR)
  ADX         Intel ADX (Multi-Precision Add-Carry Instruction Extensions)
  AES         AES encryption instruction set
  AHF64       LAHF and SAHF in PM64
  APIC        APIC on-chip (Advanced Programmable Interrupt Controller)
  AVX         256bit Advanced Vector Extensions, AVX
  AVX2        SIMD 256bit Advanced Vector Extensions 2
  AVX512BW    AVX-512 Byte and Word Instructions
  AVX512CD    AVX-512 Conflict Detection Instructions
  AVX512DQ    AVX-512 Doubleword and Quadword Instructions
  AVX512F     AVX-512 Foundation
  AVX512VL    AVX-512 Vector Length Extensions
  BMI1        Bit Manipulation Instruction Set 1

Or you can ask LLVM directly through LLVM.jl:

julia> using LLVM

julia> p = LLVM.API.LLVMGetHostCPUFeatures(); # C string allocated by LLVM

julia> s = unsafe_string(p) # copies the bytes into a Julia String
"+sse2,+cx16,+sahf,-tbm,-avx512ifma,-sha,-gfni,-fma4,-vpclmulqdq,+prfchw,+bmi2,-cldemote,+fsgsbase,-ptwrite,+xsavec,+popcnt,+mpx,+aes,-avx512bitalg,-movdiri,+xsaves,-avx512er,+avx512vnni,-avx512vpopcntdq,-pconfig,+clwb,+avx512f,-clzero,-pku,+mmx,-lwp,-rdpid,-xop,+rdseed,-waitpkg,-movdir64b,-sse4a,+avx512bw,+clflushopt,+xsave,-avx512vbmi2,+64bit,+avx512vl,+invpcid,+avx512cd,+avx,-vaes,+cx8,+fma,-rtm,+bmi,-enqcmd,+rdrnd,-mwaitx,+sse4.1,+sse4.2,+avx2,+fxsr,-wbnoinvd,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,-sgx,-shstk,+cmov,-avx512vbmi,-avx512bf16,+movbe,+xsaveopt,+avx512dq,+adx,-avx512pf,+sse3"

julia> filter(f -> occursin("avx512", f), split(s, ','))
14-element Array{SubString{String},1}:
 "-avx512ifma"
 "-avx512bitalg"
 "-avx512er"
 "+avx512vnni"
 "-avx512vpopcntdq"
 "+avx512f"
 "+avx512bw"
 "-avx512vbmi2"
 "+avx512vl"
 "+avx512cd"
 "-avx512vbmi"
 "-avx512bf16"
 "+avx512dq"
 "-avx512pf"

julia> LLVM.API.LLVMDisposeMessage(p); # free the C buffer (not the Julia String) so it doesn't leak

A + means that it has the feature, while a - indicates it doesn’t.
So, for example, this CPU has AVX512F, but not AVX512ER (accurate reciprocals or exponentiation) or AVX512BF16 (for BF16 support).
This is basically what VectorizationBase does: it parses these flags and defines a true/false variable for each feature.

julia> using VectorizationBase

julia> VectorizationBase.AVX512IFMA # integer fused multiply add
false

julia> VectorizationBase.AVX512VNNI # vector neural net instructions
true

Note that LLVM is more complete. CpuId.jl seems to be missing VNNI, for example.
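
For anyone who wants the +/- flags as true/false values without pulling in another package, here is a rough sketch of the parsing step. The parse_features helper and the shortened sample string are made up for illustration; this is not VectorizationBase's actual code, just the same idea in a few lines.

```julia
# Hypothetical helper (not part of LLVM.jl or VectorizationBase):
# turn a "+feat,-feat,..." string into a Dict of feature name => Bool.
function parse_features(s::AbstractString)
    features = Dict{String,Bool}()
    for flag in split(s, ','; keepempty=false)
        # The first character is the sign, the rest is the feature name.
        features[String(flag[2:end])] = startswith(flag, '+')
    end
    return features
end

# Shortened sample in the same format as the LLVM output above.
sample = "+sse2,-avx512ifma,+avx512vnni,+avx512f,-avx512er"
features = parse_features(sample)

features["avx512vnni"]             # true
features["avx512er"]               # false
get(features, "avx512bf16", false) # false, treat unlisted features as absent
```

With the real string you would call parse_features(s) instead of using the sample.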
