Faster array indexing with a logical array

Elrod · September 28, 2020, 7:30am

julia> using CpuId

julia> cpufeaturetable()
  Cpu Feature Description
  ––––––––––– ––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  ACPI        Thermal monitor and software controlled clock facilities (MSR)
  ADX         Intel ADX (Multi-Precision Add-Carry Instruction Extensions)
  AES         AES encryption instruction set
  AHF64       LAHF and SAHF in PM64
  APIC        APIC on-chip (Advanced Programmable Interrupt Controller)
  AVX         256bit Advanced Vector Extensions, AVX
  AVX2        SIMD 256bit Advanced Vector Extensions 2
  AVX512BW    AVX-512 Byte and Word Instructions
  AVX512CD    AVX-512 Conflict Detection Instructions
  AVX512DQ    AVX-512 Doubleword and Quadword Instructions
  AVX512F     AVX-512 Foundation
  AVX512VL    AVX-512 Vector Length Extensions
  BMI1        Bit Manipulation Instruction Set 1

Or, querying LLVM directly:

julia> using LLVM

julia> p = LLVM.API.LLVMGetHostCPUFeatures(); # keep the C string pointer so it can be freed later

julia> s = unsafe_string(p) # copies the C string into a Julia String
"+sse2,+cx16,+sahf,-tbm,-avx512ifma,-sha,-gfni,-fma4,-vpclmulqdq,+prfchw,+bmi2,-cldemote,+fsgsbase,-ptwrite,+xsavec,+popcnt,+mpx,+aes,-avx512bitalg,-movdiri,+xsaves,-avx512er,+avx512vnni,-avx512vpopcntdq,-pconfig,+clwb,+avx512f,-clzero,-pku,+mmx,-lwp,-rdpid,-xop,+rdseed,-waitpkg,-movdir64b,-sse4a,+avx512bw,+clflushopt,+xsave,-avx512vbmi2,+64bit,+avx512vl,+invpcid,+avx512cd,+avx,-vaes,+cx8,+fma,-rtm,+bmi,-enqcmd,+rdrnd,-mwaitx,+sse4.1,+sse4.2,+avx2,+fxsr,-wbnoinvd,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,-sgx,-shstk,+cmov,-avx512vbmi,-avx512bf16,+movbe,+xsaveopt,+avx512dq,+adx,-avx512pf,+sse3"

julia> filter(f -> occursin("avx512", f), split(s, ','))
14-element Array{SubString{String},1}:
 "-avx512ifma"
 "-avx512bitalg"
 "-avx512er"
 "+avx512vnni"
 "-avx512vpopcntdq"
 "+avx512f"
 "+avx512bw"
 "-avx512vbmi2"
 "+avx512vl"
 "+avx512cd"
 "-avx512vbmi"
 "-avx512bf16"
 "+avx512dq"
 "-avx512pf"

julia> Libc.free(p); # don't leak memory: free the C string, not the Julia copy

A + means the CPU has the feature, while a - means it doesn’t.
So, for example, this CPU has AVX512F, but not AVX512ER (more accurate reciprocal and exponential approximations) or AVX512BF16 (BF16 support).
This is basically what VectorizationBase does to define a true/false variable for each of these features.

julia> using VectorizationBase

julia> VectorizationBase.AVX512IFMA # integer fused multiply add
false

julia> VectorizationBase.AVX512VNNI # vector neural net instructions
true
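
The parsing step described above can be sketched roughly as follows. This is only an illustration: `parse_features` is a hypothetical helper, not VectorizationBase's actual code, and `sample` is a shortened copy of the feature string shown earlier so the snippet runs without LLVM.jl installed.

```julia
# Hypothetical helper (not part of VectorizationBase): turn an LLVM-style
# feature string into a Dict mapping feature name => true/false.
function parse_features(s::AbstractString)
    features = Dict{String,Bool}()
    for f in split(s, ',')
        isempty(f) && continue
        # The first character is the '+' or '-' flag; the rest is the name.
        features[String(f[2:end])] = first(f) == '+'
    end
    return features
end

# Shortened sample of the host feature string from the post.
sample = "+sse2,-avx512ifma,+avx512vnni,+avx512f,-avx512er,-avx512bf16"
feats = parse_features(sample)

feats["avx512f"]                  # feature reported with '+'
feats["avx512er"]                 # feature reported with '-'
get(feats, "avx512vbmi", false)   # features absent from the string default to false
```

A `Dict` keeps the sketch short; defining one constant per feature, as the `VectorizationBase.AVX512VNNI` examples suggest, would need an extra code-generation step on top of this.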

Note that LLVM is more complete. CpuId.jl seems to be missing VNNI, for example.
