How to choose vec size in SIMD.jl

Looking at the SIMD.jl example vadd!(xs, ys, Vec{8,Float64}) there is a specific size 8 being set for the vec size. How do I know whether to use 4 or 8 or more? I assume it is hardware dependent, but probably there is some way to choose this at runtime?

Don’t use SIMD.jl. It is a low-level library meant for implementing higher-level abstractions. Instead, you should use LoopVectorization, which will automatically pick the right defaults for you.

To answer the question you asked, though: yes, it is hardware dependent. You can choose the right size by checking which instruction sets the CPU supports (specifically SSE/AVX/AVX2/AVX-512 on x86 CPUs).
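As a rough rule of thumb, the lane count is the register width in bits divided by the element size in bits. A minimal sketch, assuming the nominal register widths of 128 bits for SSE, 256 for AVX2 and 512 for AVX-512; lanes and REGISTER_BITS are hypothetical names for illustration, not part of SIMD.jl:

```julia
# Hypothetical helper: how many elements of type T fit into one
# SIMD register that is `register_bits` bits wide.
lanes(::Type{T}, register_bits::Integer) where {T} = register_bits ÷ (8 * sizeof(T))

# Nominal register widths (in bits) of the x86 SIMD extensions (assumed values).
const REGISTER_BITS = (SSE = 128, AVX2 = 256, AVX512 = 512)

for (name, bits) in pairs(REGISTER_BITS)
    println(name, ": ", lanes(Float64, bits), " Float64 lanes, ",
            lanes(UInt64, bits), " UInt64 lanes")
end
```

By this count, Vec{8,Float64} fills exactly one 512-bit register, while on a 256-bit machine it would have to be split across two registers.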


What I am trying to do does not really fit LoopVectorization. I need bitwise logical operations with complicated intermediate bitwise formulas, and neither LoopVectorization’s @turbo nor @simd managed to infer that they can be vectorized.

Hence going back to the SIMD.jl question I have above.

Here is the fairly short kernel that I want to vectorize better:

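function mul_left!(r::AbstractVector{T}, l::AbstractVector{T}) where T<:Unsigned
    # Bitwise accumulators: per bit position, (cnt2, cnt1) act as a 2-bit counter.
    cnt1 = zero(T)
    cnt2 = zero(T)
    # The first half of each vector holds the x bits, the second half the z bits.
    len = length(l)>>1
    @inbounds @simd for i in 1:len
        x1, x2, z1, z2 = l[i], r[i], l[i+len], r[i+len]
        r[i] = newx1 = x1 ⊻ x2
        r[i+len] = newz1 = z1 ⊻ z2
        x1z2 = x1 & z2
        anti_comm = (x2 & z1) ⊻ x1z2
        cnt2 ⊻= (cnt1 ⊻ newx1 ⊻ newz1 ⊻ x1z2) & anti_comm
        cnt1 ⊻= anti_comm
    end
    # Collapse the bitwise counters into a single integer.
    s = count_ones(cnt1)
    s ⊻= count_ones(cnt2) << 1
end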

More general advice on how to improve its vectorization would certainly be appreciated of course!

IMO, this seems like a place where you should file a bug report. Nothing about this code should be that hard for LoopVectorization to vectorize.

Doesn’t the fact that cnt1 and cnt2 are fixed-size scalars, rather than SIMD vectors, make it impossible to vectorize the operations involving them?
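In principle the scalar counters need not be a fundamental obstacle. Per bit position, cnt1 and cnt2 together behave like a 2-bit counter that changes by ±1 mod 4 whenever anti_comm is set, and addition mod 4 is associative, so each lane could keep its own pair of counters and the partial results could be merged after the loop. Below is a plain-Julia sketch of that idea, with ordinary arrays standing in for vector registers. The names phase_scalar and phase_lanes are hypothetical, and f stands in for newx1 ⊻ newz1 ⊻ x1z2; whether a given compiler pass discovers this transformation on its own is a separate question.

```julia
# Scalar reference: the counter update from the kernel, applied to
# precomputed words anti[i] (anti_comm) and f[i] (newx1 ⊻ newz1 ⊻ x1z2).
function phase_scalar(anti::Vector{T}, f::Vector{T}) where {T<:Unsigned}
    cnt1 = zero(T)
    cnt2 = zero(T)
    for i in eachindex(anti)
        cnt2 ⊻= (cnt1 ⊻ f[i]) & anti[i]
        cnt1 ⊻= anti[i]
    end
    return (count_ones(cnt1) ⊻ (count_ones(cnt2) << 1)) & 3
end

# Lane-split variant: `nlanes` independent counter pairs, merged at the end.
# Element i is handled by lane mod1(i, nlanes), as a SIMD loop would do.
function phase_lanes(anti::Vector{T}, f::Vector{T}, nlanes::Integer) where {T<:Unsigned}
    cnt1 = zeros(T, nlanes)
    cnt2 = zeros(T, nlanes)
    for i in eachindex(anti)
        k = mod1(i, nlanes)
        cnt2[k] ⊻= (cnt1[k] ⊻ f[i]) & anti[i]
        cnt1[k] ⊻= anti[i]
    end
    # Each lane holds per-bit counters mod 4; add them all up mod 4.
    total = 0
    for k in 1:nlanes
        total += count_ones(cnt1[k]) + 2 * count_ones(cnt2[k])
    end
    return total & 3
end

anti = rand(UInt64, 37)
f = rand(UInt64, 37)
println(phase_scalar(anti, f) == phase_lanes(anti, f, 4))
```

Both functions reduce the result mod 4, which is the only part of the phase on which they are guaranteed to agree.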

VectorizationBase.jl has more features than SIMD.jl, including a query for the vector width:

julia> using VectorizationBase

julia> VectorizationBase.pick_vector_width(Float64)