Looking at the SIMD.jl example vadd!(xs, ys, Vec{8,Float64}), a specific size of 8 is being set for the Vec width. How do I know whether to use 4, 8, or more? I assume it is hardware dependent, but presumably there is some way to choose this at runtime?
Don’t use SIMD.jl. It is a low-level library meant for implementing higher-level abstractions. Instead you should use LoopVectorization, which will automatically pick the right defaults for you.
As an answer to the question you asked, though: yes, it’s hardware dependent, and you can choose the right size by seeing which instruction sets the CPU supports (specifically SSE/AVX/AVX2/AVX-512 for x86 CPUs).
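For concreteness, the lane count is just the register width divided by the element size, assuming the usual x86 register widths of 128 bits for SSE, 256 for AVX/AVX2, and 512 for AVX-512:

julia> 128 ÷ (8 * sizeof(Float64))  # SSE
2

julia> 256 ÷ (8 * sizeof(Float64))  # AVX/AVX2
4

julia> 512 ÷ (8 * sizeof(Float64))  # AVX-512
8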
What I am trying to do does not really fit LoopVectorization. I need bit-wise logical operations with complicated intermediate bitwise formulas that LoopVectorization’s @turbo and @simd failed to infer can be vectorized.
Hence I am going back to my SIMD.jl question above.
Here is the fairly short kernel that I want to vectorize better
function mul_left!(r::AbstractVector{T}, l::AbstractVector{T}) where T<:Unsigned
    # Bitwise accumulators; each bit position is tracked independently.
    cnt1 = zero(T)
    cnt2 = zero(T)
    len = length(l)>>1
    @inbounds @simd for i in 1:len
        # First half of each vector holds the x bits, second half the z bits.
        x1, x2, z1, z2 = l[i], r[i], l[i+len], r[i+len]
        r[i] = newx1 = x1 ⊻ x2
        r[i+len] = newz1 = z1 ⊻ z2
        x1z2 = x1 & z2
        anti_comm = (x2 & z1) ⊻ x1z2
        cnt2 ⊻= (cnt1 ⊻ newx1 ⊻ newz1 ⊻ x1z2) & anti_comm
        cnt1 ⊻= anti_comm
    end
    # Collapse the per-bit counters into a single integer result.
    s = count_ones(cnt1)
    s ⊻= count_ones(cnt2) << 1
    s
end
More general advice on how to improve its vectorization would of course be appreciated!
IMO, this seems like a place where you should file a bug report. Nothing about this code should be that hard for LoopVectorization to vectorize.
Isn’t the fact that cnt1 and cnt2 are of fixed scalar size, instead of being SIMD vectors, making it impossible to vectorize the operations involving them?
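For what it’s worth, scalar accumulators are not a fundamental obstacle: SIMD.jl lets the accumulators themselves be Vecs that are only collapsed to a scalar after the loop. Below is a minimal sketch of that pattern for the plain xor part of the kernel (the anti_comm bits feeding cnt1). The function name xor_accumulate, the fixed width W = 8, and the chunking are illustrative assumptions, and the coupled cnt1/cnt2 update in the real kernel needs extra care when merging lanes, since cnt2 depends on the running value of cnt1.

using SIMD

# Sketch: accumulate the anti-commutation bits in a Vec instead of a scalar,
# then xor the lanes together at the end.
function xor_accumulate(l::Vector{UInt64}, r::Vector{UInt64})
    len = length(l) >> 1
    W = 8                              # illustrative; pick to match the hardware
    V = Vec{W,UInt64}
    acc = V(zero(UInt64))              # vector accumulator
    i = 1
    while i + W - 1 <= len
        x1 = vload(V, l, i)
        x2 = vload(V, r, i)
        z1 = vload(V, l, i + len)
        z2 = vload(V, r, i + len)
        acc ⊻= (x2 & z1) ⊻ (x1 & z2)   # same bitwise formula, W words per step
        i += W
    end
    cnt1 = zero(UInt64)
    for k in 1:W                       # horizontal xor across the lanes
        cnt1 ⊻= acc[k]
    end
    for j in i:len                     # scalar tail for the leftover words
        cnt1 ⊻= (r[j] & l[j+len]) ⊻ (l[j] & r[j+len])
    end
    cnt1
end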
VectorizationBase.jl has more features than SIMD.jl
julia> using VectorizationBase
julia> VectorizationBase.pick_vector_width(Float64)
static(8)
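If the kernel stays hand-written with SIMD.jl, one option (a sketch, assuming the returned StaticInt converts cleanly with Int, which it should since it is an Integer) is to let pick_vector_width choose the width and feed it to Vec as the type parameter; the 8 below matches the machine shown above, which evidently has 512-bit registers:

julia> using SIMD, VectorizationBase

julia> W = Int(VectorizationBase.pick_vector_width(UInt64))
8

julia> V = Vec{W,UInt64};  # lane type for the hand-written kernel

Inside a function you would want to pass W (or the StaticInt itself) through a function barrier so that the Vec width remains a compile-time constant.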