How to wrap a vector so that it does simd?

abulak · January 8, 2022, 2:40pm

Hey,

I recently ran into this issue: I have a very thin wrapper around Vector{T}, let’s say it’s

struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

# Array Interface
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

It’s very common in my code to run tight loops over mv = MyVector(...). Recently I found that looping over mv.data directly is much faster, and by much I mean roughly 5× faster:

function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

v = MyVector{UInt8}(rand(1:100, 1000));
using BenchmarkTools
@btime testf($v) # 768.257 ns (0 allocations: 0 bytes)
@btime testf($(v.data)) # 165.248 ns (0 allocations: 0 bytes)

Of course, by inspecting @code_llvm / @code_native it’s clear that this is because the latter call vectorizes. So here’s my question:
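For anyone who wants to repeat that check without reading the whole dump, here is a rough sketch. It repeats testf so it runs on its own; looks_vectorized is a hypothetical helper name, and searching the IR for vector types such as <16 x i8> is only a crude heuristic whose result depends on the CPU and Julia version:

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same kernel as above, repeated so the snippet is self-contained.
function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

# Hypothetical helper: true if the optimized LLVM IR of testf for argument
# type T mentions an integer vector type such as `<16 x i8>` or `<4 x i64>`.
function looks_vectorized(::Type{T}) where {T}
    ir = sprint(io -> code_llvm(io, testf, Tuple{T}; debuginfo=:none))
    return occursin(r"<\d+ x i\d+>", ir)
end

looks_vectorized(Vector{UInt8})
```

Calling the same helper with the wrapper type instead of Vector{UInt8} should show whether the loop over the wrapper vectorizes on a given machine.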

Is there an easy way to nudge LLVM to emit vector instructions for loops over MyVector?

E.g. when I redefine

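function Base.iterate(v::MyVector, s=0)
    # (ab)using that indices are 1-based, i.e. eachindex(v) == Base.OneTo(length(v));
    # the state s counts the elements already visited
    s == length(v) && return nothing
    # the check above guarantees 1 <= s+1 <= length(v), so @inbounds is safe
    return @inbounds(v[s+1]), s+1
end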

I can recover

julia> @btime testf($v); # was: 768.257 ns (0 allocations: 0 bytes)
  161.286 ns (0 allocations: 0 bytes)

julia> @btime testf($(v.data)); # was: 165.248 ns (0 allocations: 0 bytes)
  162.222 ns (0 allocations: 0 bytes)

Is this the correct way to do so? My feeling is that there should be a more generic way (but the default provided by Base inhibits vectorization)…
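One candidate for such a more generic way, which I have not benchmarked, would be to forward iteration to the wrapped vector instead of hand-writing the index logic. A minimal sketch, using a hypothetical FwdVector type so that it does not clash with the definitions above:

```julia
# Hypothetical wrapper, identical in layout to MyVector.
struct FwdVector{T} <: AbstractVector{T}
    data::Vector{T}
end

# Array interface, as before.
Base.size(v::FwdVector) = size(v.data)
Base.IndexStyle(::Type{<:FwdVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(v::FwdVector, n::Int) = v.data[n]

# Reuse Vector's own iteration protocol, including its state.
Base.iterate(v::FwdVector) = iterate(v.data)
Base.iterate(v::FwdVector, state) = iterate(v.data, state)

fv = FwdVector(UInt8[2, 3, 4])

# Iterating the wrapper now goes through iterate(::Vector, ...).
total = 0
for x in fv
    global total += x
end
```

Whether this vectorizes as well as the hand-written iterate is an assumption that would need checking with @btime and @code_llvm.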
