How to wrap a vector so that loops over it use SIMD?

Hey,

I recently ran into this issue: I have a very thin wrapper around Vector{T}, say:

struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

# Array Interface
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

It’s very common in my code to run tight loops over mv = MyVector(...). Recently I found that looping over mv.data is much faster, and by much I mean about 5× faster:

function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

v = MyVector{UInt8}(rand(1:100, 1000));
using BenchmarkTools
@btime testf($v) # 768.257 ns (0 allocations: 0 bytes)
@btime testf($(v.data)) # 165.248 ns (0 allocations: 0 bytes)

Inspecting @code_llvm / @code_native makes it clear that this is because the latter call vectorizes. So here’s my question:
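As a rough way to check this without reading the whole listing, the sketch below dumps the LLVM IR into a string and searches it for vector types such as `<8 x i64>`. The helper name `looks_vectorized` and the toy loop `sumsq` are made up for illustration, and the regex is only a heuristic: a match suggests that SIMD code was emitted somewhere in the function, not that the hot loop is the part that vectorized.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical helper: true if the optimized LLVM IR for f(argtypes...)
# mentions an integer or float vector type such as `<8 x i64>`.
function looks_vectorized(f, argtypes)
    ir = sprint(io -> code_llvm(io, f, argtypes; debuginfo=:none))
    return occursin(r"<\d+ x (i\d+|float|double)>", ir)
end

# Toy stand-in for a tight loop (not part of the original code).
function sumsq(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for x in w
        k += x^2
    end
    return k
end

# The answer depends on the Julia version and the CPU.
looks_vectorized(sumsq, Tuple{Vector{UInt8}})
```

The same call with `testf` and `Tuple{MyVector{UInt8}}` versus `Tuple{Vector{UInt8}}` compares the two cases benchmarked above.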

Is there an easy way to nudge llvm to emit vector instructions for loops over MyVector?

E.g. when I redefine

function Base.iterate(v::MyVector, s=0)
    # (ab)using that eachindex(v) == Base.OneTo(length(v));
    # s counts the elements already visited, so the next one is v[s+1]
    s == length(v) && return nothing
    return @inbounds v[s+1], s+1
end

I can recover

julia> @btime testf($v); # was: 768.257 ns (0 allocations: 0 bytes)
  161.286 ns (0 allocations: 0 bytes)

julia> @btime testf($(v.data)); # was: 165.248 ns (0 allocations: 0 bytes)
  162.222 ns (0 allocations: 0 bytes)
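Since a hand-rolled iterate with @inbounds is easy to get wrong by one, a quick sanity check seems worthwhile. The sketch below is a self-contained copy of the definitions (with `visited` as a made-up name) that confirms iteration yields the same elements, in the same order, as the wrapped vector, and that an empty wrapper terminates immediately.

```julia
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

# State s = number of elements already visited, so the next one is v[s + 1].
function Base.iterate(v::MyVector, s=0)
    s == length(v) && return nothing
    x = @inbounds v[s + 1]
    return x, s + 1
end

v = MyVector{UInt8}(rand(1:100, 1000))

# Walk the wrapper through the iteration protocol explicitly.
visited = UInt8[]
for x in v
    push!(visited, x)
end

visited == v.data                      # should hold if iterate is correct
iterate(MyVector(UInt8[])) === nothing # empty wrapper stops right away
```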

Is this the correct way to do so? My feeling is that there should be a more generic way (but the default provided by Base inhibits vectorization)…
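One candidate for a more generic route, sketched under the assumption that the loop itself may be changed: iterate over `eachindex(w)` and index explicitly, so that no custom `iterate` method is needed. The name `testf_indexed` is hypothetical, and I have not confirmed that this vectorizes for every wrapper type or Julia version, so it would still need checking with @code_llvm and @btime.

```julia
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

# Hypothetical index-based variant of testf: same arithmetic, but the loop
# runs over eachindex(w) instead of relying on iterate(w).
function testf_indexed(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    @inbounds for idx in eachindex(w)
        x = w[idx]
        isodd(x) && continue
        k += x^2  # squared in the element type, as in testf
    end
    return k
end

data = rand(UInt8(1):UInt8(100), 1000)

# The wrapper and the raw vector should give the same result.
testf_indexed(MyVector(data)) == testf_indexed(data)
```

The trade-off is that this fixes the loop rather than the type: every tight loop has to be written in index form, whereas the custom `iterate` above fixes all `for x in v` loops at once.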
