# How to wrap a vector so that it still uses SIMD?

Hey,

I recently ran into this issue: I have a very thin wrapper around `Vector{T}`, let’s say it’s

```julia
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

# Array Interface
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]
```

It’s very common in my code to have tight loops over `mv = MyVector(...)`. Recently I found that looping over `mv.data` is much faster, and by much I mean roughly 5× faster:

```julia
function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

v = MyVector{UInt8}(rand(1:100, 1000));
using BenchmarkTools
@btime testf($v)        # 768.257 ns (0 allocations: 0 bytes)
@btime testf($(v.data)) # 165.248 ns (0 allocations: 0 bytes)
```

Of course by inspecting `@code_llvm` / `@code_native` it’s clear that this is due to the fact that the latter call vectorizes. So here’s my question:
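For anyone who wants to repeat that check without reading the IR by eye, here is a rough sketch. `mentions_vector_types` is a hypothetical helper name, not something from `Base`; it only searches the optimized LLVM IR for vector types such as `<16 x i8>`, which is a crude heuristic and depends on the CPU and Julia version.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

# Hypothetical helper: returns true if the optimized LLVM IR for
# f(argtypes...) mentions a vector type like `<16 x i8>`.
function mentions_vector_types(f, argtypes)
    ir = sprint(io -> code_llvm(io, f, argtypes; debuginfo=:none))
    return occursin(r"<\d+ x \w+>", ir)
end

# Compare the wrapper against the plain Vector; results are machine-dependent.
println("MyVector: ", mentions_vector_types(testf, (MyVector{UInt8},)))
println("Vector:   ", mentions_vector_types(testf, (Vector{UInt8},)))
```

A `true` here only says that vector types appear somewhere in the IR, so it is worth confirming with `@code_native` and a benchmark before drawing conclusions.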

Is there an easy way to nudge LLVM to emit vector instructions for loops over `MyVector`?

E.g. when I redefine

```julia
function Base.iterate(v::MyVector, s=0)
    # (ab)using that eachindex(v) = Base.OneTo(1000)
    # s counts the elements already visited, so the next one is v[s+1]
    s == length(v) && return nothing
    return @inbounds v[s+1], s+1
end
```

I can recover

```julia
julia> @btime testf($v); # was: 768.257 ns (0 allocations: 0 bytes)
  161.286 ns (0 allocations: 0 bytes)

julia> @btime testf($(v.data)); # was: 165.248 ns (0 allocations: 0 bytes)
  162.222 ns (0 allocations: 0 bytes)
```

Is this the correct way to do so? My feeling is that there should be a more generic way (but the default provided by `Base` inhibits vectorization)…
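One variant that avoids hand-writing the index arithmetic is to forward `iterate` to the wrapped vector. This is only a sketch, assuming the wrapper should iterate in exactly the same order as `data`; the iteration state is passed through untouched, and I make no claim here about the timings it gives.

```julia
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

# Forward iteration to the wrapped Vector. The state is treated as opaque:
# whatever iterate(v.data) hands back is passed straight to the next call.
Base.iterate(v::MyVector) = iterate(v.data)
Base.iterate(v::MyVector, state) = iterate(v.data, state)

function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

v = MyVector{UInt8}(rand(1:100, 1000))

# Both loops now go through the same iterate methods of Vector.
println(testf(v) == testf(v.data))
```

The same pair of definitions works for any wrapper whose storage field is itself iterable, as long as the wrapper does not reorder or filter elements.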
