Hey,
I’ve been recently hit with this issue: I have a very thin wrapper around `Vector{T}`, let’s say it’s

```julia
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end
```
```julia
# Array interface
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]
```
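As a quick sanity check (hypothetical usage, not part of the benchmark), these two methods plus `size` are enough for the wrapper to pick up the generic `AbstractArray` machinery:

```julia
# Definitions repeated from above so this snippet runs standalone.
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

mv = MyVector{Int}([1, 2, 3])
mv[2]       # indexing delegates to the wrapped Vector → 2
sum(mv)     # generic AbstractArray fallback, no extra methods needed → 6
length(mv)  # derived from size → 3
```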
It’s very common in my code to write tight loops where I loop over `mv = MyVector(...)`. Recently I found that looping over `mv.data` is much faster, and by much I mean 5× faster:
```julia
function testf(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    for i in w
        isodd(i) && continue
        k += i^2
    end
    return k
end

v = MyVector{UInt8}(rand(1:100, 1000));

using BenchmarkTools
@btime testf($v)         # 768.257 ns (0 allocations: 0 bytes)
@btime testf($(v.data))  # 165.248 ns (0 allocations: 0 bytes)
```
Of course, by inspecting `@code_llvm` / `@code_native` it’s clear that this is because the latter call vectorizes. So here’s my question: is there an easy way to nudge LLVM to emit vector instructions for loops over `MyVector`?
E.g. when I redefine

```julia
function Base.iterate(v::MyVector, s=1)
    # (ab)using that eachindex(v) == Base.OneTo(length(v))
    s > length(v) && return nothing
    return @inbounds v[s], s + 1
end
```

(Note the state starts at 1 and the check is `s > length(v)`; starting at 0 and checking `s == length(v)` would read `v[0]` and skip the last element.)
I can recover the speed:

```julia
julia> @btime testf($v);        # was: 768.257 ns (0 allocations: 0 bytes)
  161.286 ns (0 allocations: 0 bytes)

julia> @btime testf($(v.data)); # was: 165.248 ns (0 allocations: 0 bytes)
  162.222 ns (0 allocations: 0 bytes)
```
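A related workaround that I’d expect to behave similarly (a sketch; `testf_idx` is a hypothetical variant and I haven’t benchmarked it here): loop over `eachindex` with `@inbounds`, so that the `Base.@propagate_inbounds` on `getindex` can elide the bounds checks that block vectorization:

```julia
# Definitions repeated from above so this snippet runs standalone.
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

# Hypothetical variant of testf: same arithmetic, but explicit indexing,
# with @inbounds propagated down to the wrapped Vector's getindex.
function testf_idx(w::AbstractVector{<:Unsigned})
    k = zero(UInt)
    @inbounds for n in eachindex(w)
        i = w[n]
        isodd(i) && continue
        k += i^2
    end
    return k
end

testf_idx(MyVector{UInt8}(UInt8[2, 3, 4]))  # returns UInt(20): 2^2 + 4^2
```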
Is this the correct way to do so? My feeling is that there should be a more generic way (but the default iteration provided by `Base` inhibits vectorization)…
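For what it’s worth, the most generic-feeling variant I can think of (a sketch, timings not verified) is to forward iteration straight to the wrapped `Vector`, so a `for i in mv` loop sees exactly the same iteration code that made the `v.data` call fast:

```julia
# Definitions repeated from above so this snippet runs standalone.
struct MyVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::MyVector) = size(v.data)
Base.IndexStyle(::Type{<:MyVector}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(w::MyVector, n) = w.data[n]

# Delegate both iterate methods to the underlying data instead of
# re-deriving the index arithmetic by hand.
Base.iterate(v::MyVector) = iterate(v.data)
Base.iterate(v::MyVector, state) = iterate(v.data, state)

collect(MyVector{Int}([10, 20, 30]))  # iteration order matches the data
```

This avoids hard-coding any assumption about `eachindex`, at the cost of tying the wrapper’s iteration to the concrete field type.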