Iteration/getindex performance of AbstractArray wrapper-types

I hit what I think is a surprising performance-snag for custom array-wrapper types that subtype AbstractArray.

The issue is best illustrated by example: suppose I define a thin wrapper over a vector, like so:

import Base: size, getindex, IndexStyle

# define a wrapper type over a vector
struct V{T} <: AbstractVector{T}
size(v::V) = size(v.x)
Base.@propagate_inbounds getindex(v::V, i::Int) = v.x[i]
IndexStyle(::Type{<:V{T}}) where T = IndexStyle(Vector{T}) # ... IndexLinear()

Then I would had hoped that this wrapper would perform pretty much as well as the underlying vector when I iterate over it, i.e. I had hoped the performance of each of these functions would be identical:

# iterate over this wrapper directly
function f(v)
    s = zero(eltype(v))
    for vᵢ in v   # <--
        s += vᵢ
    return s
# same thing, but iterate over the underlying vector instead
function g(v)
    s = zero(eltype(v))
    for xᵢ in v.x # <--
        s += xᵢ
    return s

This is true sometimes, e.g. for Float64 element types

using BenchmarkTools
v_float = V(rand(100000))

@btime f($v_float) # 117.999 μs
@btime g($v_float) # 117.999 μs --- everything performs the same; super!

… but, surprisingly, not for Int element types:

v_int   = V(rand(1:10, 100000))

@btime f($v_int) # slow:    40.499 μs
@btime g($v_int) # fast:    14.399 μs

… what is going on here?

@Sukera pointed out on Slack that defining

iterate(v::V, i=1) = iterate(v.x, i)

removes the performance penalty for the Vector{Int} wrapper relative to the underlying array. Needing to do this to get “full” performance is not specifically mentioned in the AbstractArray Interfaces docs; maybe it should be?

1 Like

I’d rather fix the core issue here than document a workaround. It is surely something to do with inlining and/or bounds check since the difference is the SIMD-ifying of the loop or not. Ah, sure enough:

1 Like