I hit what I think is a surprising performance snag for custom array-wrapper types that subtype AbstractArray.
The issue is best illustrated by example: suppose I define a thin wrapper over a vector, like so:
```julia
import Base: size, getindex, IndexStyle
# define a wrapper type over a vector
struct V{T} <: AbstractVector{T}
    x::Vector{T}
end
size(v::V) = size(v.x)
Base.@propagate_inbounds getindex(v::V, i::Int) = v.x[i]
IndexStyle(::Type{<:V{T}}) where T = IndexStyle(Vector{T}) # i.e. IndexLinear()
```
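As a quick sanity check (a sketch, not part of the timing comparison), defining only `size`, `getindex` and `IndexStyle` is enough for `V` to pick up the generic `AbstractArray` fallbacks in Base, including iteration, `collect` and `sum`. The block repeats the definitions so it runs on its own; the test vector `[1, 2, 3]` is arbitrary.

```julia
import Base: size, getindex, IndexStyle

# Same thin wrapper as above.
struct V{T} <: AbstractVector{T}
    x::Vector{T}
end
size(v::V) = size(v.x)
Base.@propagate_inbounds getindex(v::V, i::Int) = v.x[i]
IndexStyle(::Type{<:V{T}}) where T = IndexStyle(Vector{T})

v = V([1, 2, 3])

# Nothing beyond size/getindex/IndexStyle was defined for V; the calls below
# are served by Base's generic AbstractArray methods.
println(size(v))                # (3,)
println(IndexStyle(typeof(v)))  # IndexLinear()
println(collect(v))             # [1, 2, 3]
println(sum(v))                 # 6
```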
I would then have hoped that iterating over this wrapper performs essentially as well as iterating over the underlying vector, i.e. that these two functions would perform identically:
```julia
# iterate over this wrapper directly
function f(v)
    s = zero(eltype(v))
    for vᵢ in v   # <--
        s += vᵢ
    end
    return s
end
# same thing, but iterate over the underlying vector instead
function g(v)
    s = zero(eltype(v))
    for xᵢ in v.x # <--
        s += xᵢ
    end
    return s
end
```
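For reference, a `for` loop in Julia is lowered to calls to `iterate`, so `f` and `g` differ in which `iterate` method they end up calling: `V` defines none, so it gets the generic `AbstractArray` fallback, while `v.x` uses the method Base specializes for `Array`. A sketch of `f` with the loop written out by hand (`f_manual` is a hypothetical name introduced only for illustration) makes this explicit:

```julia
import Base: size, getindex, IndexStyle

struct V{T} <: AbstractVector{T}
    x::Vector{T}
end
size(v::V) = size(v.x)
Base.@propagate_inbounds getindex(v::V, i::Int) = v.x[i]
IndexStyle(::Type{<:V{T}}) where T = IndexStyle(Vector{T})

# Hypothetical helper: roughly what `for vᵢ in v ... end` in `f` lowers to.
function f_manual(v)
    s = zero(eltype(v))
    next = iterate(v)              # first element and initial state, or nothing
    while next !== nothing
        (vᵢ, state) = next
        s += vᵢ
        next = iterate(v, state)   # advance using the state returned last time
    end
    return s
end

v = V([4, 5, 6])
println(f_manual(v))    # 15
println(f_manual(v.x))  # 15, but this call goes through Vector's iterate

# Show which `iterate` method each argument type dispatches to.
println(which(iterate, Tuple{typeof(v)}))
println(which(iterate, Tuple{typeof(v.x)}))
```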
This is true sometimes, e.g. for Float64 element types:
```julia
using BenchmarkTools
v_float = V(rand(100000))
@btime f($v_float) # 117.999 μs
@btime g($v_float) # 117.999 μs --- everything performs the same; super!
```
… but, surprisingly, not for Int element types:
```julia
v_int   = V(rand(1:10, 100000))
@btime f($v_int) # slow:    40.499 μs
@btime g($v_int) # fast:    14.399 μs
```
… what is going on here?
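A sketch of two experiments that might narrow this down, assuming (not verified here) that the gap comes from the generic `AbstractArray` `iterate` fallback calling `getindex` without `@inbounds`: the bounds check in `v.x[i]` would then stay in the loop and could block SIMD vectorization of the integer sum. If that is right, it would also be consistent with `Float64` being unaffected, since without `@simd` the compiler is not allowed to reorder floating-point additions, so neither loop would vectorize. `h` and `W` are hypothetical names used only here: `h` loops over indices under `@inbounds`, and `W` is a copy of `V` that forwards `iterate` to the wrapped vector.

```julia
import Base: size, getindex, IndexStyle

struct V{T} <: AbstractVector{T}
    x::Vector{T}
end
size(v::V) = size(v.x)
Base.@propagate_inbounds getindex(v::V, i::Int) = v.x[i]
IndexStyle(::Type{<:V{T}}) where T = IndexStyle(Vector{T})

# `f` from above, repeated so this block runs on its own.
function f(v)
    s = zero(eltype(v))
    for vᵢ in v
        s += vᵢ
    end
    return s
end

# Experiment 1 (hypothetical `h`): index-based loop with bounds checks elided.
# @propagate_inbounds on getindex lets this @inbounds reach v.x[i].
function h(v)
    s = zero(eltype(v))
    @inbounds for i in eachindex(v)
        s += v[i]
    end
    return s
end

# Experiment 2 (hypothetical `W`): same wrapper, but iteration is forwarded to
# the underlying Vector, so `for wᵢ in w` uses Vector's iterate method.
struct W{T} <: AbstractVector{T}
    x::Vector{T}
end
size(w::W) = size(w.x)
Base.@propagate_inbounds getindex(w::W, i::Int) = w.x[i]
IndexStyle(::Type{<:W}) = IndexLinear()
Base.iterate(w::W) = iterate(w.x)
Base.iterate(w::W, state) = iterate(w.x, state)

xs = rand(1:10, 100_000)
v_int = V(xs)
w_int = W(xs)

# Sanity check: all variants compute the same integer sum.
println(f(v_int) == h(v_int) == f(w_int) == sum(xs))  # true

# With BenchmarkTools loaded, the timings to compare would be e.g.:
# @btime f($v_int); @btime h($v_int); @btime f($w_int)
```

If `h` and `f(w_int)` both run at roughly the speed of `g`, that would point at the iteration fallback rather than at the wrapper itself.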