Why the performance hit on this trivial vector wrapper broadcast?

I’ve got a trivial vector wrapper type implementing a minimal array interface:

using Lazy

struct VecWrapper{T, V<:AbstractVector{T}} <: AbstractVector{T}
    data :: V
end
@forward VecWrapper.data (Base.getindex, Base.setindex!, Base.size)

However, I see a ~10% performance hit when broadcasting:

using BenchmarkTools

v = rand(512^2)
vw = VecWrapper(v)

@benchmark $v  .+ $v  # 268.017 μs
@benchmark $vw .+ $vw # 297.195 μs

Any ideas where this is coming from, or how I can get rid of it? I’m on Julia 1.1. Thanks.

Base.@propagate_inbounds Base.getindex(vw::VecWrapper, i) = vw.data[i]

With that modification,

julia> vpv = v .+ v;

julia> @benchmark $vpv .= $v .+ $v
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     154.436 μs (0.00% GC)
  median time:      155.591 μs (0.00% GC)
  mean time:        158.077 μs (0.00% GC)
  maximum time:     319.513 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark $vpv .= $vw .+ $vw
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     154.587 μs (0.00% GC)
  median time:      155.770 μs (0.00% GC)
  mean time:        157.691 μs (0.00% GC)
  maximum time:     309.811 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

You will probably want to define setindex! and any other function that cares about bounds checks similarly.
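For reference, here is a minimal sketch of what the whole wrapper could look like with both accessors annotated. It is not the original poster's exact code: I've assumed the forwarded methods are written out by hand instead of using `@forward` from Lazy, so that each definition can carry `Base.@propagate_inbounds` and the snippet has no dependencies.

```julia
# Sketch only: same struct as in the question, with the forwarding
# written out by hand (an assumption, replacing Lazy's @forward).
struct VecWrapper{T, V<:AbstractVector{T}} <: AbstractVector{T}
    data::V
end

# size does no indexing, so there is nothing to propagate here
Base.size(vw::VecWrapper) = size(vw.data)

# Pass the caller's @inbounds context through to the wrapped vector
Base.@propagate_inbounds Base.getindex(vw::VecWrapper, i::Int) = vw.data[i]
Base.@propagate_inbounds Base.setindex!(vw::VecWrapper, x, i::Int) =
    setindex!(vw.data, x, i)

v = rand(8)
vw = VecWrapper(v)
out = similar(v)
out .= vw .+ vw   # broadcast reads go through the annotated getindex
```

Outside an `@inbounds` context nothing changes: `vw[100]` on the 8-element wrapper above should still throw a `BoundsError` from the wrapped array.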

Here is the detailed documentation to complement @Elrod’s excellent answer:

https://docs.julialang.org/en/v1.3-dev/devdocs/boundscheck/
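As I understand that page, the mechanism is roughly this: a `@boundscheck` block can be elided when the method is inlined into a caller that used `@inbounds`, and `@propagate_inbounds` lets that context pass through one more layer of calls. A small sketch of the explicit form, where `CheckedVec` is a made-up type for illustration and not anything from Base:

```julia
# Hypothetical wrapper type, for illustration only
struct CheckedVec{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(c::CheckedVec) = size(c.data)

@inline function Base.getindex(c::CheckedVec, i::Int)
    # This check may be skipped when the call is inlined into an
    # @inbounds context; otherwise it runs as usual.
    @boundscheck checkbounds(c, i)
    # Having checked (or been told it is safe), skip the inner check
    return @inbounds c.data[i]
end

# Caller that guarantees valid indices via eachindex and opts out of checks
function total(c::CheckedVec)
    s = zero(eltype(c))
    for i in eachindex(c)
        @inbounds s += c[i]
    end
    return s
end

c = CheckedVec([1.0, 2.0, 3.0])
t = total(c)
```

The difference from the `@propagate_inbounds` one-liner is only in who does the check: here the wrapper checks its own bounds once, while the one-liner leaves the check to the wrapped array.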

Excellent answer, thanks! I should have remembered @propagate_inbounds.