Extending broadcast to vector-like struct

Hi,
I’m trying to implement a vector-like struct and I would like it to behave, with respect to broadcasting, just like a regular vector.

My struct is something like:

struct NewVector
    values::Vector{Float64}
    text::String
end

So far, I’m trying to make this work:

u = Vector{Float64}(undef, 10)
v1 = NewVector(u, "vec1")
v2 = NewVector(u, "vec2")

v1 .= v2
v1 .+= v2

without much success. I’ve read the Interfaces chapter of the Julia manual and I’ve implemented

Base.length(u::NewVector) = Base.length(u.values)
Base.size(u::NewVector) = Base.size(u.values)
Base.iterate(u::NewVector) = Base.iterate(u.values)
Base.iterate(u::NewVector, state) = Base.iterate(u.values, state)
Base.ndims(::Type{NewVector}) = 1

based on the error messages I keep getting. However, the next error message asks for an implementation of copyto!(::NewVector, ::Base.Broadcast.Broadcasted{…}), which I don’t understand how to write.

Can anyone help me?

There are some examples here:

https://docs.julialang.org/en/v1/manual/interfaces/#man-interfaces-broadcasting

In particular the ArrayAndChar example, which is quite similar to your case.

Thank you so much for your quick answer. Indeed, implementing something similar to the ArrayAndChar example makes it work. However, there is a significant performance drop. Is there a way to avoid it?

using BenchmarkTools

struct NewVector{T} <: AbstractVector{T}
    values::Vector{T}
    text::String
end

Base.length(u::NewVector) = Base.length(u.values)
Base.size(u::NewVector) = Base.size(u.values)
Base.iterate(u::NewVector) = Base.iterate(u.values)
Base.iterate(u::NewVector, state) = Base.iterate(u.values, state)
Base.ndims(::Type{NewVector}) = 1
Base.getindex(u::NewVector, i) = u.values[i]
Base.setindex!(u::NewVector, val, i) = u.values[i] = val

Base.BroadcastStyle(::Type{<:NewVector}) = Broadcast.ArrayStyle{NewVector}()

find_aac(bc::Base.Broadcast.Broadcasted) = find_aac(bc.args)
find_aac(args::Tuple) = find_aac(find_aac(args[1]), Base.tail(args))
find_aac(x) = x
find_aac(::Tuple{}) = nothing
find_aac(a::NewVector, rest) = a
find_aac(::Any, rest) = find_aac(rest)

function Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{NewVector}}, ::Type{T}) where T
    A = find_aac(bc)
    return NewVector(similar(Vector{T}, axes(bc)), A.char)
end

u = Vector{Float64}(undef, 10)
v1 = NewVector(u, "vec1")
v2 = NewVector(u, "vec2")

function test1(a,b)
    a .= b
end

function test2(a,b)
    a.values .= b.values
end

test1(v1,v2);
test2(v1,v2);
@btime test1($v1,$v2); # 37.466 ns
@btime test2($v1,$v2); # 5.625 ns

As far as I can tell, it’s the overhead of loading u.values in getindex and setindex!, and it will be outweighed once there are more operations and more data. test2 does the loads before the broadcast loop, so the loads aren’t semantically repeated inside it. I can’t rule out the compiler doing this for you through inlining and hoisting in niche situations, but it evidently doesn’t happen here. You don’t use u.text at any point (I think A.char was intended to be A.text), and if you don’t plan to use it in this part of the NewVector code, you could instead make a duplicate non-vector type that semantically wraps a vector and is unwrapped by hand for mutation (see the sketch below). On the other hand, the AbstractArray approach you already have is the preferred one if the extra fields of a custom wrapper type do generally take part in the array interface methods, in which case some runtime overhead is inevitable.
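
To make that last suggestion concrete, here is a minimal sketch, assuming a made-up name LabeledVector and made-up helper functions; the wrapper is not an AbstractArray, and its payload is unwrapped by hand before broadcasting:

struct LabeledVector{T}
    values::Vector{T}
    text::String
end

# Unwrap once, then broadcast over the plain Vector fields directly.
assign!(a::LabeledVector, b::LabeledVector) = (a.values .= b.values; a)
addto!(a::LabeledVector, b::LabeledVector) = (a.values .+= b.values; a)

w1 = LabeledVector(zeros(10), "vec1")
w2 = LabeledVector(ones(10), "vec2")
assign!(w1, w2)  # hits the same fast Vector broadcast path as test2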

It’s actually something else. A @code_native test2(v1, v2) contains things like

.LBB0_21:                               # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vmovups ymm0, ymmword ptr [rax + 8*rsi]
        vmovups ymm1, ymmword ptr [rax + 8*rsi + 32]
        vmovups ymm2, ymmword ptr [rax + 8*rsi + 64]
        vmovups ymm3, ymmword ptr [rax + 8*rsi + 96]
        vmovups ymmword ptr [rcx + 8*rsi], ymm0
        vmovups ymmword ptr [rcx + 8*rsi + 32], ymm1
        vmovups ymmword ptr [rcx + 8*rsi + 64], ymm2
        vmovups ymmword ptr [rcx + 8*rsi + 96], ymm3

It’s a very efficient way of copying which is not present in test1(v1, v2). Presumably there is a special path for broadcasting a Vector which uses copyto! instead of iteration.

And, actually, in test2 the copyto! (or similar) discovers at run time that the source and destination are the same, i.e. it’s a copy from u to u, and the timing for test2 is therefore independent of the length of the vector (v1.11.0-rc3).

You’re not propagating inbounds annotations, which will hurt performance. See propagating inbounds in the manual.
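
A sketch of what that could look like for the methods above (this replaces the plain getindex/setindex! definitions, so that an @inbounds at the call site, e.g. inside the broadcast kernel, also elides the bounds check on the wrapped Vector):

# Forward inbounds annotations from the caller down to u.values.
Base.@propagate_inbounds Base.getindex(u::NewVector, i) = u.values[i]
Base.@propagate_inbounds Base.setindex!(u::NewVector, val, i) = (u.values[i] = val)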

All the broadcast! calls result in a copyto! call. It’s just that for Vector, because Julia knows the data is contiguous, a copyto!(dest, src) (or dest .= src) results in a call to memmove (which is a heavily optimized loop in libc), whereas for other AbstractArray types it falls back to a generic loop.

Of course, you can add a similar optimized copyto! method for your array. But it doesn’t matter for more general broadcast assignments, e.g. x .= x .+ 1 doesn’t use memmove even for Vector.
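
For reference, a sketch of such a specialization: Base’s broadcast machinery rewrites an identity broadcast with matching axes into a plain copyto!(dest, src), so a method like the following should put v1 .= v2 back on the memmove path:

# Forward plain copies to the wrapped Vectors, which dispatch to the
# memmove-backed Array method.
function Base.copyto!(dest::NewVector, src::NewVector)
    copyto!(dest.values, src.values)
    return dest
end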

Thank you all for your input.