Extending broadcast to vector-like struct

Hi,
I’m trying to implement a vector-like struct and I would like it to behave, with respect to broadcasting, just like a regular vector.

My struct is something like:

struct NewVector
    values::Vector{Float64}
    text::String
end

So far, I’m trying to make this work:

u = Vector{Float64}(undef, 10)
v1 = NewVector(u, "vec1")
v2 = NewVector(u, "vec2")

v1 .= v2
v1 .+= v2

without much success. I’ve read the Interfaces chapter of the Julia manual and I’ve implemented

Base.length(u::NewVector) = Base.length(u.values)
Base.size(u::NewVector) = Base.size(u.values)
Base.iterate(u::NewVector) = Base.iterate(u.values)
Base.iterate(u::NewVector, state) = Base.iterate(u.values, state)
Base.ndims(::Type{NewVector}) = 1

based on the error messages I keep getting. However, the next error message asks for an implementation of copyto!(::NewVector, ::Base.Broadcast.Broadcasted{…}), which I don’t understand how to write.

Can anyone help me?

There are some examples here:

https://docs.julialang.org/en/v1/manual/interfaces/#man-interfaces-broadcasting

In particular the ArrayAndChar example, which is quite similar to your case.

Thank you so much for your quick answer. Indeed, implementing something similar to the ArrayAndChar example makes it work. However, there is a significant performance drop. Is there a way to avoid it?

using BenchmarkTools

struct NewVector{T} <: AbstractVector{T}
    values::Vector{T}
    text::String
end

Base.length(u::NewVector) = Base.length(u.values)
Base.size(u::NewVector) = Base.size(u.values)
Base.iterate(u::NewVector) = Base.iterate(u.values)
Base.iterate(u::NewVector, state) = Base.iterate(u.values, state)
Base.ndims(::Type{NewVector}) = 1
Base.getindex(u::NewVector, i) = u.values[i]
Base.setindex!(u::NewVector, val, i) = u.values[i] = val

Base.BroadcastStyle(::Type{<:NewVector}) = Broadcast.ArrayStyle{NewVector}()

find_aac(bc::Base.Broadcast.Broadcasted) = find_aac(bc.args)
find_aac(args::Tuple) = find_aac(find_aac(args[1]), Base.tail(args))
find_aac(x) = x
find_aac(::Tuple{}) = nothing
find_aac(a::NewVector, rest) = a
find_aac(::Any, rest) = find_aac(rest)

function Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{NewVector}}, ::Type{T}) where T
    A = find_aac(bc)
    return NewVector(similar(Vector{T}, axes(bc)), A.char)
end

u = Vector{Float64}(undef, 10)
v1 = NewVector(u, "vec1")
v2 = NewVector(u, "vec2")

function test1(a,b)
    a .= b
end

function test2(a,b)
    a.values .= b.values
end

test1(v1,v2);
test2(v1,v2);
@btime test1($v1,$v2); # 37.466 ns
@btime test2($v1,$v2); # 5.625 ns

As far as I can tell, it’s the overhead of loading u.values in getindex and setindex!, and it will be outweighed once there are more operations and more data. test2 does the loads before the broadcast loop, so the loads aren’t semantically repeated inside it. I can’t rule out the compiler doing this for you through inlining and hoisting in niche situations, but it evidently doesn’t happen here. You don’t use u.text at any point (I think A.char was intended to be A.text), and if you don’t plan to use it in this part of the NewVector code, you could instead make a duplicate non-vector type that semantically wraps a vector and is unwrapped by hand for mutation (see the sketch below). On the other hand, the AbstractArray approach you already have is the preferred one if the extra fields of a custom wrapper type do generally take part in the array interface methods, in which case some runtime overhead is inevitable.
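
To make that last suggestion concrete, here is a minimal sketch, assuming a made-up name LabeledVector and made-up helper functions; the wrapper is not an AbstractArray, and its payload is unwrapped by hand before broadcasting:

struct LabeledVector{T}
    values::Vector{T}
    text::String
end

# Unwrap once, then broadcast over the plain Vector fields directly.
assign!(a::LabeledVector, b::LabeledVector) = (a.values .= b.values; a)
addto!(a::LabeledVector, b::LabeledVector) = (a.values .+= b.values; a)

w1 = LabeledVector(zeros(10), "vec1")
w2 = LabeledVector(ones(10), "vec2")
assign!(w1, w2)  # hits the same fast Vector broadcast path as test2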

It’s actually something else. A @code_native test2(v1, v2) contains things like

.LBB0_21:                               # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vmovups ymm0, ymmword ptr [rax + 8*rsi]
        vmovups ymm1, ymmword ptr [rax + 8*rsi + 32]
        vmovups ymm2, ymmword ptr [rax + 8*rsi + 64]
        vmovups ymm3, ymmword ptr [rax + 8*rsi + 96]
        vmovups ymmword ptr [rcx + 8*rsi], ymm0
        vmovups ymmword ptr [rcx + 8*rsi + 32], ymm1
        vmovups ymmword ptr [rcx + 8*rsi + 64], ymm2
        vmovups ymmword ptr [rcx + 8*rsi + 96], ymm3

It’s a very efficient way of copying which is not present in test1(v1, v2). Presumably there is a special path for broadcasting a Vector which uses copyto! instead of iteration.

And, actually, in test2 the copyto! (or similar) discovers at run time that the source and destination are the same, i.e. it’s a copy from u to u, and the timing for test2 is therefore independent of the length of the vector (v1.11.0-rc3).

You’re not propagating inbounds annotations, which will hurt performance. See propagating inbounds in the manual.
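
A sketch of what that could look like for the methods above (this replaces the plain getindex/setindex! definitions, so that an @inbounds at the call site, e.g. inside the broadcast kernel, also elides the bounds check on the wrapped Vector):

# Forward inbounds annotations from the caller down to u.values.
Base.@propagate_inbounds Base.getindex(u::NewVector, i) = u.values[i]
Base.@propagate_inbounds Base.setindex!(u::NewVector, val, i) = (u.values[i] = val)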

All the broadcast! calls result in a copyto! call. It’s just that for Vector, because Julia knows the data is contiguous, a copyto!(dest, src) (or dest .= src) results in a call to memmove (which is a heavily optimized loop in libc), whereas for other AbstractArray types it falls back to a generic loop.

Of course, you can add a similar optimized copyto! method for your array. But it doesn’t matter for more general broadcast assignments, e.g. x .= x .+ 1 doesn’t use memmove even for Vector.
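
For reference, a sketch of such a specialization: Base’s broadcast machinery rewrites an identity broadcast with matching axes into a plain copyto!(dest, src), so a method like the following should put v1 .= v2 back on the memmove path:

# Forward plain copies to the wrapped Vectors, which dispatch to the
# memmove-backed Array method.
function Base.copyto!(dest::NewVector, src::NewVector)
    copyto!(dest.values, src.values)
    return dest
end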

Thank you all for your input.