Does this style of “property broadcasting” already exist?

Hi Julians,
I’ve found myself wanting convenient ways to manipulate arrays of struct-like objects, such as:

julia> df = [(x = 1,), (x = 2,), (x = 3,), (x = 4,), (x = 5,)];

julia> "df.y = df.x .^ 2" # pseudocode
5-element Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}:
 (x = 1, y = 1)
 (x = 2, y = 4)
 (x = 3, y = 9)
 (x = 4, y = 16)
 (x = 5, y = 25)

Think of it as broadcasting the property access df.x to produce getproperty.(df, :x), and doing something similar for the assignment.
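Spelled out in plain Julia without any macro, the read half is just getproperty.(df, :x). For the write half, one possible sketch (an illustration only, not what the macro further down does) is to rebuild each immutable named tuple from its columns:

```julia
df = [(x = 1,), (x = 2,), (x = 3,), (x = 4,), (x = 5,)]

# "df.x": broadcast the property access over every element
xs = getproperty.(df, :x)

# "df.y = df.x .^ 2": named tuples are immutable, so rebuild each row
# by zipping the columns together with tuple.() and naming the fields
ys = xs .^ 2
df2 = NamedTuple{(:x, :y)}.(tuple.(xs, ys))
```

This produces the same 5-element vector of (x, y) named tuples as the pseudocode above, but it has to name every field explicitly, which is exactly the boilerplate a dot-broadcasting shorthand would remove.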

The style is similar to using DataFramesMeta.jl,

julia> using DataFrames, DataFramesMeta

julia> @transform(DataFrame(df), :y = :x.^2)
5×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      4
   3 │     3      9
   4 │     4     16
   5 │     5     25

except that DataFrames are less flexible, because their columns are constrained to one dimension. Julia’s native broadcasting, on the other hand, makes it easy to extend dimensions and isn’t as fussy about preserving lengths.
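To make the last point concrete without any package: broadcasting a 3-element column of rows against a 1×2 row of new values produces a 3×2 matrix, and a 1-element vector is stretched to match a longer partner. A minimal sketch; add_field is a hypothetical helper that returns a copy of a named tuple with one field set (added or overwritten):

```julia
# hypothetical helper: copy of `row` with field `name` set to `val`
add_field(row::NamedTuple, name::Symbol, val) = merge(row, NamedTuple{(name,)}((val,)))

rows = [(x = 1,), (x = 2,), (x = 3,)]      # 3-element column of rows

# broadcasting against a 1×2 row of values adds a dimension: the result is 3×2
grid = add_field.(rows, :y, (1:2)')

# a 1-element vector is stretched to length 3 by broadcasting
single = [(x = 0,)]
stretched = add_field.(single, :x, 1:3)
```

Symbols are treated as scalars by broadcasting, so :y and :x are simply repeated for every element.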

This train of thought led to the following few lines of Julia, which define a “dot broadcasting” macro named @..

Macro definition
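using MacroTools: prewalk, @capture

# named tuples are immutable, so return a copy of nt with `field` set to
# `value`, appending the field at the end if it doesn't exist yet
function setfield(nt::NamedTuple, value, field)
	names = Tuple(keys(nt) ∪ (field,))
	NamedTuple{names}(k == field ? value : nt[k] for k ∈ names)
end

# rewrite `x.k = y` into `x = setfield.(x, y, :k)` and `x.k` into
# `getproperty.(x, :k)`, i.e. broadcast both over the elements of x
broadcast_dot_operator(expr) = prewalk(expr) do node
	if @capture(node, x_.k_ = y_)
		:( $x = $setfield.($x, $y, $(Meta.quot(k))) )
	elseif @capture(node, x_.k_)
		:( getproperty.($x, $(Meta.quot(k))) )
	else
		node
	end
end

# defines the macro @..
macro var".."(expr)
	broadcast_dot_operator(esc(expr))
end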

which allows you to do things such as:

julia> df = [(x = 0,)]; # start with a single ‘data point’

julia> @.. df.x = 1:3 # easily extend dimensions
3-element Vector{NamedTuple{(:x,), Tuple{Int64}}}:
 (x = 1,)
 (x = 2,)
 (x = 3,)

julia> @.. begin # easily add dimensions
           df.y = (1:2)'
           df.z = df.x .* df.y
       end
3×2 Matrix{NamedTuple{(:x, :y, :z), Tuple{Int64, Int64, Int64}}}:
 (x = 1, y = 1, z = 1)  (x = 1, y = 2, z = 2)
 (x = 2, y = 1, z = 2)  (x = 2, y = 2, z = 4)
 (x = 3, y = 1, z = 3)  (x = 3, y = 2, z = 6)

julia> @.. df.z = [df;;; df].z .* [1;;; 100]
3×2×2 Array{NamedTuple{(:x, :y, :z), Tuple{Int64, Int64, Int64}}, 3}:
[:, :, 1] =
 (x = 1, y = 1, z = 1)  (x = 1, y = 2, z = 2)
 (x = 2, y = 1, z = 2)  (x = 2, y = 2, z = 4)
 (x = 3, y = 1, z = 3)  (x = 3, y = 2, z = 6)

[:, :, 2] =
 (x = 1, y = 1, z = 100)  (x = 1, y = 2, z = 200)
 (x = 2, y = 1, z = 200)  (x = 2, y = 2, z = 400)
 (x = 3, y = 1, z = 300)  (x = 3, y = 2, z = 600)
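One caveat with @.. as written: at the syntax level, a property access and a module-qualified name have the same shape, so, as far as I can tell, the x_.k_ pattern would also rewrite something like Base.sqrt(2) inside the macro. A quick check with Meta.parse, which needs no packages:

```julia
prop = Meta.parse("df.x")        # property access
qual = Meta.parse("Base.sqrt")   # module-qualified name

# both parse to Expr(:., <lhs>, QuoteNode(<name>)), so a purely
# syntactic pattern cannot tell them apart
same_shape = prop.head === qual.head &&
             prop.args[2] isa QuoteNode && qual.args[2] isa QuoteNode

# by contrast, a dot-call f.(a) has the same head but stores a
# tuple expression in the second slot instead of a QuoteNode
bcast = Meta.parse("f.(a)")
```

One simple workaround is to bind such functions to a local name outside the @.. block and call them through that name inside it.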

I’m wondering whether a comparable kind of broadcasting for getproperty and setproperty! is already defined in some package. Is this kind of notation already in use? If not, should it be a package?


Query.jl can do what you’re asking for in the first code block:

julia> import Query

julia> df |> Query.@mutate(y = _.x^2) |> collect
5-element Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}:
 (x = 1, y = 1)
 (x = 2, y = 4)
 (x = 3, y = 9)
 (x = 4, y = 16)
 (x = 5, y = 25)

No macros needed: plain Julia syntax lets you do what you want. You only need a mutable named tuple:

using OrderedCollections

# a "mutable named tuple": a LittleDict with Symbol keys and Any values
const MTuple = LittleDict{Symbol,Any,Vector{Symbol},Vector{Any}}
# constructors from keyword arguments, e.g. MTuple(x = 1), and from a NamedTuple
OrderedCollections.LittleDict{Symbol,Any,Vector{Symbol},Vector{Any}}(; kv...) = MTuple([keys(kv)...], [values(kv)...])
function OrderedCollections.LittleDict{Symbol,Any,Vector{Symbol},Vector{Any}}(kv::NamedTuple)
    MTuple([keys(kv)...], [values(kv)...])
end
# p.s reads entry s, except for the dict's own fields (keys, vals)
Base.getproperty(p::MTuple, s::Symbol) = isdefined(p, s) ? getfield(p, s) : p[s]
# p.s = v inserts or updates entry s
Base.setproperty!(p::MTuple, s::Symbol, v) = setindex!(p, v, s)
Base.propertynames(p::MTuple, private::Bool=false) = Tuple(keys(p))
Base.show(io::IO, ::Type{MTuple}) = print(io, "MTuple")
Base.show(io::IO, x::MTuple) = (print(io, "MTuple"); show(io, NamedTuple(keys(x) .=> values(x))))

Then a vector-of-structs wrapper that broadcasts property access over its elements:

# wraps a Vector and broadcasts property access over its elements
struct StructVector{T<:Vector}
    v::T
end
const SV = StructVector
Base.values(sa::StructVector) = getfield(sa, :v)
# sa.p collects property p of every element; sa.p = v sets it on every element
Base.getproperty(sa::StructVector, p::Symbol) = getproperty.(values(sa), p)
Base.setproperty!(sa::StructVector, p::Symbol, v) = setproperty!.(values(sa), p, v)
Base.getindex(sa::StructVector, i) = SV(getindex(values(sa), i))
Base.getindex(sa::StructVector, i::Int) = getindex(values(sa), i)
Base.isempty(sa::StructVector) = isempty(values(sa))
Base.length(sa::StructVector) = length(values(sa))
Base.iterate(sa::StructVector, state...) = iterate(values(sa), state...)
Base.lastindex(sa::StructVector) = lastindex(values(sa))
Base.show(io::IO, m::MIME"text/plain", a::StructVector) = (print(io, "StructVector: "); show(io, m, values(a)))
Base.append!(a::StructVector, b::StructVector) = append!(values(a), values(b))

and voilà:

julia> df = [(x = 1,), (x = 2,), (x = 3,), (x = 4,), (x = 5,)] .|> MTuple |> SV;
julia> df.x
5-element Vector{Int64}:
 1
 2
 3
 4
 5
julia> df.y = df.x .^ 2
5-element Vector{Int64}:
  1
  4
  9
 16
 25
julia> df
StructVector: 5-element Vector{MTuple}:
 MTuple(x = 1, y = 1)
 MTuple(x = 2, y = 4)
 MTuple(x = 3, y = 9)
 MTuple(x = 4, y = 16)
 MTuple(x = 5, y = 25)
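For comparison, here is a dependency-free sketch of the same approach using only Base. It swaps LittleDict for a small hand-written record type; MRecord and PropVec are hypothetical names (PropVec also avoids clashing with StructArrays.StructVector), and unlike the version above it defines no custom show methods or indexing:

```julia
# hypothetical MRecord: a mutable, ordered record of named values
struct MRecord
    names::Vector{Symbol}
    vals::Vector{Any}
end
MRecord(nt::NamedTuple) = MRecord(collect(Symbol, keys(nt)), collect(Any, values(nt)))

function Base.getproperty(r::MRecord, s::Symbol)
    i = findfirst(==(s), getfield(r, :names))
    i === nothing && throw(ArgumentError("MRecord has no property $s"))
    return getfield(r, :vals)[i]
end

function Base.setproperty!(r::MRecord, s::Symbol, v)
    ks, vs = getfield(r, :names), getfield(r, :vals)
    i = findfirst(==(s), ks)
    if i === nothing          # new property: append it
        push!(ks, s)
        push!(vs, v)
    else                      # existing property: overwrite it
        vs[i] = v
    end
    return v
end

Base.propertynames(r::MRecord, private::Bool=false) = Tuple(getfield(r, :names))

# hypothetical PropVec: broadcasts property access over the wrapped vector
struct PropVec{T<:AbstractVector}
    v::T
end
Base.getproperty(pv::PropVec, p::Symbol) = getproperty.(getfield(pv, :v), p)
function Base.setproperty!(pv::PropVec, p::Symbol, x)
    setproperty!.(getfield(pv, :v), p, x)
    return x
end

df = PropVec(MRecord.([(x = 1,), (x = 2,), (x = 3,), (x = 4,), (x = 5,)]))
df.y = df.x .^ 2
```

The wrapper itself is agnostic about its elements: a vector of ordinary mutable structs works too, except that assigning a property the struct doesn’t declare (such as a brand-new column) will throw an error.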

I also find this way of manipulating data convenient. It can easily represent plain tables, so one basically doesn’t need specialized packages for them. And when flat tables aren’t enough, it generalizes directly to higher-dimensional arrays, or to arrays containing something other than named tuples.

Here is a nice implementation that is efficient thanks to StructArrays:

julia> using StructArrays, AccessorsExtra

# create single-row table
julia> tbl = StructArray([(x = 0,)])
1-element StructArray(::Vector{Int64}) with eltype NamedTuple{(:x,), Tuple{Int64}}:
 (x = 0,)

# make it three-row by replacing the only column
julia> tbl = @set tbl.x = 1:3
3-element StructArray(::UnitRange{Int64}) with eltype NamedTuple{(:x,), Tuple{Int64}}:
 (x = 1,)
 (x = 2,)
 (x = 3,)

# add another column from explicit values
julia> tbl = @insert tbl.y = [10, 20, 30]
3-element StructArray(::UnitRange{Int64}, ::Vector{Int64}) with eltype NamedTuple{(:x, :y), Tuple{Int64, Int64}}:
 (x = 1, y = 10)
 (x = 2, y = 20)
 (x = 3, y = 30)

# or by combining existing columns
julia> tbl = @insert tbl.z = tbl.x .* tbl.y
3-element StructArray(::UnitRange{Int64}, ::Vector{Int64}, ::Vector{Int64}) with eltype NamedTuple{(:x, :y, :z), Tuple{Int64, Int64, Int64}}:
 (x = 1, y = 10, z = 10)
 (x = 2, y = 20, z = 40)
 (x = 3, y = 30, z = 90)

Note that each step creates a new table: treating data as immutable is the Accessors.jl philosophy. It is still efficient, though: a StructArray stores each column as a separate array, so unchanged columns are reused rather than copied.

This works with arrays of arbitrary dimensions, of course, but all components (x, y, z here) should have the same size: they are semantically treated as components of a single array.
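For intuition about both points, here is a Base-only sketch of the struct-of-arrays layout (a simplification, not StructArrays’ actual implementation): a named tuple of same-sized arrays, here 2×2. Adding a column with merge builds a new named tuple that refers to the very same x and y arrays; cols, cols2 and rows are illustrative names:

```julia
# columns stored as a named tuple of same-sized 2×2 arrays
cols = (x = [1 2; 3 4], y = [10 20; 30 40])

# "insert" a column: a new named tuple, but x and y are not copied
cols2 = merge(cols, (z = cols.x .* cols.y,))

# every component must share one size so that an index I addresses a single "row"
same_size = all(==(size(cols2.x)), size.(values(cols2)))

# the array-of-rows view: one named tuple per Cartesian index
rows = map(I -> map(c -> c[I], cols2), CartesianIndices(cols2.x))
```

Here cols2.x === cols.x holds, i.e. the old and new tables share the column, and rows is a 2×2 matrix of (x, y, z) named tuples.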
