Hi Julians,
I’ve found myself wanting convenient ways to manipulate arrays of struct-like objects, such as:

``````
julia> df = [(x = 1,), (x = 2,), (x = 3,), (x = 4,), (x = 5,)];

julia> “df.y = df.x .^ 2” # pseudocode
5-element Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}:
(x = 1, y = 1)
(x = 2, y = 4)
(x = 3, y = 9)
(x = 4, y = 16)
(x = 5, y = 25)
``````

Think of it like broadcasting the `getproperty` call `df.x` to produce `getproperty.(df, :x)`, and doing something similar for the assignment.
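
To make that concrete, here is a minimal sketch in plain Julia, with no macro involved. Using `merge` to stand in for the "assignment" half is my own choice for illustration; the names `xs` and `df_new` are hypothetical.

```julia
df = [(x = 1,), (x = 2,), (x = 3,)]

# "df.x" written out as a broadcast over the elements
xs = getproperty.(df, :x)

# "df.y = df.x .^ 2" written out: pair each row with its new value
# and merge a one-field named tuple into the row
df_new = broadcast((row, v) -> merge(row, (y = v,)), df, xs .^ 2)
```

Because named tuples are immutable, the "assignment" necessarily builds a new array rather than modifying `df` in place.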

The style is similar to using `DataFramesMeta.jl`,

``````
julia> using DataFrames, DataFramesMeta

julia> @transform(DataFrame(df), :y = :x.^2)
5×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      4
   3 │     3      9
   4 │     4     16
   5 │     5     25
``````

except that `DataFrame`s are not quite as flexible because their columns are constrained to one dimension. On the other hand, Julia’s native broadcasting allows you to extend dimensions easily, and isn’t as fussy about preserving lengths.
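
As a small reminder of what that flexibility looks like with ordinary arrays (the variable names `a` and `b` are just for illustration):

```julia
# a length-1 vector is stretched to match a longer argument
a = [0] .+ (1:3)

# a column combined with a row produces a matrix
b = (1:3) .* (1:2)'

size(a), size(b)
```

Here `a` has three elements and `b` is a 3×2 matrix, even though no input had those shapes on its own.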

This train of thought led to the following few lines of Julia defining a “dot broadcasting” macro `@..`:

Macro definition
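``````
using MacroTools: prewalk, @capture

# allow setting fields of immutable named tuples
function setfield(nt::NamedTuple, value, field)
    names = Tuple(keys(nt) ∪ (field,))
    NamedTuple{names}(k == field ? value : nt[k] for k ∈ names)
end

# NOTE: this header line is missing from the post as it survives here. The name
# `dotbroadcast` and the `prewalk(expr) do node` form are a reconstruction
# inferred from the import above and the body below.
dotbroadcast(expr) = prewalk(expr) do node
    if @capture(node, x_.k_ = y_)
        # x.k = y  becomes  x = setfield.(x, y, :k)
        :( $x = $setfield.($x, $y, $(Meta.quot(k))) )
    elseif @capture(node, x_.k_)
        # x.k  becomes  getindex.(x, :k)
        :( getindex.($x, $(Meta.quot(k))) )
    else
        node
    end
end

macro var".."(expr)
    # NOTE: the macro body is also missing; escaping the rewritten expression
    # is an assumption.
    esc(dotbroadcast(expr))
end
``````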

which allows you to do such things as

``````
julia> df = [(x = 0,)]; # start with single ‘data point’

julia> @.. df.x = 1:3 # easily extend dimensions
3-element Vector{NamedTuple{(:x,), Tuple{Int64}}}:
(x = 1,)
(x = 2,)
(x = 3,)

julia> @.. begin # easily add dimensions
           df.y = (1:2)'
           df.z = df.x .* df.y
       end
3×2 Matrix{NamedTuple{(:x, :y, :z), Tuple{Int64, Int64, Int64}}}:
(x = 1, y = 1, z = 1)  (x = 1, y = 2, z = 2)
(x = 2, y = 1, z = 2)  (x = 2, y = 2, z = 4)
(x = 3, y = 1, z = 3)  (x = 3, y = 2, z = 6)

julia> @.. df.z = [df;;; df].z .* [1;;; 100]
3×2×2 Array{NamedTuple{(:x, :y, :z), Tuple{Int64, Int64, Int64}}, 3}:
[:, :, 1] =
(x = 1, y = 1, z = 1)  (x = 1, y = 2, z = 2)
(x = 2, y = 1, z = 2)  (x = 2, y = 2, z = 4)
(x = 3, y = 1, z = 3)  (x = 3, y = 2, z = 6)

[:, :, 2] =
(x = 1, y = 1, z = 100)  (x = 1, y = 2, z = 200)
(x = 2, y = 1, z = 200)  (x = 2, y = 2, z = 400)
(x = 3, y = 1, z = 300)  (x = 3, y = 2, z = 600)
``````
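
For readers who want to see roughly what these calls amount to, the first part of the session can be approximated with plain broadcasts. This is a sketch, not the macro's literal expansion: `setfield_sketch` is a hypothetical stand-in (built on `merge`) for the `setfield` helper in the macro definition.

```julia
# Hypothetical helper (not from the post): set or add one field of a named tuple
setfield_sketch(nt::NamedTuple, value, field::Symbol) =
    merge(nt, NamedTuple{(field,)}((value,)))

df = [(x = 0,)]                         # single data point

# `@.. df.x = 1:3`: the length-1 vector is stretched along 1:3
df = setfield_sketch.(df, 1:3, :x)

# `@.. df.y = (1:2)'`: a row vector adds a second dimension
df = setfield_sketch.(df, (1:2)', :y)

# `@.. df.z = df.x .* df.y`: field reads become broadcast `getindex` calls
df = setfield_sketch.(df, getindex.(df, :x) .* getindex.(df, :y), :z)

size(df), df[3, 2]
```

The shape of the result is decided entirely by the usual broadcasting rules, which is why assigning a range or a row vector can grow the array.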

I’m wondering if a comparable kind of broadcasting for `getproperty` and `setproperty!` is already defined in some package. Is this kind of notation already in use? If not, should it be a package?


`Query.jl` can do what you’re asking for in the first code block:

``````
julia> using Query

julia> df |> Query.@mutate(y = _.x^2) |> collect
5-element Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}:
(x = 1, y = 1)
(x = 2, y = 4)
(x = 3, y = 9)
(x = 4, y = 16)
(x = 5, y = 25)
``````

No macros needed; plain Julia syntax lets you do what you want. You only need a mutable named tuple:

``````
using OrderedCollections
const MTuple = LittleDict{Symbol,Any,Vector{Symbol},Vector{Any}}
OrderedCollections.LittleDict{Symbol,Any,Vector{Symbol},Vector{Any}}(; kv...) = MTuple([keys(kv)...], [values(kv)...])
function OrderedCollections.LittleDict{Symbol,Any,Vector{Symbol},Vector{Any}}(kv::NamedTuple)
    MTuple([keys(kv)...], [values(kv)...])
end
Base.getproperty(p::MTuple, s::Symbol) = isdefined(p, s) ? getfield(p, s) : p[s]
Base.setproperty!(p::MTuple, s::Symbol, v) = setindex!(p, v, s)
Base.propertynames(p::MTuple, ::Bool) = keys(p)
Base.show(io::IO, ::Type{MTuple}) = print(io, "MTuple")
Base.show(io::IO, x::MTuple) = (print(io, "MTuple"); show(io, NamedTuple(keys(x) .=> values(x))))
``````

Vector of Structs:

``````
struct StructVector{T<:Vector}
    v::T
end
const SV = StructVector
Base.values(sa::StructVector) = getfield(sa, :v)
Base.getproperty(sa::StructVector, p::Symbol) = getproperty.(values(sa), p)
Base.setproperty!(sa::StructVector, p::Symbol, v) = setproperty!.(values(sa), p, v)
Base.getindex(sa::StructVector, i) = SV(getindex(values(sa), i))
Base.getindex(sa::StructVector, i::Int) = getindex(values(sa), i)
Base.isempty(sa::StructVector) = isempty(values(sa))
Base.length(sa::StructVector) = length(values(sa))
Base.iterate(sa::StructVector, i=1) = iterate(values(sa), i)
Base.lastindex(sa::StructVector) = lastindex(values(sa))
Base.show(io::IO, m::MIME"text/plain", a::StructVector) = (print(io, "StructVector: "); show(io, m, values(a)))
Base.append!(a::StructVector, b::StructVector) = append!(values(a), values(b))
``````

and voilà:

``````
julia> df = [(x = 1,), (x = 2,), (x = 3,), (x = 4,), (x = 5,)] .|> MTuple |> SV;
julia> df.x
5-element Vector{Int64}:
1
2
3
4
5
julia> df.y = df.x .^ 2
5-element Vector{Int64}:
1
4
9
16
25
julia> df
StructVector: 5-element Vector{MTuple}:
MTuple(x = 1, y = 1)
MTuple(x = 2, y = 4)
MTuple(x = 3, y = 9)
MTuple(x = 4, y = 16)
MTuple(x = 5, y = 25)
``````

I also find this way of manipulating data convenient. It can easily represent plain tables, so that one basically doesn’t need specialized packages for them. And when flat tables aren’t enough, it directly generalizes to higher-dim arrays, or arrays containing something else, not just named tuples.
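
As a rough illustration of that point, here is a sketch using only Base functions on an array of named tuples. The table contents and the names `big`, `sorted`, `wider`, `grid`, and `sums` are made up for the example.

```julia
tbl = [(name = "a", x = 3), (name = "b", x = 1), (name = "c", x = 2)]

big    = filter(r -> r.x > 1, tbl)              # row selection
sorted = sort(tbl; by = r -> r.x)               # ordering by a column
wider  = map(r -> (; r..., y = 2 * r.x), tbl)   # adding a derived column

# the same element-wise style carries over to a 2-D array of rows
grid = [(i = i, j = j) for i in 1:2, j in 1:3]
sums = map(r -> r.i + r.j, grid)                # 2×3 matrix of Int
```

None of this needs a table-specific package, although a plain `Vector` of named tuples stores data row by row, which is where the efficiency question comes in.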

A nice, efficient (due to StructArrays) implementation:

``````
julia> using StructArrays, AccessorsExtra

# create single-row table
julia> tbl = StructArray([(x = 0,)])
1-element StructArray(::Vector{Int64}) with eltype NamedTuple{(:x,), Tuple{Int64}}:
(x = 0,)

# make it three-row by replacing the only column
julia> tbl = @set tbl.x = 1:3
3-element StructArray(::UnitRange{Int64}) with eltype NamedTuple{(:x,), Tuple{Int64}}:
(x = 1,)
(x = 2,)
(x = 3,)

# add another column from explicit values
julia> tbl = @insert tbl.y = [10, 20, 30]
3-element StructArray(::UnitRange{Int64}, ::Vector{Int64}) with eltype NamedTuple{(:x, :y), Tuple{Int64, Int64}}:
(x = 1, y = 10)
(x = 2, y = 20)
(x = 3, y = 30)

# or by combining existing columns
julia> tbl = @insert tbl.z = tbl.x .* tbl.y
3-element StructArray(::UnitRange{Int64}, ::Vector{Int64}, ::Vector{Int64}) with eltype NamedTuple{(:x, :y, :z), Tuple{Int64, Int64, Int64}}:
(x = 1, y = 10, z = 10)
(x = 2, y = 20, z = 40)
(x = 3, y = 30, z = 90)
``````

Note that each step creates a new table: treating data as immutable is the Accessors.jl philosophy. But it’s efficient and doesn’t copy unchanged columns due to how StructArrays work.
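
The reason unchanged columns need not be copied can be sketched without any package, assuming the column-wise storage that StructArrays is built around. Below, a plain named tuple of vectors stands in for a table's columns; this is an illustration of the idea, not StructArrays' actual implementation, and the names `cols`, `cols2`, and `rows` are hypothetical.

```julia
cols = (x = [1, 2, 3],)

# "Inserting" a column builds a new named tuple of columns...
cols2 = merge(cols, (y = [10, 20, 30],))

# ...but the existing column is the very same array, not a copy
shared = cols2.x === cols.x

# a row-wise view of the new table, materialized here only for display
rows = [(x = a, y = b) for (a, b) in zip(cols2.x, cols2.y)]
```

Only the small container of column references is rebuilt; the column data itself is reused, so the immutable style stays cheap.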

This works with arrays of arbitrary dimensions, of course, but all components (x, y, z here) should have the same size: they are semantically treated as components of a single array.
