You could also define
(s::Symbol)(data) = getproperty(data, s)
which would let you say
:x.(Array_of_A)
Would you mind applying this to the above example with X, T and T1?
Sure:
julia> :T1.(:T.([X...])) .+ 0
3-element Vector{Int64}:
1
2
2
Thanks
This is type piracy though (it adds a method to Symbol, a type you don't own), and might have unexpected consequences.
Is there a way to improve the definition?
I don’t know of an easy and equally concise way. One way to avoid the piracy is to wrap the Symbol in your own type, e.g.
julia> struct MySym
s :: Symbol
end
julia> (s::MySym)(data) = getproperty(data, s.s)
julia> MySym(:T1).(MySym(:T).([X...])) .+ 0
3-element Vector{Int64}:
1
2
2
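Another piracy-free option, as a rough sketch: return a closure from a small helper, or use Base.Fix2, which already exists in Base. The Inner and Outer structs and the prop helper below are hypothetical stand-ins for the X / T / T1 example in this thread, not the original definitions.

```julia
# Hypothetical data standing in for the thread's X / T / T1 example.
struct Inner
    T1::Int
end
struct Outer
    T::Inner
end
data = [Outer(Inner(1)), Outer(Inner(2)), Outer(Inner(2))]

# Hypothetical helper: returns a closure, so no method is added to Symbol.
prop(s::Symbol) = obj -> getproperty(obj, s)

result = prop(:T1).(prop(:T).(data))

# Base.Fix2 fixes the second argument of getproperty and is callable as well.
result2 = Base.Fix2(getproperty, :T1).(Base.Fix2(getproperty, :T).(data))
```

Both are wordier than :T1.(...) but leave Symbol untouched.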
One severe pitfall that hasn’t been mentioned is that .. has a pretty low operator precedence, equal to that of the : operator. So unfortunately we’re stuck with
julia> :( A..x .^ 2 ) == :( A..(x .^ 2) )
true
Which is probably a good thing, because this confirms that .. was intended by the language designers to be used as a :-like range operator, so it’s unlikely that different packages will use it with wildly different semantics…
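To see the grouping without defining .. at all, you can compare parsed expressions. This is only a sketch of the parsing behaviour described above; A and x are placeholder names that are never evaluated.

```julia
# Parse only; nothing here is evaluated, so A, x and n need not exist.
lhs = Meta.parse("A..x .^ 2")
rhs = Meta.parse("A..(x .^ 2)")
groups_to_the_right = lhs == rhs

# The range operator groups the same way.
range_groups_same = Meta.parse("1:n .^ 2") == Meta.parse("1:(n .^ 2)")

# Explicit parentheses are needed to apply .^ to the result of A..x.
explicit = Meta.parse("(A..x) .^ 2")
explicit_differs = explicit != lhs
```

So any property-access use of .. would need parentheses around each A..x before further arithmetic.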
Any update here?
I’m working a lot with structures of the form
X=[
(;a=(x=1,y=1),b=()),
(;a=(x=2,y=3),b=()),
]
and often want to get an array of x. I can do
⋄(data,s)=getproperty.(data,[s])
X ⋄ :a ⋄ :x
which is OK, but something like X..a..x would be much better.
https://juliaobjects.github.io/Accessors.jl/stable/lenses/
julia> X=[
(;a=(x=1,y=1),b=()),
(;a=(x=2,y=3),b=()),
];
julia> using Accessors
julia> (@optic _.a.x).(X)
2-element Vector{Int64}:
1
2
That’s really cool. Thank you.
Would it be difficult to extend it to something closer to X.a.x, say:
@optic X _.a.x
julia> using Accessors
julia> macro mapoptic(e, v)
:(map((@optic $e), $v))
end
@mapoptic (macro with 1 method)
julia> @mapoptic _.a.x X
2-element Vector{Int64}:
1
2
Though if the interface you really want is just X.a.x, there should be a way to do that using StructArrays or something.
Indeed StructArrays.jl works well here:
julia> using StructArrays
julia> X = [
(;a=(x=1,y=1),b=()),
(;a=(x=2,y=3),b=()),
]
2-element Vector{@NamedTuple{a::@NamedTuple{x::Int64, y::Int64}, b::Tuple{}}}:
(a = (x = 1, y = 1), b = ())
(a = (x = 2, y = 3), b = ())
julia> Y = StructArray(X, unwrap=t -> t <: NamedTuple)
2-element StructArray(StructArray(::Vector{Int64}, ::Vector{Int64}), ::Vector{Tuple{}}) with eltype @NamedTuple{a::@NamedTuple{x::Int64, y::Int64}, b::Tuple{}}:
(a = (x = 1, y = 1), b = ())
(a = (x = 2, y = 3), b = ())
julia> Y[1]
(a = (x = 1, y = 1), b = ())
julia> Y.a.x
2-element Vector{Int64}:
1
2
StructArrays are definitely the efficient and convenient data structure for this kind of columnar manipulation! You can store the dataset as a StructArray from the beginning and avoid any conversions.
Btw, Accessors now exports @o as an alias for @optic, to further encourage using this macro (:
map((@o _.a.x), X)
works for all arrays, and for StructArrays it can be made as efficient as X.a.x. This optimization is more experimental, and for now is only in AccessorsExtra.jl, not in Accessors.jl proper.
This is brilliant.
Thank you
Is it possible to get StructArray-style syntax but maintain coupling with the underlying vector of structs?
Of course, it’s possible – thanks to Julia composability (:
One approach is to make separate views of each component of your original array, and put them into a StructArray:
julia> using StructArrays, FlexiMaps, Accessors
julia> X = [(a=1, b=2), (a=3, b=4)]
2-element Vector{@NamedTuple{a::Int64, b::Int64}}:
(a = 1, b = 2)
(a = 3, b = 4)
julia> Y = StructArray(
a=mapview((@o _.a), X),
b=mapview((@o _.b), X),
# repeat for all components you need
)
2-element StructArray(::FlexiMaps.MappedArray{Int64, 1, PropertyLens{:a}, Vector{@NamedTuple{a::Int64, b::Int64}}}, ::FlexiMaps.MappedArray{Int64, 1, PropertyLens{:b}, Vector{@NamedTuple{a::Int64, b::Int64}}}) with eltype @NamedTuple{a::Int64, b::Int64}:
(a = 1, b = 2)
(a = 3, b = 4)
julia> Y[2]
(a = 3, b = 4)
# the new Y array actually refers to X values, and updates correspondingly
julia> Y.a .= [5, 6]
2-element FlexiMaps.MappedArray{Int64, 1, PropertyLens{:a}, Vector{@NamedTuple{a::Int64, b::Int64}}}:
5
6
julia> X
2-element Vector{@NamedTuple{a::Int64, b::Int64}}:
(a = 5, b = 2)
(a = 6, b = 4)
julia> Y.b[2] = 10
10
julia> X
2-element Vector{@NamedTuple{a::Int64, b::Int64}}:
(a = 5, b = 2)
(a = 6, b = 10)
But I personally think it’s better to just store your data in a StructArray in the first place. Are there any specific reasons you prefer a basic Vector here?
I use functions to operate on the underlying structures.
The basic vector is coupled to the underlying structs. The StructArray isn’t.
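using StructArrays
@kwdef mutable struct X
a::Int64
b::Int64
end
SA = StructArray( [X(1,1), X(2,2)] )
function f!( x::X )
x.a += 10
end
f!.(SA)      # mutates temporary X values built from the columns
SA.a[1] #1   (the column is unchanged)
v = collect(SA)
f!.(v)       # mutates the X objects stored in the plain Vector
v[1].a #11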
Unless you define it as you’ve done above.
My underlying structs might have a dozen fields.
Is there any shorthand to do the below for all fields?
Y = StructArray(
a=mapview((@o _.a), X),
b=mapview((@o _.b), X),
# repeat for all components you need
)
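One possible shorthand, as a rough sketch only: generate the field/column pairs from fieldnames instead of typing them out. The Rec type is hypothetical, and the version below is package-free, so it builds plain copied vectors; to keep the coupling you would put a lazy view (such as the mapview call above) where the comprehension is. I have not tested that variant here.

```julia
# Hypothetical record type with several fields.
struct Rec
    a::Int
    b::Int
    c::Float64
end
recs = [Rec(1, 2, 0.5), Rec(3, 4, 1.5)]

fnames = fieldnames(Rec)   # (:a, :b, :c)

# One column per field, collected into a NamedTuple keyed by field name.
# The comprehension copies values; a lazy view per field would go here instead.
cols = NamedTuple{fnames}(map(n -> [getfield(r, n) for r in recs], fnames))
```

The resulting NamedTuple has one entry per field, which is the same shape as the keyword arguments written out by hand above.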
This is a common issue with StructArrays, see this page from the documentation: Some counterintuitive behaviors · StructArrays
When you do SA[1] it creates an X struct on the fly. The f!.(SA) call operates on these on-the-fly structs so it has no effect: the on-the-fly structs are discarded at the end of the f! call.
To work around this, you can work on “lazy rows” rather than temporary structs:
using StructArrays
@kwdef mutable struct X
a::Int64
b::Int64
end
SA = StructArray( [X(1,1), X(2,2)] )
function f!( x )
x.a += 10
end
f!.(LazyRows(SA))
SA.a[1] #11
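If it helps to see why one version sticks and the other doesn't, here is a package-free sketch of the idea. Columns, Point, row and RowRef are all hypothetical names made up for illustration; this is not how StructArrays or LazyRows are actually implemented, just the same pattern in miniature.

```julia
mutable struct Point
    a::Int
    b::Int
end

# Hypothetical column store: one vector per field.
struct Columns
    a::Vector{Int}
    b::Vector{Int}
end

# Indexing builds a brand-new Point from the column values.
row(c::Columns, i) = Point(c.a[i], c.b[i])

# Hypothetical "lazy row": remembers the store and an index,
# and forwards property reads and writes to the columns.
struct RowRef
    cols::Columns
    i::Int
end
Base.getproperty(r::RowRef, s::Symbol) =
    getfield(getfield(r, :cols), s)[getfield(r, :i)]
Base.setproperty!(r::RowRef, s::Symbol, v) =
    (getfield(getfield(r, :cols), s)[getfield(r, :i)] = v)

f!(x) = (x.a += 10)

cols = Columns([1, 2], [1, 2])

f!.([row(cols, i) for i in 1:2])      # mutates temporaries only
after_temps = copy(cols.a)            # columns unchanged

f!.([RowRef(cols, i) for i in 1:2])   # writes go through to the columns
after_refs = copy(cols.a)
```

The temporaries are real mutable objects, but nothing writes them back into the columns, whereas the row references never hold a copy in the first place.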