A few years back I requested that `..` be used to broadcast an object property over an array of objects: if `A.x` is property `x` of object `A`, then `Array_of_A..x` would be an array of the `x` properties of the `A` objects.
I was told `..` was already used syntax, but that other options were being considered.
Was any of them implemented?
`getproperty.((A,), :x)` works, I believe.
Thank you very much. It’s actually even simpler:
getproperty.(Array_of_A, :x)
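A minimal, self-contained sketch of that call, assuming a small struct named `Point` (a hypothetical stand-in for whatever type the array actually holds). Broadcasting treats the `Symbol` argument `:x` as a scalar, so it is paired with every element of the array; a comprehension gives the same result.

```julia
# Hypothetical example type standing in for the objects in Array_of_A
struct Point
    x::Float64
    y::Float64
end

pts = [Point(1.0, 2.0), Point(3.0, 4.0)]

# Broadcast getproperty over the array; the Symbol :x is broadcast as a scalar
xs = getproperty.(pts, :x)

# Equivalent comprehension, for comparison
xs_comp = [p.x for p in pts]

println(xs)   # [1.0, 3.0]
```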
I wouldn’t recommend this in real code, since it’s pretty much type piracy, but drawing inspiration from DataFrames.jl, you could always do this and save a few keystrokes:
julia> Base.getindex(A::AbstractArray, ::typeof(!), s::Symbol) = getproperty.(A,s)
julia> A = [(x=1, y=2), (x=2, y=3)];
julia> A[!,:x]
2-element Array{Int64,1}:
1
2
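If the piracy is the worry, one way to keep most of the keystroke savings is a short function of your own instead of a new `Base.getindex` method on `AbstractArray`. The name `props` below is hypothetical, chosen only for illustration; since it is a new function rather than a new method on a Base function for types you don’t own, it cannot change how `getindex` behaves anywhere else.

```julia
# Hypothetical helper: a brand-new function, so no Base method is extended
# for types we don't own (unlike getindex(::AbstractArray, ::typeof(!), ::Symbol)).
props(A, s::Symbol) = getproperty.(A, s)

A = [(x=1, y=2), (x=2, y=3)]

props(A, :x)   # returns [1, 2]
props(A, :y)   # returns [2, 3]
```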
If you really wanted to, you could actually define `..` like this:
julia> (..)(x, y) = Base.broadcasted(getproperty, x, y)
.. (generic function with 1 method)
julia> a = rand(ComplexF64, 10)
10-element Array{Complex{Float64},1}:
0.11262697544520428 + 0.5330050410664369im
0.2842270409352914 + 0.9623112275404697im
0.5789655516723524 + 0.531497283168638im
0.4253891593454424 + 0.7024599901901563im
0.17841878181226756 + 0.29629083377050325im
0.2444645399223282 + 0.5410678357934589im
0.5315805189120171 + 0.6919723210614765im
0.913767623297602 + 0.7772552581817502im
0.31236717253965085 + 0.45337765885602055im
0.18514749247167672 + 0.12337639732432293im
julia> Base.materialize(a..:re)
10-element Array{Float64,1}:
0.11262697544520428
0.2842270409352914
0.5789655516723524
0.4253891593454424
0.17841878181226756
0.2444645399223282
0.5315805189120171
0.913767623297602
0.31236717253965085
0.18514749247167672
julia> b = similar(a, Float64);
julia> b .= (a..:re) .+ 1
10-element Array{Float64,1}:
1.1126269754452043
1.2842270409352914
1.5789655516723524
1.4253891593454424
1.1784187818122676
1.2444645399223282
1.531580518912017
1.913767623297602
1.3123671725396509
1.1851474924716767
I was writing a question about this subject and your response was recommended. Two questions.
First, `x..:a` to mean `getproperty.(x, :a)` is super useful for REPL-based exploration. Is there any reason why this isn’t built in? Are there any pitfalls if I just add your definition to my startup.jl?
Second, does `getproperty.(x, :a)` copy, or is it just an iterator? Oftentimes I just want to run some simple statistic on it, and for that purpose I don’t really want to allocate a new array and copy the values over. For example, I’d like to be able to do the sum below without allocating:
struct MyP
    x::Float64
    y::Float64
end
v = MyP.(rand(1000), rand(1000))
sum(v..:x)
StructArrays.jl is also a very nice way to solve this problem.
`..` is used as syntax for interval arithmetic, but as long as you aren’t using that, it will all work out. Broadcasting will usually make a copy, but multiple chained dot operations are fused together, so there is only a single allocation.
Yes, the reason I returned a `Base.broadcasted` object here was so that it’s lazy and fuses with other broadcasted functions. That also means you should be able to use it just like any other iterator, for example in `sum`.
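A small sketch of that, reusing the `..` definition and the `MyP` struct from earlier in the thread: `v..:x` stays an unmaterialized `Base.Broadcast.Broadcasted` object, and `sum` reduces over it without first building a `Vector{Float64}` of the `x` values. The `sum(p -> p.x, v)` line is a Base-only alternative that also avoids the intermediate array; the two results may differ in the last bits, since the summation order is not guaranteed to be the same.

```julia
# simeonschaub's definition from above: returns a lazy Broadcasted object
(..)(x, y) = Base.broadcasted(getproperty, x, y)

struct MyP
    x::Float64
    y::Float64
end

v = MyP.(rand(1000), rand(1000))

lazy = v..:x           # a Base.Broadcast.Broadcasted, not a Vector
s1 = sum(lazy)         # reduces over the lazy object without materializing it
s2 = sum(p -> p.x, v)  # Base alternative: sum with a mapping function
```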
Also, this is a slight tangent, but is there any chance Julia could get a feature for zero-copy broadcast reductions? The downside is that it would require new syntax, but it would remove one of the few places where explicit for loops are still required for performance.
Hi, thanks for the suggestion, but I can’t use StructArrays.jl. In my specific use case most access is row-based, so I need to keep these data structures as arrays of structs for performance. I wasn’t looking for a different data structure; I was just looking to save keystrokes in the REPL.
I see, I think this is what I was looking for, thank you. Any idea why this isn’t built in? Is it just because it looks ugly?
The main argument against it, as @Oscar_Smith already pointed out, is that packages like IntervalArithmetic.jl already use `..` as an operator for constructing intervals, so it would probably be confusing if Base defined it to do something different. But you are of course free to just put this into your startup file!
Admittedly I haven’t done any performance testing, but correct me if I’m wrong. In my example,
struct MyP
    x::Float64
    y::Float64
end
v = MyP.(rand(1000), rand(1000))
`x` and `y` would be stored as separate arrays, right? So if I’m iterating through `v`,
for p in v
    # do stuff with p.x and p.y
end
which is something I’m doing all the time in my use case, then I’m basically forcing cache misses, no?
I could be misunderstanding something.
Obviously any time you call `rand(1000)` you will get an array. But read the docs I linked to: the following shouldn’t allocate separate arrays of floats.
julia> using StructArrays
julia> StructArray(MyP(rand(), rand()) for i in 1:1000)
OK, but my concern isn’t with the instantiation but with the iteration.
What is the memory layout like? Is it `x[1], y[1], x[2], y[2], ..., x[N], y[N]` or `x[1], ..., x[N], y[1], ..., y[N]`? The first one is nice for iterating over structs; the second one is bad.
For complete reductions, I think this already works, although I thought it was faster:
julia> using BenchmarkTools
julia> @btime sum(v .* v') setup=(v=ones(100))
6.286 μs (2 allocations: 78.20 KiB)
10000.0
julia> @btime sum(Broadcast.Broadcasted(*,(v,v'))) setup=(v=ones(100))
38.672 μs (0 allocations: 0 bytes)
10000.0
Issue about the syntax: https://github.com/JuliaLang/julia/issues/19198, which led to LazyArrays’ `@~`, with which:
julia> using LazyArrays, BenchmarkTools
julia> @btime sum(@~ v .* v') setup=(v=ones(100))
30.671 μs (0 allocations: 0 bytes)
10000.0
julia> @btime sum(LazyArray(@~ v .* v')) setup=(v=ones(100))
1.841 μs (0 allocations: 0 bytes)
10000.0
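For comparison, a Base-only sketch of the same complete reduction: `Broadcast.broadcasted` builds the lazy `v .* v'`, and `Broadcast.instantiate` computes and checks its axes up front, after which `sum` can reduce over it without allocating the 100×100 matrix. This only checks that the result matches the eager version; it makes no claim about how its speed compares with the timings above, which would need its own `@btime` run.

```julia
using LinearAlgebra  # v' (adjoint) for vectors is implemented in LinearAlgebra

v = ones(100)

# Lazy representation of v .* v'; no 100×100 matrix is allocated here
bc = Broadcast.instantiate(Broadcast.broadcasted(*, v, v'))

s_lazy = sum(bc)        # reduce directly over the lazy broadcast
s_eager = sum(v .* v')  # materializes the matrix, then sums it

println(s_lazy == s_eager)  # true
```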
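For a `StructArray`, it’s the second one: StructArrays.jl stores each field in its own array (a struct of arrays). A plain `Vector{MyP}` of an immutable struct like this one uses the first layout, with the fields stored inline, one struct after another.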
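One way to see the layout of a plain `Vector{MyP}` (this does not inspect a `StructArray`): because `MyP` is an immutable struct whose fields are all plain bits types, `isbitstype(MyP)` is true and the array stores the structs inline. Reinterpreting the array’s memory as `Float64` then shows the fields in storage order.

```julia
struct MyP
    x::Float64
    y::Float64
end

v = [MyP(1.0, 2.0), MyP(3.0, 4.0)]

# true: immutable with only Float64 fields, so elements are stored inline
println(isbitstype(MyP))

# View the same memory as Float64 values: x[1], y[1], x[2], y[2]
flat = reinterpret(Float64, v)
println(collect(flat))   # [1.0, 2.0, 3.0, 4.0]
```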
Side note: you won’t get all the benefits unless you fully type your point (I think, but don’t quote me on this):
struct MyP{T<:Real}
    x::T
    y::T
end
Using simeonschaub’s code
(..)(x, y) = Base.broadcasted(getproperty, x, y)
and sample data:
T = (
    TA = ( T1 = 1, T2 = 5 ),
    TB = ( T1 = 2, T2 = 10 )
)
X = (
    XA = ( X1 = 'A', T=T.TA ),
    XB = ( X1 = 'B', T=T.TB ),
    XC = ( X1 = 'C', T=T.TB )
)
These both work, giving the vector [1, 2, 2] (the second one needs DataFrames.jl):
using DataFrames
(([X...]..:T)..:T1) .+ 0
(DataFrame(X).T..:T1) .+ 0
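For comparison, a Base-only way to get the same `[1, 2, 2]` from the sample data above, without `..` or DataFrames: iterating a `NamedTuple` yields its values in order, so a comprehension can follow the nested fields directly. This is not the table-join structure asked about here, just a shorter spelling of the same lookup.

```julia
# Sample data from the post above
T = (
    TA = (T1 = 1, T2 = 5),
    TB = (T1 = 2, T2 = 10),
)
X = (
    XA = (X1 = 'A', T = T.TA),
    XB = (X1 = 'B', T = T.TB),
    XC = (X1 = 'C', T = T.TB),
)

# Iterating a NamedTuple yields its values (XA, XB, XC), in order
t1 = [x.T.T1 for x in X]
println(t1)   # [1, 2, 2]
```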
Though it would be really nice if you could just write `X..:T1`, i.e., if putting `T=T.TA`, `T=T.TB` created a link (like a table join) so that the elements of `T` are available as elements of `X`. Is there any structure that can do this?
Also, it’s not hard to write `[X...]` or `DataFrame(X)`, but is there a data structure that can specify both columns (e.g. `DataFrame(X).T`) and rows by index (e.g. `X.XA`)?