Passing field names to StructArray()?


#1

I recently became aware of StructArrays (array of tuples to tuple of arrays), which makes life much easier, especially when using the dot syntax.

When converting an array of (unnamed) tuples to a StructArray, is it possible to give the fields names other than x1, x2, etc.? If not, could this kind of feature be added?

E.g. it would behave like this:

julia> a=[(1,2),(3,4)]
2-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (3, 4)

julia> b=StructArray([:k, :l], a)
2-element StructArray{Tuple{Int64,Int64},1,NamedTuple{(:k, :l),Tuple{Array{Int64,1},Array{Int64,1}}}}:
 (1, 2)
 (3, 4)

The above is just a minimal example (of course I could have used a named tuple in the first place), the real need arises when using e.g. a function that returns unnamed tuples and the dot syntax creates an array of unnamed tuples.


#2

There are two distinct things:

  1. rename the columns and also change the element type from Tuple to NamedTuple with the new names
  2. just rename the columns and keep the element type a Tuple

The first one is reasonably easy, but the second one is a bit tricky (the code currently assumes that the field names of the StructArray can be determined from the element type), though it could be implemented.

Which one would you suggest and what exactly is the use case (meaning, are there strong reasons to prefer 2 over 1)?


#3

Well, my question stems from laziness :) and also aesthetics. The exact use case that inspired my post is from AstroLib.jl, where you can calculate the altitude and azimuth of celestial objects:

julia> using AstroLib, Dates, StructArrays

julia> jd = jdcnv.(DateTime(2018,7,1,0,0,0):Minute(15):DateTime(2018,7,2,0,0,0));

julia> elazha = StructArray(eq2hor.(12.0, 34.0, jd, 60.0, 22.0))
97-element StructArray{Tuple{Float64,Float64,Float64},1,NamedTuple{(:x1, :x2, :x3),Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1}}}}:
 (38.23248439593843, 86.36784792068306, 288.7493589813226)  
 (40.10982978106454, 89.57038203183892, 292.50962467840037) 
 (41.98803193710982, 92.86188240265157, 296.26989054371336) 
 (43.860800319734665, 96.25757929649852, 300.0301562411113) 
 ⋮                                                          
 (33.142358910306264, 77.94965068913794, 278.45407457554785)
 (34.9888535102176, 80.9741452608937, 282.21434028314263)   
 (36.851161379216556, 84.05363174954496, 285.9746061584369) 
 (38.724134676981905, 87.19933588058427, 289.7348718665901) 

julia> elazha.x1 # elazha.az would be nicer...
97-element Array{Float64,1}:
 38.23248439593843 
 40.10982978106454 
 41.98803193710982 
 43.860800319734665
  ⋮                
 33.142358910306264
 34.9888535102176  
 36.851161379216556
 38.724134676981905

I guess renaming the columns and changing the type to NamedTuple would be perfect at least in this case.


#4

Then I think you can either check with AstroLib whether they’re happy to change the return type to NamedTuple (it would still allow a, b, c = eq2hor(...), so I think it’s an improvement if the names are standard) or do it yourself:

nt_vec = NamedTuple{(:el, :az, :ha)}.(eq2hor.(12.0, 34.0, jd, 60.0, 22.0))
elazha = StructArray(nt_vec)

It may look a bit magical, but NamedTuple{(:el, :az, :ha)} is simply converting the Tuple to the NamedTuple (i.e. (1, 2, 3) goes to (el = 1, az = 2, ha = 3)). The . will fuse so there shouldn’t be a loss of performance compared to what you had.
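
To make the conversion concrete, here is a minimal sketch that uses only Base Julia. The function fake_eq2hor and its numbers are made up for illustration (a stand-in for any function that returns an unnamed 3-tuple), and ElAzHa is just a hypothetical alias for the partially specified NamedTuple type:

```julia
# Hypothetical stand-in for a function returning an unnamed 3-tuple
# (plays the role of eq2hor; the values are invented).
fake_eq2hor(t) = (10.0 + t, 20.0 + t, 30.0 + t)

# A NamedTuple type with the names fixed and the field types left open.
# Calling it on a plain tuple attaches the names positionally.
ElAzHa = NamedTuple{(:el, :az, :ha)}

nt = ElAzHa(fake_eq2hor(1.0))   # (el = 11.0, az = 21.0, ha = 31.0)

# The same constructor can be broadcast over the broadcast function call.
nt_vec = ElAzHa.(fake_eq2hor.([0.0, 1.0, 2.0]))

# Named access works, and positional destructuring still works too.
az_only = nt.az
el, az, ha = nt
```

Because a NamedTuple iterates over its values in order, code that already does a, b, c = f(...) should keep working if f switches from a Tuple to a NamedTuple.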

I’ll see if it’s easy to relax some things in StructArrays so that there is more flexibility. For example, if elazha already has the correct type (NamedTuple with fields (:el, :az, :ha)), then currently

push!(elazha, (1.0, 2.0, 3.0))

errors (you need push!(elazha, NamedTuple{(:el, :az, :ha)}((1.0, 2.0, 3.0)))), but it should probably be made to work.


#5

Wow, that surely looks like magic :-), it’s perfect for my needs. I think I’ll use this idiom quite often:

coords = NamedTuple{(:el, :az, :ha)}.(eq2hor.(ra, dec, jd, 60.0, 22.0)) |> StructArray
plot(coords.az, coords.el)

Perhaps the new standard when returning multiple values should be named tuples…?

Thanks!


#6

Thank you for making me look back at the Tuple case (there were a couple of oversights… I have really only used the struct / NamedTuple case).

I’ve fixed them on master, with some additional goodies. Now (after ] add StructArrays#master), if you’re looking for performance, you can preallocate the destination to avoid the intermediate array of tuples (it will also know how to convert automatically from Tuples to NamedTuples) and broadcast! into it:

julia> NT = NamedTuple{(:el, :az, :ha), Tuple{Float64, Float64, Float64}}

julia> sink = StructArray{NT}(undef, length(jd));

julia> @. sink = eq2hor(12.0, 34.0, jd, 60.0, 22.0)
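
For intuition only, the idea of writing straight into preallocated columns can be sketched in plain Base Julia. This is not how StructArrays implements it; fake_eq2hor and the input vector ts are hypothetical stand-ins for eq2hor and jd:

```julia
# Hypothetical stand-in for a function returning an unnamed 3-tuple.
fake_eq2hor(t) = (10.0 + t, 20.0 + t, 30.0 + t)

ts = [0.0, 1.0, 2.0]
n = length(ts)

# Preallocate one column per field (the "tuple of arrays" layout).
el = Vector{Float64}(undef, n)
az = Vector{Float64}(undef, n)
ha = Vector{Float64}(undef, n)

# Each returned tuple is unpacked directly into the columns,
# so no intermediate Vector of tuples is ever built.
for (i, t) in enumerate(ts)
    el[i], az[i], ha[i] = fake_eq2hor(t)
end

# Bundle the columns under their names.
cols = (el = el, az = az, ha = ha)
```

The broadcast into sink above expresses the same thing in one line, with the StructArray taking care of the column bookkeeping.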

Tbh, if there are clear standard names for what is being returned, I’d say that’s better than returning a Tuple.