StructArray into KeyedArray

How do I turn a StructArray into a KeyedArray?

┌───────┬───────┬───────────┬──────────┬──────────┐
│     a │     b │         x │        y │        z │
│ Int64 │ Int64 │   Float64 │  Float64 │  Float64 │
├───────┼───────┼───────────┼──────────┼──────────┤
│     1 │     3 │ 0.0271853 │ 0.426652 │ 0.908247 │
│     2 │     3 │  0.331786 │ 0.470659 │  0.80698 │
│     1 │     4 │  0.636678 │ 0.350289 │ 0.708534 │
│     2 │     4 │ 0.0822821 │ 0.263246 │ 0.981152 │
└───────┴───────┴───────────┴──────────┴──────────┘

using StructArrays, AxisKeys, DataManipulation
let k = 4
    d = (;
        a = 1:k,
        b = [1,1,2,2],
        x = rand(k),
        y = rand(k),
        z = rand(k),
    ) |> StructArray

    @p begin
        d
        group(_.b)
        map(pairs(__)) do (b, __)
            group(_.a)
            map() do __
                [__.x __.y __.z]
                KeyedArray(; row=[1], col=[:x, :y, :z])
            end
            KeyedArray(collect(__); a=collect(keys(__)))
            stack
            dropdims(;dims=1)        
        end        
        KeyedArray(collect(__); b=collect(keys(__)))
        stack
    end
end
ERROR: DimensionMismatch: stack expects uniform axiskeys for all arrays

Found 2 issues with the code:

  1. stack of KeyedArrays returns a normal array and loses axes information.
  2. In the code, d.a is [1,2,3,4], which causes the error: every b group must contain the same set of a keys (the a values shown in the table are correct, though).
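
To make the second issue concrete, here is a small sketch in plain Julia (no packages; the variable names are made up for illustration). It lists the a values found in each b group, first for the data as written in the question's code and then for the values shown in the table.

```julia
# Data as defined in the question's code (not as shown in the table).
a = [1, 2, 3, 4]
b = [1, 1, 2, 2]

# Collect the `a` values that occur in each `b` group.
a_by_b = Dict(bv => a[b .== bv] for bv in unique(b))

# The two groups have different `a` keys, so their axes cannot be aligned.
@assert a_by_b[1] == [1, 2]
@assert a_by_b[2] == [3, 4]

# With the table's values, every group shares the same keys.
a_table = [1, 2, 1, 2]
b_table = [3, 3, 4, 4]
a_by_b_table = Dict(bv => a_table[b_table .== bv] for bv in unique(b_table))
@assert a_by_b_table[3] == a_by_b_table[4] == [1, 2]
```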

A working version could be:

@p begin
    d
    group(_.b)
    @aside bnames = collect(keys(__))
    map(pairs(__)) do (b, __)
        group(_.a)
        @aside anames = collect(keys(__))
        map(pairs(__)) do (a, __)
            [__.x __.y __.z]
        end
        stack
        dropdims(;dims=1)
        KeyedArray(;col = [:x, :y, :z], a = anames)
    end
    @aside anames = only(unique(axiskeys.(__)))[2]
    stack
    KeyedArray(;col = [:x, :y, :z], a = anames, b = bnames)
end

Note the use of @aside to propagate the keys from group ops, and the explicit check on all a keys being the same in each group.
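
The uniqueness check can be tried in isolation. A minimal sketch using only Base functions (the key vectors below are invented for illustration): only(unique(...)) returns the single shared value and throws if the groups disagree.

```julia
# Keys reported by each group; here they agree.
keys_per_group = [[:red, :blue], [:red, :blue]]
shared = only(unique(keys_per_group))
@assert shared == [:red, :blue]

# If the groups disagree, `only` throws instead of silently picking one.
mismatched = [[1, 2], [3, 4]]
failed = try
    only(unique(mismatched))
    false
catch err
    err isa ArgumentError
end
@assert failed
```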

This is basically unstack in DataFrames lingo. But a generalized version to match unstack could be nice.


I think this is what I wanted:

let k = 4
    d = (;
        a = [:red, :blue, :red, :blue],
        b = [:big, :big, :small, :small],
        x = rand(k),
        y = rand(k),
        z = rand(k),
    ) |> StructArray

    @p begin
        d
        group(_.a)
        map() do __
            KeyedArray([__.x __.y __.z]; __.b, col=[:x, :y, :z])
        end
        KeyedArray(collect(__); a=collect(keys(__)))
        stack
    end
end

3-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   b ∈ 2-element view(::Vector{Symbol},...)
→   col ∈ 3-element Vector{Symbol}
◪   a ∈ 2-element Vector{Symbol}
And data, 2×3×2 Array{Float64, 3}:
[:, :, 1] ~ (:, :, :red):
            (:x)       (:y)       (:z)
  (:big)     0.566229   0.470626   0.404352
  (:small)   0.136995   0.568349   0.91954

[:, :, 2] ~ (:, :, :blue):
            (:x)       (:y)       (:z)
  (:big)     0.460618   0.124939   0.884448
  (:small)   0.882384   0.776553   0.0416588

@mcabbott @aplavin Have you thought about a function like this? It’s essentially unstack.

stack preserves axiskeys when both the inner arrays and the outer container are KeyedArrays.
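
A minimal sketch of that behaviour, assuming Julia 1.9 or later (where Base.stack exists) and an AxisKeys version that defines stack for a KeyedArray container. The data values are arbitrary and only there to make the result checkable.

```julia
using AxisKeys

# Two inner arrays with identical axis keys (stack requires them to match).
red  = KeyedArray([1.0 2.0; 3.0 4.0]; b=[:big, :small], col=[:x, :y])
blue = KeyedArray([5.0 6.0; 7.0 8.0]; b=[:big, :small], col=[:x, :y])

# The outer container is itself a KeyedArray carrying the `a` keys.
outer = KeyedArray([red, blue]; a=[:red, :blue])

stacked = stack(outer)

# Inner dimensions come first, the outer dimension is appended last.
@assert stacked isa KeyedArray
@assert dimnames(stacked) == (:b, :col, :a)
@assert axiskeys(stacked)[3] == [:red, :blue]
@assert stacked[2, 2, 2] == 8.0
```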

uniqueonly(...) is more efficient than only(unique(...)). Probably not important here, though.

This is cleaner while keeping the same spirit:

@p let
	d
	group((;_.a), restype=KeyedArray)
	map() do gr
		KeyedArray([gr.x gr.y gr.z]; gr.b, col=[:x, :y, :z])
	end
	stack
end

And another grouping-based solution:

@p let
	d
	group_vg(_.a)
	flatmap() do gr
	   KeyedArray([KeyedArray([gr.x gr.y gr.z]; gr.b, col=[:x, :y, :z])], a=[key(gr)])
	end
	stack
end

Also, AxisKeys.jl has wrapdims(tbl, :valuecol, :namecols...), which almost does what you want. It only supports a single value column, though, so x, y, and z need explicit handling:

flatmap([:x, :y, :z]) do c
	KeyedArray([wrapdims(columntable(d), c, :a, :b)], col=[c])
end |> stack
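
For reference, a sketch of the single-value-column case on its own. It assumes wrapdims accepts a plain NamedTuple of vectors as the table (that is what columntable produces); the numbers are made up.

```julia
using AxisKeys

# A column table (NamedTuple of vectors) in long format.
tbl = (
    a = [:red, :blue, :red, :blue],
    b = [:big, :big, :small, :small],
    x = [0.1, 0.2, 0.3, 0.4],
)

# One value column (:x), keyed by the remaining name columns.
A = wrapdims(tbl, :x, :a, :b)

@assert A isa KeyedArray
@assert dimnames(A) == (:a, :b)
@assert A(:red, :small) == 0.3   # lookup by key, not by position
```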

I don’t quite get why

@p let
    d
    group((;_.a), restype=KeyedArray)
    stack() do gr
        KeyedArray([gr.x gr.y gr.z]; gr.b, col=[:x, :y, :z])
    end
end

returns an Array instead of a KeyedArray.

It’s just not overloaded: see AxisKeys.jl/src/functions.jl (mcabbott/AxisKeys.jl at commit e98481cb7a6be2cd3da3a9a28cff224f321d2e60) for stack(A). I don’t think there’s a lower-level function that could be defined so that both work?

I actually didn’t know about stack(f, A) at all.
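
Until stack(f, A) is overloaded, splitting it into map followed by stack keeps the keys. This is a sketch under two assumptions: that map over a KeyedArray returns a KeyedArray, and that AxisKeys defines stack(A) for that container. The function f is hypothetical.

```julia
using AxisKeys

outer = KeyedArray([1.0, 2.0]; a=[:red, :blue])

# Hypothetical per-element function returning KeyedArrays with uniform keys.
f(v) = KeyedArray([v, 10v]; col=[:x, :y])

# map keeps the KeyedArray wrapper, so the one-argument stack method applies.
stacked = stack(map(f, outer))

@assert stacked isa KeyedArray
@assert dimnames(stacked) == (:col, :a)
@assert stacked[2, 1] == 10.0
```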
