I want to “flatten” a table with multiple cells that contain vectors (these have equal lengths per table-row):
t = table((group = [1,1,2,2], x = [1,2,3,4]))
t2 = groupby(t, :group, usekey = true) do k, r
(x2 = r.x .+ k.group, y2 = rand(length(r)))
end
If I flatten it on say column x2
, column y2
doesn’t get “flattened”:
julia> flatten(t2, :x2)
Table with 4 rows, 3 columns:
group x2 y2
group x2 y2
─────────────────────────────────
1 2 [0.154278, 0.31287]
1 3 [0.154278, 0.31287]
2 5 [0.0977201, 0.0372307]
2 6 [0.0977201, 0.0372307]
What I need is to “ungroup” table t2
by group
…
OK, this seems to work:
julia> merge(table.(rows(select(t2, Not(:group))))...)
Table with 4 rows, 2 columns:
x2 y2
─────────────
2 0.154278
3 0.31287
5 0.0977201
6 0.0372307
But alas, this works only if there were just two groups, cause merge works on just two tables.
OK, with foldl
this actually works for any number of groups:
julia> t = table((group = [1,1,2,2,3,3], x = [1,2,3,4,5,6]))
Table with 6 rows, 2 columns:
group x
────────
1 1
1 2
2 3
2 4
3 5
3 6
julia> t2 = groupby(t, :group, usekey = true) do k, r
(x2 = r.x .+ k.group, y2 = rand(length(r)))
end
Table with 3 rows, 3 columns:
group x2 y2
────────────────────────────────────
1 [2, 3] [0.0581229, 0.019784]
2 [5, 6] [0.63852, 0.855233]
3 [8, 9] [0.023636, 0.783005]
julia> foldl(merge, table.(rows(select(t2, Not(:group)))))
Table with 6 rows, 2 columns:
x2 y2
─────────────
2 0.0581229
3 0.019784
5 0.63852
6 0.855233
8 0.023636
9 0.783005
Or let your groupby
produce only a single column which is a Tuple
of values:
t = table((group = [1,1,2,2], x = [1,2,3,4]))
t2 = groupby(t, :group, usekey = true, flatten = true) do k, r
(z = [(r.x[i] + k.group, rand()) for i in 1:length(r)])
end
yields:
Table with 4 rows, 3 columns:
1 2 3
──────────────
1 2 0.798469
1 3 0.301879
2 5 0.883682
2 6 0.965717
As proposed above you can return an object that iterates NamedTuples
. The easiest is to use Columns
, so this:
t = table((group = [1,1,2,2], x = [1,2,3,4]))
t2 = groupby(t, :group, usekey = true) do k, r
(x2 = r.x .+ k.group, y2 = rand(length(r)))
end
Would be replaced by
t = table((group = [1,1,2,2], x = [1,2,3,4]))
t2 = groupby(t, :group, usekey = true, flatten = true) do k, r
Columns(x2 = r.x .+ k.group, y2 = rand(length(r)))
end
That is the correct solution (for me). Awesome, thanks @piever!
BTW, why not rows
, that iterates named tuples as well, no?
They return the same object: rows((x=col1, y=col2))
would also work. Either case actually returns a StructVector
(a vector of structs stored as a struct of vectors). When porting IndexedTables to StructArrays, we kept the Columns
name for backward compatibility (it is now an alias for StructVector). OTOH it is a bit confusing that
Columns(t::NamedTuple)is the same as
rows(t::NamedTuple)`, not sure what’s the correct way forward in terms of API.
1 Like