Another fun thing I tried was filtering rows based on the properties of their groups, but I ran into the following error:
julia> using DataFrames, Query, Statistics  # Statistics provides mean

julia> ex = DataFrame(:a=>[1,2,3,4,5,6,7,8], :b=>repeat([:a, :b, :c, :d], inner=2))
8×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ Symbol │
├─────┼───────┼────────┤
│ 1   │ 1     │ a      │
│ 2   │ 2     │ a      │
│ 3   │ 3     │ b      │
│ 4   │ 4     │ b      │
│ 5   │ 5     │ c      │
│ 6   │ 6     │ c      │
│ 7   │ 7     │ d      │
│ 8   │ 8     │ d      │

julia> ex |>
       @groupby(_.b) |>
       @map({rows=_, avg=mean(_.a)}) |>
       @filter(_.avg > 2) |>
       @mapmany(_.rows, {__...}) |>
       DataFrame
ERROR: ArgumentError: unable to construct DataFrame from QueryOperators.EnumerableMapMany{Tuple{Int64,Vararg{Union{Int64, Symbol},N} where N},QueryOperators.EnumerableIterable{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableFilter{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableIterable{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableMap{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableIterable{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},QueryOperators.EnumerableGroupBy{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}},QueryOperators.EnumerableIterable{NamedTuple{(:a, :b),Tuple{Int64,Symbol}},Tables.DataValueRowIterator{NamedTuple{(:a, :b),Tuple{Int64,Symbol}},Tables.RowIterator{NamedTuple{(:a, :b),Tuple{Array{Int64,1},Array{Symbol,1}}}}}},getfield(Main, Symbol("##52#62")),getfield(Main, Symbol("##53#63"))}},getfield(Main, Symbol("##55#65"))}},getfield(Main, Symbol("##57#67"))}},getfield(Main, Symbol("##59#69")),getfield(Main, Symbol("##60#70"))}
Stacktrace:
[1] DataFrame(::QueryOperators.EnumerableMapMany{Tuple{Int64,Vararg{Union{Int64, Symbol},N} where N},QueryOperators.EnumerableIterable{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableFilter{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableIterable{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableMap{NamedTuple{(:rows, :avg),Tuple{Grouping{Symbol,NamedTuple{(:a, :b),Tuple{Int64,Symbol}}},Float64}},QueryOperators.EnumerableIterable{Grouping{Symbol,NamedTup
This only seems to happen when I splat without adding any new columns. If I include a new column (like in your example), it works great:
julia> ex |>
       @groupby(_.b) |>
       @map({rows=_, avg=mean(_.a)}) |>
       @filter(_.avg > 2) |>
       @mapmany(_.rows, {__..., _.avg}) |>
       DataFrame
6×3 DataFrame
│ Row │ a     │ b      │ avg     │
│     │ Int64 │ Symbol │ Float64 │
├─────┼───────┼────────┼─────────┤
│ 1   │ 3     │ b      │ 3.5     │
│ 2   │ 4     │ b      │ 3.5     │
│ 3   │ 5     │ c      │ 5.5     │
│ 4   │ 6     │ c      │ 5.5     │
│ 5   │ 7     │ d      │ 7.5     │
│ 6   │ 8     │ d      │ 7.5     │