Filter doesn't work on grouped dataframe

The problem is simple: I have a grouped data frame and I want to throw out some of the groups. filter, however, does not work on it. This seems like a very common idiom, so I think I'm missing some very simple way to handle it.

P.S. I meant within the DataFrames environment. Clearly I can write my own filter function and just push!(a, g) to collect the groups I want.

Thanks,

julia> using DataFrames
julia> x=DataFrame([[1,1,2,2,3,3],collect(1:6)],[:a,:b])
6×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 1     │ 2     │
│ 3   │ 2     │ 3     │
│ 4   │ 2     │ 4     │
│ 5   │ 3     │ 5     │
│ 6   │ 3     │ 6     │

julia> g=groupby(x, :a)
GroupedDataFrame with 3 groups based on key: a
First Group (2 rows): a = 1
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 1     │ 2     │
⋮
Last Group (2 rows): a = 3
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 3     │ 5     │
│ 2   │ 3     │ 6     │

julia> filter(y->y[1,:a]==3,g)
ERROR: MethodError: no method matching filter(::getfield(Main, Symbol("##3#4")), ::GroupedDataFrame{DataFrame})
Closest candidates are:
  filter(::Any, ::Array{T,1} where T) at array.jl:2351
  filter(::Any, ::BitArray) at bitarray.jl:1710
  filter(::Any, ::AbstractArray) at array.jl:2312
  ...
Stacktrace:
 [1] top-level scope at none:0

I encountered the same problem and hope someone can help.
From what I found, neither DataFrames.jl nor Query.jl provides a way to filter grouped data based on the grouping variable. Are there other solutions around? Thanks.

I’ve found that filter(y->y[1,:a]==3,collect(g)) works, but there may be a better way.
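
For anyone wanting to try the collect workaround end to end, here is a minimal sketch. It assumes a recent DataFrames 1.x release, rebuilds the example frame with keyword arguments, and adds a final `reduce(vcat, ...)` step (not part of the original suggestion) to get back a plain DataFrame.

```julia
using DataFrames

# Same data as in the question
x = DataFrame(a = [1, 1, 2, 2, 3, 3], b = 1:6)
g = groupby(x, :a)

# collect turns the GroupedDataFrame into a plain Vector of SubDataFrames,
# so the generic Base.filter method for arrays applies
groups = collect(g)
kept = filter(y -> y[1, :a] == 3, groups)

# Assumption: we want one DataFrame back, so stack the surviving groups
result = reduce(vcat, kept)
```

Note that the result of `filter` here is an ordinary vector, so the grouping structure itself is lost.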


An alternative is to use Query.jl. For example, df |> @groupby(_.a) |> @filter(key(_) .== 3).

This will work in the next release of DataFrames.
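
As a rough illustration of what this looks like once `filter` accepts a GroupedDataFrame, here is a sketch assuming a DataFrames 1.x release. The frame is rebuilt with keyword arguments, and the names `kept`, `kept_by_col`, and `result` are only illustrative.

```julia
using DataFrames

x = DataFrame(a = [1, 1, 2, 2, 3, 3], b = 1:6)
g = groupby(x, :a)

# The predicate is called once per group with that group's SubDataFrame
kept = filter(sdf -> first(sdf.a) == 3, g)

# Column-pair form: the predicate receives the group's :a column as a vector
kept_by_col = filter(:a => a -> first(a) == 3, g)

# Convert the surviving groups back into a single DataFrame
result = DataFrame(kept)
```

In both forms the result is still a GroupedDataFrame, so it can be passed on to `combine` or indexed like the original.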

In the meantime you can use combine, returning an empty DataFrame for each group you want to drop (here, the group with a == 1):

combine(gd) do sdf
    first(sdf.a) == 1 ? DataFrame() : sdf
end

Posting this in 2022 because this is where my search led. I got a satisfying result using the docstring from help?> groupby.

I’m in DataFrames 1.2 today. You can index into grouped dataframe keys a few ways, for example:

last(keys(gd))
# GroupKey: (a = 3,)

gd[(a = 3,)]
# 2×2 SubDataFrame
#│ Row │ a     │ b     │
#│     │ Int64 │ Int64 │
#├─────┼───────┼───────┤
#│ 1   │ 3     │ 5     │
#│ 2   │ 3     │ 6     │
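
Building on the key indexing above, one way to throw out groups is to select the keys you want and index with the resulting vector. This is a sketch assuming DataFrames 1.x, where a GroupedDataFrame can be indexed with a vector of GroupKey values; `wanted` and `subset_gd` are illustrative names.

```julia
using DataFrames

x = DataFrame(a = [1, 1, 2, 2, 3, 3], b = 1:6)
gd = groupby(x, :a)

# Each GroupKey exposes the grouping columns as properties (k.a here)
wanted = [k for k in keys(gd) if k.a != 1]

# Indexing with a vector of keys returns a GroupedDataFrame with only those groups
subset_gd = gd[wanted]

# Flatten to a plain DataFrame if the grouping is no longer needed
result = DataFrame(subset_gd)
```

This only inspects the keys, so it avoids touching the rows of groups that are discarded.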