`MethodError` when trying to `filter` a `GroupedDataFrame`

I’m trying to keep only the SubDataFrames of a GroupedDataFrame whose length (nrow) is at most a given value, but filter!() doesn’t seem to accept the anonymous function I’m passing to it and throws a MethodError:

ERROR: LoadError: MethodError: no method matching filter!(::var"#2#3", ::GroupedDataFrame{DataFrame})
The function `filter!` exists, but no method is defined for this combination of argument types.

Here’s a MWE:

using DataFrames;

sections = [
    1, 1, 1, 1,
    2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3,
    4, 4]
df = DataFrame(A=sections, B=0)
grouped_df = groupby(df, :A)

filter!(sub_df->(nrow(sub_df) <= 5), grouped_df)

filter is disfavored in recent DataFrames versions. Use subset instead.

julia> subset!(grouped_df, :A => ByRow(x -> x <= 5))
17×2 DataFrame
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      0
   2 │     1      0
   3 │     1      0
   4 │     1      0
   5 │     2      0
   6 │     2      0
  ⋮  │   ⋮      ⋮
  13 │     3      0
  14 │     3      0
  15 │     3      0
  16 │     4      0
  17 │     4      0
      6 rows omitted

This code seems to filter by row, but I’m trying to filter out entire SubDataFrames.
How could I do that?

Sorry, I misread the MWE.


julia> # Filter to keep only groups with 5 or fewer rows
       filtered_gdf = filter(subdf -> nrow(subdf) <= 5, grouped_df)
GroupedDataFrame with 3 groups based on key: A
First Group (4 rows): A = 1
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      0
   2 │     1      0
   3 │     1      0
   4 │     1      0
⋮
Last Group (2 rows): A = 4
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     4      0
   2 │     4      0

julia> # Or if you want a regular DataFrame back
       filtered_df = filter(subdf -> nrow(subdf) <= 5, grouped_df, ungroup=true)
11×2 DataFrame
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      0
   2 │     1      0
   3 │     1      0
   4 │     1      0
   5 │     3      0
   6 │     3      0
   7 │     3      0
   8 │     3      0
   9 │     3      0
  10 │     4      0
  11 │     4      0

Wait, that’s literally the same thing I tried at first, except it doesn’t use the mutating version…
I just tested it and yes, filter!() throws an error while filter() works perfectly fine.
Do you know why that could be?

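I guess that’s down to the nature of the object being mutated: as the error message says, no filter! method is defined for a GroupedDataFrame (as opposed to a plain data frame), so we’ll have to assign the result back. Note that the comprehension below gives a Vector of SubDataFrames rather than a GroupedDataFrame.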


julia> grouped_df = [g for g in grouped_df if nrow(g) <= 5]
3-element Vector{SubDataFrame{DataFrame, DataFrames.Index, Vector{Int64}}}:
 4×2 SubDataFrame
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      0
   2 │     1      0
   3 │     1      0
   4 │     1      0
 5×2 SubDataFrame
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     3      0
   2 │     3      0
   3 │     3      0
   4 │     3      0
   5 │     3      0
 2×2 SubDataFrame
 Row │ A      B     
     │ Int64  Int64 
─────┼──────────────
   1 │     4      0
   2 │     4      0
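
If the result should stay a GroupedDataFrame rather than become a Vector of SubDataFrames, one option is to rebind the name to the output of the non-mutating filter shown earlier in the thread. This is only a minimal sketch built on the MWE; the final combine(grouped_df, nrow) call is added here purely to inspect the group sizes and is not part of the original posts.

```julia
using DataFrames

# Same data as the MWE: group sizes are 4, 6, 5 and 2.
sections = [
    1, 1, 1, 1,
    2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3,
    4, 4]
df = DataFrame(A=sections, B=0)
grouped_df = groupby(df, :A)

# filter (non-mutating) returns a new GroupedDataFrame,
# so rebind the variable instead of calling filter!.
grouped_df = filter(sub_df -> nrow(sub_df) <= 5, grouped_df)

# Inspect the rows per remaining group (one row per group).
sizes = combine(grouped_df, nrow)
```

The parent df is left untouched by this; only the set of groups referenced by grouped_df changes.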
