I need to subset a dataframe by groups. that is, in each group there is a different filtering condition to apply. here is a MWE
using DataFrames, DataFramesMeta
julia> df = DataFrame(A = repeat(1:3,3), B = rand(1:9,9),C = rand(9))
9×3 DataFrame
│ Row │ A │ B │ C │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼──────────┤
│ 1 │ 1 │ 1 │ 0.676259 │
│ 2 │ 2 │ 5 │ 0.675797 │
│ 3 │ 3 │ 9 │ 0.597738 │
│ 4 │ 1 │ 6 │ 0.593364 │
│ 5 │ 2 │ 2 │ 0.156473 │
│ 6 │ 3 │ 7 │ 0.859696 │
│ 7 │ 1 │ 9 │ 0.951905 │
│ 8 │ 2 │ 6 │ 0.196368 │
│ 9 │ 3 │ 1 │ 0.292369 │
Suppose for each group of A I want to discard the row where B
is largest within that group:
julia> @linq df |>
groupby(:A) |>
where( :B != maximum(:B) )
GroupedDataFrame with 3 groups based on key: A
First Group (3 rows): A = 1
│ Row │ A │ B │ C │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼──────────┤
│ 1 │ 1 │ 1 │ 0.676259 │
│ 2 │ 1 │ 6 │ 0.593364 │
│ 3 │ 1 │ 9 │ 0.951905 │
⋮
Last Group (3 rows): A = 3
│ Row │ A │ B │ C │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼──────────┤
│ 1 │ 3 │ 9 │ 0.597738 │
│ 2 │ 3 │ 7 │ 0.859696 │
│ 3 │ 3 │ 1 │ 0.292369 │
that’s not it. let me try creating a grouped dataframe and work on that:
julia> g = @linq df |>
groupby(:A)
julia> filter(x -> (x[:B] < 9),g[1])
2×3 DataFrame
│ Row │ A │ B │ C │
│ │ Int64 │ Int64 │ Float64 │
├─────┼───────┼───────┼──────────┤
│ 1 │ 1 │ 1 │ 0.676259 │
│ 2 │ 1 │ 6 │ 0.593364 │
that works! of course I cannot hardcode 9
, so i must find out what the largest value is:
julia> filter(x -> (x[:B] < maximum(x[:B])),g[1])
0×3 DataFrame
julia> filter(x -> (x[:B] .< maximum(x[:B])),g[1])
0×3 DataFrame
what’s the mistake I’m making? thanks