mike_k
1
Dear folks,
I fail to remove rows in a dataframe based on conditions. Here is an MWE:
The dataframe is
│ Row │ A │ B │
│ │ String │ String │
├─────┼────────┼────────┤
│ 1 │ X │ C │
│ 2 │ X │ C │
│ 3 │ Y │ C │
│ 4 │ Y │ D │
and I want to delete the last row with a logical AND condition. I tried
using DataFrames, Query, CSV
df = DataFrame(A = ["X", "X", "Y", "Y"], B = ["C", "C", "C", "D"])
df = df |> @filter((_.A != "Y") .& (_.B != "D") ) |> DataFrame
print(df)
which, however, removes all rows in which “Y” appears , i.e.,
│ Row │ A │ B │
│ │ String │ String │
├─────┼────────┼────────┤
│ 1 │ X │ C │
│ 2 │ X │ C │
Thank you in advance for some inputs.
nilshg
2
Not sure about Query but note that you can always subset easily in plain DataFrames:
df[(df.A .!= "Y") .& (df.B .!= "D"), :]
is pretty readable imho.
You might be using filter
the wrong way around, cf this thread: The filter function is non-intuitive - #10 by ianfiske (although the thread is about the filter
function, not the Query macro)
3 Likes
DataFrames has a filter
function build in. But note that as mentioned above, filter
keeps rows, rather than removes them.
julia> df = DataFrame(A = ["X", "X", "Y", "Y"], B = ["C", "C", "C", "D"])
4×2 DataFrame
│ Row │ A │ B │
│ │ String │ String │
├─────┼────────┼────────┤
│ 1 │ X │ C │
│ 2 │ X │ C │
│ 3 │ Y │ C │
│ 4 │ Y │ D │
julia> filter(row -> !(row.A == "Y" && row.B == "D"), df)
3×2 DataFrame
│ Row │ A │ B │
│ │ String │ String │
├─────┼────────┼────────┤
│ 1 │ X │ C │
│ 2 │ X │ C │
│ 3 │ Y │ C │
4 Likes
mike_k
4
Great, many thanks to both of you !