Dataframes: How to conditionally remove rows based on data values?

Dear folks,
I fail to remove rows in a dataframe based on conditions. Here is an MWE:

The dataframe is

│ Row │ A      │ B      │
│     │ String │ String │
├─────┼────────┼────────┤
│ 1   │ X      │ C      │
│ 2   │ X      │ C      │
│ 3   │ Y      │ C      │
│ 4   │ Y      │ D      │

and I want to delete the last row with a logical AND condition. I tried

using DataFrames, Query, CSV

df = DataFrame(A = ["X", "X", "Y", "Y"], B = ["C", "C", "C", "D"])
df = df |> @filter((_.A != "Y") .& (_.B != "D") ) |> DataFrame
print(df)

which, however, removes all rows in which “Y” appears :frowning: , i.e.,

│ Row │ A      │ B      │
│     │ String │ String │
├─────┼────────┼────────┤
│ 1   │ X      │ C      │
│ 2   │ X      │ C      │

Thank you in advance for some inputs.

Not sure about Query but note that you can always subset easily in plain DataFrames:

df[(df.A .!= "Y") .& (df.B .!= "D"), :]

is pretty readable imho.

You might be using filter the wrong way around, cf this thread: The filter function is non-intuitive - #10 by ianfiske (although the thread is about the filter function, not the Query macro)

3 Likes

DataFrames has a filter function build in. But note that as mentioned above, filter keeps rows, rather than removes them.

julia> df = DataFrame(A = ["X", "X", "Y", "Y"], B = ["C", "C", "C", "D"])
4×2 DataFrame
│ Row │ A      │ B      │
│     │ String │ String │
├─────┼────────┼────────┤
│ 1   │ X      │ C      │
│ 2   │ X      │ C      │
│ 3   │ Y      │ C      │
│ 4   │ Y      │ D      │

julia> filter(row -> !(row.A == "Y" && row.B == "D"),  df)
3×2 DataFrame
│ Row │ A      │ B      │
│     │ String │ String │
├─────┼────────┼────────┤
│ 1   │ X      │ C      │
│ 2   │ X      │ C      │
│ 3   │ Y      │ C      │
4 Likes

Great, many thanks to both of you :slight_smile:!

Raise an issue

https://github.com/JuliaData/DataFrames.jl/issues/2460