Compound dataframe filtering with negated boolean expression

I’m trying to reproduce a compound filter that involves negating a two-part condition.

I’d normally do something like this in R, using dplyr:

filter(!(Fuel == "Gas" & Field == "Maari"))

In simple terms, I want to filter my dataframe to exclude every row where both Fuel == "Gas" and Field == "Maari" hold. I have been able to get a single negated boolean to work, but I can’t seem to get the negated compound expression to evaluate correctly.

I’ve tried this expression, and a few variations, but I keep getting an error saying there is no method matching !(::BitArray{1}). If I remove the ! then the expression returns the expected filtered result.

test[!((test.Fuel .== "Gas") .& (test.Field .== Symbol("Maari"))), :]

Can someone suggest how I might correctly negate the expression above?

Cheers

Jeff

I think you’re looking for .! (I asked a similar question a while ago; search for “negating boolean array”).
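
For anyone landing here later, a minimal sketch of what that looks like. The `test` dataframe below is a made-up stand-in for the one in the question, and I’m assuming both columns hold strings (the original post compares Field against a Symbol, so adjust to your column type):

```julia
using DataFrames

# Hypothetical stand-in for the `test` dataframe from the question
test = DataFrame(Fuel  = ["Gas", "Gas", "Oil", "Oil"],
                 Field = ["Maari", "Kupe", "Maari", "Kupe"])

# Elementwise mask marking the rows to drop
drop = (test.Fuel .== "Gas") .& (test.Field .== "Maari")

# `.!` broadcasts negation over the mask; plain `!` only works on a single Bool
kept = test[.!drop, :]

# The same filter written row-wise with `subset`, where scalar `!` and `&&` are fine
kept2 = subset(test, [:Fuel, :Field] =>
    ByRow((fuel, field) -> !(fuel == "Gas" && field == "Maari")))
```

The row-wise form reads closest to the dplyr version, since inside `ByRow` each argument is a single value rather than a whole column.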


You’re a champion! I can’t believe I didn’t think to try that.

It is quite a change moving from R to Julia.

Thanks so much.

Jeff

Glad that worked! It is and it isn’t, to me. I’d say coming from Python is easier, but most often the issues you’ll run into are method errors caused by using an inappropriate type. In R I seem to suffer from the same problems, although they are harder to diagnose because the type information is not great and different things have different indexing patterns.
I guess error messages in a new language are always cryptic to some extent, but having used both R and Julia in parallel over the past few months, to me there’s no comparison in which is easier to debug.


I have a similar use case. Let’s take:

df=DataFrame(x=[1,2,3])

I used subset(df, "x"=>x -> x.==1) as a simple filter, and that works.
How can I modify the logic to get a subset containing the rows where x is 1 or 2?
I tried subset(df, "x"=>x -> x.==[1,2]) but it gives an error. I should also mention that the number of values I want to filter on is arbitrary, not necessarily 2 as in this example.

Try this:


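using DataFrames

df = DataFrame(x = rand(1:10, 10))

# values to keep; a Set gives fast membership tests
this = Set([1, 2, 3])

# row-wise: keep each row whose x is in `this`
subset(df, :x => ByRow(x -> x ∈ this))
# column-wise: Ref(this) makes broadcasting treat the set as a single value
subset(df, :x => x -> x .∈ Ref(this))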
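
Since the number of values is arbitrary, the same pattern can be wrapped in a small helper. This is only a sketch: `keep_values` is a name I made up, not something provided by DataFrames.jl.

```julia
using DataFrames

# Hypothetical helper: keep the rows of `df` whose `col` value appears in `vals`
keep_values(df, col, vals) = subset(df, col => ByRow(in(Set(vals))))

df = DataFrame(x = [1, 2, 3])

keep_values(df, :x, [1, 2])   # rows where x is 1 or 2
keep_values(df, :x, [3])      # works for any number of values
```

`in(Set(vals))` builds a one-argument membership test, so it can be handed to `ByRow` directly.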