df[in.(df.x, [Set(df.y)]), :]
is already much faster (BTW, df[in.(df.x, Ref(Set(df.y))), :]
is more idiomatic). There’s also the more convoluted df[.!isnothing.(indexin(df.x, df.y)), :].
I think there have been discussions about making in.(...) use a more efficient algorithm by default, but it’s hard to know in advance which approach is faster (e.g. x or y could be very short, in which case building a Set may cost more than a plain linear scan).
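
For reference, here is a minimal sketch of the three variants side by side, using plain vectors instead of a data frame so it runs without any packages. The sample values of x and y are made up for illustration; with a real data frame you would pass the resulting mask as df[mask, :].

```julia
# Hypothetical sample data standing in for df.x and df.y
x = [1, 2, 3, 4, 5, 2]
y = [2, 4, 6]

# Plain broadcast: every element of x is searched linearly in y
mask_naive = in.(x, Ref(y))

# Set-based: the Set is built once, then each membership test is a hash lookup
mask_set = in.(x, Ref(Set(y)))

# indexin gives the position in y of each element of x, or nothing if absent
mask_indexin = .!isnothing.(indexin(x, y))

# All three produce the same Boolean mask
@assert mask_naive == mask_set == mask_indexin

# Keep only the elements of x that also occur in y
filtered = x[mask_set]
```

Which variant wins will depend on the sizes of x and y, so it is probably worth timing them on your actual data (e.g. with BenchmarkTools) rather than assuming the Set version is always fastest.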