Pandas's fast isin() equivalent in Julia DataFrame or IndexedTable or anything else

df[in.(df.x, [Set(df.y)]), :] is already much faster, since each membership test becomes a hash lookup instead of a scan over df.y (BTW, df[in.(df.x, Ref(Set(df.y))), :] is more idiomatic). There’s also the more convoluted df[.!isnothing.(indexin(df.x, df.y)), :].
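
To make the comparison concrete, here is a minimal sketch on plain vectors standing in for df.x and df.y (the sample values are made up for illustration). Each expression builds the same Boolean mask, which could then be used as df[mask, :].

```julia
# Stand-ins for df.x and df.y (illustrative values only)
x = [1, 2, 3, 4, 5, 2]
y = [2, 4, 6]

# Baseline: every element of x triggers a linear scan of y
mask_naive = in.(x, Ref(y))

# Build the Set once; Ref stops broadcasting from iterating over it
yset = Set(y)
mask_set = in.(x, Ref(yset))

# indexin returns the first matching index in y, or nothing if absent
mask_indexin = .!isnothing.(indexin(x, y))

# All three masks select the same rows
println(x[mask_set])
```

Wrapping the Set in Ref (or in a one-element vector, as in the first form) makes broadcasting treat it as a single value rather than something to iterate over.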

I think there have been discussions about making in.(...) use more efficient algorithms by default, but it’s hard to know in advance which approach is faster (e.g. x or y could be very short).
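
As a rough illustration of that trade-off, one could pick the strategy by the size of the lookup collection. The helper name isin_mask and the threshold of 16 below are hypothetical placeholders, not anything Base or DataFrames provides, and the cutoff would need benchmarking on real data.

```julia
# Hypothetical helper: choose a membership strategy from the length of y.
# The default threshold is an arbitrary placeholder, not a measured value.
function isin_mask(x::AbstractVector, y::AbstractVector; threshold::Integer = 16)
    if length(y) <= threshold
        # Short y: a linear scan avoids the cost of hashing every element
        return in.(x, Ref(y))
    else
        # Longer y: pay once to build the Set, then use hash lookups
        return in.(x, Ref(Set(y)))
    end
end

small = isin_mask([1, 2, 3], [2])              # linear-scan branch
large = isin_mask(1:100, collect(50:200))      # Set branch
```

Note that the two branches can disagree on missing or NaN, because in on an array compares with == while a Set uses isequal.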
