Pandas's fast isin() equivalent in Julia DataFrame or IndexedTable or anything else

nalimilan · October 19, 2019, 10:51am

df[in.(df.x, [Set(df.y)]), :] is already much faster (BTW, df[in.(df.x, Ref(Set(df.y))), :] is more idiomatic). There’s also the more convoluted df[.!isnothing.(indexin(df.x, df.y)), :].
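To make the two approaches concrete, here is a minimal sketch on plain vectors. The vectors x and y are made-up sample data standing in for df.x and df.y, so it runs without DataFrames loaded. With a real DataFrame, the resulting mask would go in df[mask, :].

```julia
# Hypothetical sample data standing in for the columns df.x and df.y.
x = [1, 2, 3, 4, 5, 2]
y = [2, 4, 6]

# Build the lookup Set once. Ref(...) makes broadcasting treat the Set as a
# single value instead of trying to iterate over it.
yset = Set(y)
mask_set = in.(x, Ref(yset))

# indexin returns, for each element of x, an index into y or `nothing`.
mask_indexin = .!isnothing.(indexin(x, y))

# Both masks mark the same positions; with a DataFrame, use df[mask_set, :].
filtered = x[mask_set]
```

Wrapping the Set in a one-element vector, as in the first form, has the same effect as Ref. Ref is simply the usual way to shield an argument from broadcasting.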

I think there have been discussions about making in.(...) use more efficient algorithms by default, but it’s hard to know in advance which approach is faster (e.g. x or y could be very short).
