The in function runs really slowly

I have two DataFrames, transactions and articles, and I want to keep only the rows of transactions whose article_id appears in articles.article_id. The code I use is:

trans = transactions[in.(transactions.article_id, Ref(articles.article_id)),:]

In Python I would use:

trans = trans[trans["article_id"].isin(articles["article_id"])].reset_index(drop=True)

In Python this took 12.7 s to finish, but in Julia it takes several minutes. Is there any way to speed it up?

PS: maybe I should not ask this question here, since it is the built-in function that makes the slicing so slow.

Hard to say for certain without an MWE, but you should probably construct a set to look up the values you want to check against:

transactions[in(Set(articles.article_id)).(transactions.article_id), :]
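
To make the one-liner above easier to follow, here is a minimal sketch of the same idea on plain vectors, so it runs without DataFrames. The names article_ids, transaction_ids, lookup, mask and kept, and the data itself, are made up for illustration and are not from the original post.

```julia
# Hypothetical stand-in data; the real columns would come from the DataFrames.
article_ids = collect(1:1_000)                 # plays the role of articles.article_id
transaction_ids = [5, 2_000, 17, 5, 999_999]   # plays the role of transactions.article_id

# Build the Set once, outside the broadcast.
lookup = Set(article_ids)

# in(lookup) returns a one-argument function equivalent to x -> x in lookup;
# broadcasting it over the vector yields a Boolean mask.
mask = in(lookup).(transaction_ids)

# Keep only the entries whose id was found in the Set.
kept = transaction_ids[mask]
```

With a DataFrame, the same mask would be used as the row index, as in transactions[mask, :].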

Thanks! I just tried it and it runs pretty fast. The values in articles.article_id are unique, so I did not use a Set at first, but I can't figure out why using a Set makes it so much faster.

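There is a tradeoff. If you are doing a single in lookup, then not converting the vector to a Set is faster, because building the Set already requires a pass over the whole vector. If you are doing many lookups, it is better to do the conversion: in on a Vector scans the elements one by one, while in on a Set is a hash lookup.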

Because Julia cannot know your use case up front, you need to be explicit and create a Set if you expect to perform many lookups.
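
A rough way to see this tradeoff for yourself is sketched below. The sizes and the names ids, queries, mask_vec, mask_set, t_vec and t_set are assumptions picked for illustration. Timings depend on the machine and include compilation on the first call, so treat the printed numbers as indicative only.

```julia
# Hypothetical sizes, chosen only so the difference is visible.
ids = collect(1:10_000)            # values to check against
queries = rand(1:20_000, 10_000)   # values to look up; about half are not in ids

# Vector lookup: every `in` call scans `ids` from the start,
# so the total work grows with length(queries) * length(ids).
mask_vec = in.(queries, Ref(ids))

# Set lookup: one pass to build the hash set, then each lookup
# takes roughly constant time on average.
mask_set = in(Set(ids)).(queries)

# Both approaches must produce the same mask.
@assert mask_vec == mask_set

# Time each variant separately (results are machine dependent).
t_vec = @elapsed in.(queries, Ref(ids))
t_set = @elapsed in(Set(ids)).(queries)
println("Vector: ", t_vec, " s, Set: ", t_set, " s")
```

For a single lookup such as 42 in ids, building a Set first would cost a full pass over ids plus hashing, which is why the plain Vector wins in that case.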
