Filter DataFrame by an Array


I want to select rows from a DataFrame based on specific values, as you would in R with
valuesX %in% listY

using DataFrames

temp_df = DataFrame(IndexVal = 1:10, Names = ["A","B","C","A1","B1","C1","A2","B2","C2","A3"])

temp_df[[temp_df[:IndexVal] in [1,3,5]],:]

Here I want to select only the rows where IndexVal is 1, 3 or 5.

How do I do this?

Thank you in advance


Sounds like you need my favorite function that’s not in Base.

function vectorin(a, b)
    bset = Set(b)              # Set gives fast membership checks
    [i in bset for i in a]     # Bool mask: true where i occurs in b
end

temp_df[vectorin(temp_df.IndexVal, [1,3,5]), :]
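To see what vectorin returns on its own, here is a minimal sketch on a plain vector standing in for the IndexVal column (no DataFrame needed). The helper is the one from this answer, with its closing end; converting b to a Set is what keeps each membership check cheap when the lookup list is long.

```julia
# vectorin as in the answer: Bool mask of which elements of `a` occur in `b`
function vectorin(a, b)
    bset = Set(b)                  # hash set: each `in` check is O(1) on average
    return [i in bset for i in a]  # one Bool per element of `a`, in order
end

idx = collect(1:10)                # stands in for the IndexVal column
mask = vectorin(idx, [1, 3, 5])    # 10-element Bool vector

println(idx[mask])                 # [1, 3, 5]
```

Indexing with the mask keeps exactly the positions where it is true, which is why the same mask can be used as the row selector in temp_df[mask, :].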


I think a more idiomatic way of doing this in Julia would be to use filter:

filter(row -> row[:IndexVal] in [1,3,5], temp_df)
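In recent DataFrames.jl releases (1.x is assumed here), filter also accepts a column => predicate pair, so only that column's values are passed to the function, and subset with ByRow expresses the same selection. A minimal sketch, rebuilding temp_df from the question; the name wanted is just for illustration:

```julia
using DataFrames

temp_df = DataFrame(IndexVal = 1:10,
                    Names = ["A","B","C","A1","B1","C1","A2","B2","C2","A3"])

wanted = Set([1, 3, 5])   # lookup values; a Set makes each `in` check cheap

# Row-wise predicate, as in the answer above
rows1 = filter(row -> row.IndexVal in wanted, temp_df)

# Column => predicate form: the predicate only sees IndexVal values
rows2 = filter(:IndexVal => in(wanted), temp_df)

# subset with ByRow applies the predicate element by element
rows3 = subset(temp_df, :IndexVal => ByRow(in(wanted)))

println(rows2.Names)      # ["A", "C", "B1"]
```

All three keep the original row order; the pair form is typically the faster choice on large tables because it avoids building a row object per row.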

But to get it working the way you tried (indexing with a Boolean vector), you need to broadcast in with the . syntax and use a little “trick”:

temp_df[in.(temp_df.IndexVal, ([1,3,5],)), :]

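The trick is to wrap the second array as the only element of another container, so that broadcasting treats it as a single value instead of iterating over it alongside the column. I used a 1-element tuple, but an array of arrays ([[1,3,5]]) works too.
You can read more about vectorizing/broadcasting in the "Dot Syntax for Vectorizing Functions" part of the Functions section of the Julia manual.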
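A minimal sketch of why the wrapping matters, using a plain vector in place of the column (Base Julia only). Without a wrapper, broadcasting tries to pair the 10 index values with the 3 wanted values element by element and fails; each wrapper below makes the lookup list behave like a single value. Ref and the curried in(wanted) are alternatives to the tuple trick, not what the answer itself used.

```julia
idx = collect(1:10)        # stands in for the IndexVal column
wanted = [1, 3, 5]

# Unwrapped: broadcast tries to match 10 elements against 3 and throws
unwrapped_fails = try
    in.(idx, wanted)
    false
catch err
    err isa DimensionMismatch
end

# Each wrapper makes `wanted` act as one value during broadcasting
mask_tuple   = in.(idx, (wanted,))    # 1-element tuple, as in the answer
mask_nested  = in.(idx, [wanted])     # array of arrays
mask_ref     = in.(idx, Ref(wanted))  # Ref, a common idiom in Julia 1.x
mask_curried = in(wanted).(idx)       # curried in(collection)

println(unwrapped_fails)              # true
println(findall(mask_ref))            # [1, 3, 5]
```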

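Yet another way is to get the desired indices directly. Older Julia versions had findin for this; since Julia 1.0 the equivalent is findall combined with in:

temp_df[findall(in([1,3,5]), temp_df.IndexVal), :]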
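findall(in(b), a) returns the positions in a whose values occur in b, in the order they appear in a rather than in the order of b. A minimal sketch on a hypothetical unsorted column; wrapping the lookup list in a Set is an optional choice for long lists, not something the answer requires:

```julia
# Hypothetical column whose values are not sorted
col = [5, 2, 3, 1, 7]
wanted = [1, 3, 5]

positions = findall(in(wanted), col)           # indices into col, in col's order
positions_set = findall(in(Set(wanted)), col)  # same result; Set helps for long lists

println(positions)                             # [1, 3, 4]
println(col[positions])                        # [5, 3, 1]
```

Because these are integer indices rather than a Bool mask, they can also be reused to pick the same rows from other columns or tables of the same length.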


Thank you both - very clear and concise