Filter DataFrame by an Array


#1

Hi,
I want to select the rows of a DataFrame that match specific values, as would be done in R with
valuesX %in% listY

temp_df = DataFrame(IndexVal = 1:10, Names = ["A","B","C","A1","B1","C1","A2","B2","C2","A3"])

temp_df[[temp_df[:IndexVal] in [1,3,5]],:]

Here I want to select only the rows where IndexVal is 1, 3 or 5.

How do I do this?

Thank you in advance


#2

Sounds like you need my favorite function that’s not in Base.

function vectorin(a, b)
    bset = Set(b)
    [i in bset for i in a]
end
temp_df[vectorin(temp_df.IndexVal, [1,3,5]), :]
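To see what vectorin returns without building a DataFrame, here is a minimal sketch on a plain vector. The name index_vals is a hypothetical stand-in for the IndexVal column; the function body is the same as above. Converting b to a Set means each membership check is a hash lookup instead of a scan through b.

```julia
# Same helper as above: for each element of `a`, test whether it occurs in `b`.
# Building a Set first makes each lookup a hash check rather than a linear scan.
function vectorin(a, b)
    bset = Set(b)
    [i in bset for i in a]
end

index_vals = collect(1:10)              # hypothetical stand-in for temp_df.IndexVal
mask = vectorin(index_vals, [1, 3, 5])  # Vector{Bool}, true at positions 1, 3 and 5
println(index_vals[mask])               # [1, 3, 5]
```

The Boolean vector can be used directly as a row selector, which is why it works inside temp_df[..., :].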

#3

I think a more idiomatic way of doing this in Julia would be to use filter:

filter(row -> row[:IndexVal] in [1,3,5], temp_df)
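The same predicate style works with Base's filter on any collection, which makes it easy to try out. This sketch assumes a hypothetical vector of NamedTuple "rows" standing in for temp_df; it is not the DataFrames method itself, just the same idea on plain data.

```julia
# Hypothetical stand-in for temp_df: a vector of NamedTuple rows.
labels = ["A", "B", "C", "A1", "B1", "C1", "A2", "B2", "C2", "A3"]
rows = [(IndexVal = i, Names = labels[i]) for i in 1:10]

# filter keeps each row for which the predicate returns true.
# Putting the wanted values in a Set makes every `in` check a hash lookup.
wanted = Set([1, 3, 5])
kept = filter(row -> row.IndexVal in wanted, rows)

println([row.Names for row in kept])   # ["A", "C", "B1"]
```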

But to make in work element-wise the way you want, you need to broadcast it with the dot syntax (in.) and use a little “trick”:

temp_df[in.(temp_df.IndexVal, ([1,3,5],)), :]

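The trick is to wrap the second array as the only element of another container (I used a 1-element tuple, but an array of arrays, [[1,3,5]], works too). Broadcasting then treats the whole array as a single value instead of iterating over it, so in is called once per row with the complete list.
You can read more about vectorizing/broadcasting in the Functions section of the manual ("Dot Syntax for Vectorizing Functions").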
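A small sketch of why the wrapper matters, using plain vectors (index_vals and targets are hypothetical stand-ins for the column and the wanted values). It contrasts the un-dotted call, the tuple trick, the Ref form that is also commonly used to shield an argument from broadcasting, and what happens with no wrapper at all.

```julia
index_vals = collect(1:10)   # stand-in for the IndexVal column
targets = [1, 3, 5]

# Without a dot, `in` asks whether the whole vector is one element of targets.
whole_vector_check = index_vals in targets      # false

# A 1-element tuple is broadcast like a scalar, so this calls in(x, targets)
# for every x in index_vals and returns one Bool per element.
mask_tuple = in.(index_vals, (targets,))

# Ref(targets) is another common way to keep an argument out of broadcasting.
mask_ref = in.(index_vals, Ref(targets))

# Unwrapped, broadcasting would pair the two vectors element by element,
# which throws because their lengths (10 and 3) differ.
unwrapped_failed = try
    in.(index_vals, targets)
    false
catch err
    err isa DimensionMismatch
end

println(index_vals[mask_tuple])   # [1, 3, 5]
```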

Yet another way is to get the desired indices with findall and the curried in(...) (this replaces findin, which was removed in Julia 1.0):

temp_df[findall(in([1,3,5]), temp_df.IndexVal), :]
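Note that findall returns row positions, not the matching values. A quick sketch with a hypothetical column whose values differ from their positions makes the distinction visible:

```julia
index_vals = [10, 30, 50, 70, 30]   # hypothetical column where values != positions
targets = [30, 50]

# in(targets) is the curried form, equivalent to x -> x in targets;
# findall returns every index i where index_vals[i] is in targets.
idx = findall(in(targets), index_vals)
println(idx)                 # [2, 3, 5]
println(index_vals[idx])     # [30, 50, 30]
```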

#4

Thank you both - very clear and concise