How to make compact subset queries of a dataframe?

I am looking for a more elegant way to do this

data[(data.ID .== 1) .| (data.ID .== 4) .| (data.ID .== 7) .| (data.ID .== 10),: ]

where data is a DataFrame. I simply want to extract the rows where the ID value is one of the four specified. I tried things like data.ID .== [1,4,7,10], but this fails because broadcasting compares the two arrays element by element instead of testing membership.

One way to do it is by using Ref, which makes broadcasting treat the collection of IDs as a single value:

julia> df = DataFrame(:ID => collect(1:10), :x => 'a':'j')
julia> df[in.(df.ID, Ref((1, 4, 7, 10))), :]

4×2 DataFrame
│ Row │ ID    │ x    │
│     │ Int64 │ Char │
├─────┼───────┼──────┤
│ 1   │ 1     │ 'a'  │
│ 2   │ 4     │ 'd'  │
│ 3   │ 7     │ 'g'  │
│ 4   │ 10    │ 'j'  │
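
To see why Ref is needed here, the following is a minimal sketch in plain Julia (no DataFrames required). The names ids, wanted, failed and mask are made up for illustration; ids stands in for the df.ID column.

```julia
ids = collect(1:10)        # stand-in for the df.ID column
wanted = (1, 4, 7, 10)

# Without Ref, broadcasting tries to pair the 10 ids with the 4 wanted
# values element by element, and the sizes do not match.
failed = try
    ids .== [1, 4, 7, 10]
    false
catch err
    err isa DimensionMismatch
end

# Ref(wanted) is treated as a scalar, so each id is tested against the whole tuple.
mask = in.(ids, Ref(wanted))

println(failed)       # true
println(ids[mask])    # [1, 4, 7, 10]
```

The resulting mask is a Boolean vector with one entry per row, which is what df[mask, :] expects.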

It is easy with filter:

filter(row -> row.ID in [1,4,7,10], data)

I also suggest reading the DataFrame Tutorial; it is very informative about DataFrames.

julia> data = DataFrame(:ID=>[1, 4, 7, 10, 20])
5×1 DataFrame
│ Row │ ID    │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 4     │
│ 3   │ 7     │
│ 4   │ 10    │
│ 5   │ 20    │

julia> filter(row->row.ID in [1,4,7], data)
3×1 DataFrame
│ Row │ ID    │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 4     │
│ 3   │ 7     │


Okay, thank you both. I’ll check out the tutorial.

I also like the Fix2 version of in, which I find more readable:

df[in([1, 4, 7, 10]).(df.ID), :]
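
For anyone unfamiliar with the one-argument form: in(collection) returns a function that checks whether its argument is in that collection, so it can be broadcast like any other function. A small sketch in plain Julia, with is_wanted and mask as illustrative names only:

```julia
# in(collection) builds a predicate equivalent to x -> x in collection
is_wanted = in([1, 4, 7, 10])

println(is_wanted isa Base.Fix2)   # true
println(is_wanted(4))              # true
println(is_wanted(5))              # false

# Broadcasting the predicate over a column-like range gives a row mask
mask = is_wanted.(1:10)
println(findall(mask))             # [1, 4, 7, 10]
```

Because the collection is captured inside the predicate, no Ref is needed when broadcasting it.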

Since nobody else has mentioned it, you might also find DataFramesMeta to be useful.