As from the title, how to use `in`

with `@where`

(DataFramesMeta)?

e.g. `@where(df, :x in ["cat1", "cat2"])`

I did already try `:x .in ["cat1", "cat2"]`

or `.in(:x, ["cat1", "cat2"])`

…

As from the title, how to use `in`

with `@where`

(DataFramesMeta)?

e.g. `@where(df, :x in ["cat1", "cat2"])`

I did already try `:x .in ["cat1", "cat2"]`

or `.in(:x, ["cat1", "cat2"])`

…

You actually need your function to take a vector and output a vector of booleans, as `@where`

works with functions taking vectors as inputs. For example:

```
df = DataFrame(x = 1:3, y = [2, 1, 2])
@where(df, :x .> 1)
```

One easy solution is to use a version of `in`

where the first argument can be a vector:

`@where(df, [x in [1,2] for x in :x])`

which of course gets annoying to write so I guess you could define some auxiliary function if you have to do this very often. For example you could define a customized operator `\smallin`

:

```
as ∊ b = [a in b for a in as]
@where(df, :x ∊ [1,2])
```

On the other hand I’d be curious to know what is the recommended way to select data in a DataFrame. I’m not sure whether it is the `@where`

macro from DataFramesMeta or if instead the Query.jl package should be preferred (or some other option that I’m not aware of).

You can use Query for this:

```
@from i in df begin
@where i.x in ["cat1", "cat2"]
@select i
@collect DataFrame
end
```

I’m still working on a short-version API for this kind of scenario (the above code really is quite verbose if all you want to do is filter…), but nothing really working at this point.

I use `indexin`

for this kind of query, i.e.

```
julia> d = DataFrame(x=[1, 2, 3], y="X")
julia> @where(d, indexin(:x, [1, 3]) .> 0)
2×2 DataFrames.DataFrame
│ Row │ x │ y │
├-----┼---┼---┤
│ 1 │ 1 │ X │
│ 2 │ 3 │ X │
```

As a bonus, it is tolerant of `NA`

values, too

```
@where(df, in.(:x, [["cat1", "cat2"]]))
```

Notice that `["cat1", "cat2"]`

is wrapped in another `Array`

(the iterable is the only element of another iterable).