As from the title, how to use `in`

with `@where`

(DataFramesMeta)?

e.g. `@where(df, :x in ["cat1", "cat2"])`

I did already try `:x .in ["cat1", "cat2"]`

or `.in(:x, ["cat1", "cat2"])`

β¦

As from the title, how to use `in`

with `@where`

(DataFramesMeta)?

e.g. `@where(df, :x in ["cat1", "cat2"])`

I did already try `:x .in ["cat1", "cat2"]`

or `.in(:x, ["cat1", "cat2"])`

β¦

1 Like

You actually need your function to take a vector and output a vector of booleans, as `@where`

works with functions taking vectors as inputs. For example:

```
df = DataFrame(x = 1:3, y = [2, 1, 2])
@where(df, :x .> 1)
```

One easy solution is to use a version of `in`

where the first argument can be a vector:

`@where(df, [x in [1,2] for x in :x])`

which of course gets annoying to write so I guess you could define some auxiliary function if you have to do this very often. For example you could define a customized operator `\smallin`

:

```
as β b = [a in b for a in as]
@where(df, :x β [1,2])
```

On the other hand Iβd be curious to know what is the recommended way to select data in a DataFrame. Iβm not sure whether it is the `@where`

macro from DataFramesMeta or if instead the Query.jl package should be preferred (or some other option that Iβm not aware of).

1 Like

You can use Query for this:

```
@from i in df begin
@where i.x in ["cat1", "cat2"]
@select i
@collect DataFrame
end
```

Iβm still working on a short-version API for this kind of scenario (the above code really is quite verbose if all you want to do is filterβ¦), but nothing really working at this point.

3 Likes

I use `indexin`

for this kind of query, i.e.

```
julia> d = DataFrame(x=[1, 2, 3], y="X")
julia> @where(d, indexin(:x, [1, 3]) .> 0)
2Γ2 DataFrames.DataFrame
β Row β x β y β
β-----βΌ---βΌ---β€
β 1 β 1 β X β
β 2 β 3 β X β
```

As a bonus, it is tolerant of `NA`

values, too

```
@where(df, in.(:x, [["cat1", "cat2"]]))
```

Notice that `["cat1", "cat2"]`

is wrapped in another `Array`

(the iterable is the only element of another iterable).

2 Likes

I really like this thank you.

Can also be written

`@where(df, :x .β [["cat1", "cat2"]])`