How to use `in` with @where [DataFramesMeta]?



As from the title, how to use in with @where (DataFramesMeta)?

e.g. @where(df, :x in ["cat1", "cat2"])

I did already try :x .in ["cat1", "cat2"] or .in(:x, ["cat1", "cat2"])


You actually need your function to take a vector and output a vector of booleans, as @where works with functions taking vectors as inputs. For example:

df = DataFrame(x = 1:3, y = [2, 1, 2])
@where(df, :x .> 1)

One easy solution is to use a version of in where the first argument can be a vector:

@where(df, [x in [1,2] for x in :x])

which of course gets annoying to write so I guess you could define some auxiliary function if you have to do this very often. For example you could define a customized operator \smallin :

as ∊ b = [a in b for a in as]
@where(df, :x ∊ [1,2])

On the other hand I’d be curious to know what is the recommended way to select data in a DataFrame. I’m not sure whether it is the @where macro from DataFramesMeta or if instead the Query.jl package should be preferred (or some other option that I’m not aware of).


FWIW, there’s an issue about it here.


You can use Query for this:

@from i in df begin
    @where i.x in ["cat1", "cat2"]
    @select i
    @collect DataFrame

I’m still working on a short-version API for this kind of scenario (the above code really is quite verbose if all you want to do is filter…), but nothing really working at this point.


I use indexin for this kind of query, i.e.

julia> d = DataFrame(x=[1, 2, 3], y="X")
julia> @where(d, indexin(:x, [1, 3]) .> 0)
2×2 DataFrames.DataFrame
│ Row │ x │ y │
│ 1   │ 1 │ X │
│ 2   │ 3 │ X │

As a bonus, it is tolerant of NA values, too :slight_smile:

@where(df, in.(:x, [["cat1", "cat2"]]))

Notice that ["cat1", "cat2"] is wrapped in another Array (the iterable is the only element of another iterable).