As per the title: how do I use in with @where (DataFramesMeta)?
e.g. @where(df, :x in ["cat1", "cat2"])
I already tried :x .in ["cat1", "cat2"] and .in(:x, ["cat1", "cat2"])
…
You actually need your function to take a vector and output a vector of booleans, as @where works with functions taking vectors as inputs. For example:
df = DataFrame(x = 1:3, y = [2, 1, 2])
@where(df, :x .> 1)
One easy solution is to use a version of in where the first argument can be a vector:
@where(df, [x in [1,2] for x in :x])
This of course gets annoying to write, so I guess you could define an auxiliary function if you have to do this very often. For example, you could define a custom operator ∊ (typed as \smallin):
as ∊ b = [a in b for a in as]
@where(df, :x ∊ [1,2])
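To check the idea outside of DataFramesMeta, here is a minimal sketch using only Base Julia on a plain vector. The helper name isin_each and the sample data are made up for illustration; the point is only that the comprehension yields one Bool per element, which is the kind of mask @where is described as expecting above.

```julia
# Hypothetical helper (not part of DataFramesMeta): elementwise membership test.
isin_each(xs, set) = [x in set for x in xs]

x = [1, 2, 3]                 # stands in for the column :x
mask = isin_each(x, [1, 2])   # one Bool per element of x
selected = x[mask]            # keep only the elements where mask is true

println(mask)
println(selected)
```

A named function like this does the same job as the custom operator, without needing a Unicode symbol.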
On the other hand I'd be curious to know what is the recommended way to select data in a DataFrame. I'm not sure whether it is the @where macro from DataFramesMeta or if instead the Query.jl package should be preferred (or some other option that I'm not aware of).
FWIW, there's an issue about it here.
You can use Query for this:
@from i in df begin
@where i.x in ["cat1", "cat2"]
@select i
@collect DataFrame
end
I'm still working on a short-version API for this kind of scenario (the above code really is quite verbose if all you want to do is filter…), but nothing really working at this point.
I use indexin for this kind of query, i.e.
julia> d = DataFrame(x=[1, 2, 3], y="X")
julia> @where(d, indexin(:x, [1, 3]) .> 0)
2×2 DataFrames.DataFrame
│ Row │ x │ y │
├-----┼---┼---┤
│ 1   │ 1 │ X │
│ 2   │ 3 │ X │
As a bonus, it is tolerant of NA values, too.
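The .> 0 comparison relies on indexin returning 0 for values that are not found. As far as I know, in Julia 1.x indexin returns nothing for those entries instead, so the test has to change on recent versions. A minimal sketch with Base only, on a plain vector with made-up data:

```julia
x = [1, 2, 3]                  # stands in for the column :x

# For each element of x: its position in [1, 3], or `nothing` if absent.
idx = indexin(x, [1, 3])

# Elementwise "is not nothing" gives the Bool mask.
mask = idx .!== nothing

println(x[mask])
```

Assuming @where still takes a Bool vector, the same mask expression should be usable in place of the .> 0 version.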
@where(df, in.(:x, [["cat1", "cat2"]]))
Notice that ["cat1", "cat2"] is wrapped in another Array (the iterable is the only element of another iterable).
I really like this thank you.
Can also be written
@where(df, :x .∈ [["cat1", "cat2"]])
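The wrapping works because broadcasting walks over both arguments together, and a one-element outer container makes the inner array act as a single value that is reused for every element of the column. Here is a small sketch with Base only; the data is made up, and Ref and a one-element tuple are alternative wrappers that I believe behave the same way on Julia 1.x.

```julia
x = ["cat1", "dog", "cat2"]   # stands in for the column :x
wanted = ["cat1", "cat2"]

m1 = in.(x, [wanted])         # one-element array wrapper, as in the posts above
m2 = in.(x, Ref(wanted))      # Ref marks `wanted` as a scalar for broadcasting
m3 = x .∈ (wanted,)           # one-element tuple, infix form

println(m1 == m2 == m3)
println(x[m1])
```

Without any wrapper, in.(x, wanted) would try to pair the elements of x and wanted one by one, which fails here because the lengths differ.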