As the title says: how do I use in with @where (DataFramesMeta)?
e.g. @where(df, :x in ["cat1", "cat2"])
I already tried :x .in ["cat1", "cat2"]
or .in(:x, ["cat1", "cat2"])
…
You actually need your function to take a vector and output a vector of booleans, as @where works with functions taking vectors as inputs. For example:
df = DataFrame(x = 1:3, y = [2, 1, 2])
@where(df, :x .> 1)
One easy solution is to use a version of in where the first argument can be a vector:
@where(df, [x in [1,2] for x in :x])
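To see why the plain :x in [...] form does not filter anything, here is a minimal sketch in plain Julia, without DataFramesMeta. The names x and wanted are made up for illustration; x stands in for the column that @where passes along as a vector.

```julia
# Hypothetical stand-ins: x plays the role of the column :x
x = ["cat1", "dog", "cat2", "bird"]
wanted = ["cat1", "cat2"]

# Scalar `in` asks whether the whole vector x is an element of wanted,
# so it yields a single Bool instead of one Bool per row
whole = x in wanted

# The comprehension tests each element separately and yields a mask
mask = [xi in wanted for xi in x]

# Indexing with the mask keeps only the matching elements
selected = x[mask]
```

The mask is what a row filter needs: one Bool per row, in the same order as the column.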
which of course gets annoying to write, so I guess you could define an auxiliary function if you have to do this very often. For example, you could define a custom operator ∊ (typed \smallin):
as ∊ b = [a in b for a in as]
@where(df, :x ∊ [1,2])
On the other hand, I'd be curious to know what the recommended way to select data in a DataFrame is. I'm not sure whether it is the @where macro from DataFramesMeta or whether the Query.jl package should be preferred (or some other option that I'm not aware of).
You can use Query for this:
@from i in df begin
@where i.x in ["cat1", "cat2"]
@select i
@collect DataFrame
end
I'm still working on a short-version API for this kind of scenario (the above code really is quite verbose if all you want to do is filter…), but nothing is really working at this point.
I use indexin for this kind of query, i.e.
julia> d = DataFrame(x=[1, 2, 3], y="X")
julia> @where(d, indexin(:x, [1, 3]) .> 0)
2×2 DataFrames.DataFrame
│ Row │ x │ y │
├─────┼───┼───┤
│ 1   │ 1 │ X │
│ 2   │ 3 │ X │
As a bonus, it is tolerant of NA values, too.
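One caveat, as a hedged sketch: on Julia 1.x, indexin returns nothing (not 0) for elements that are not found, so a comparison with 0 would fail there. Assuming a current Julia version, the same idea can be written by testing for nothing. The names x and wanted are made up for illustration, and this is plain Julia without DataFramesMeta.

```julia
# Hypothetical stand-ins for the column and the allowed values
x = [1, 2, 3]
wanted = [1, 3]

# For each element of x: its first position in wanted, or nothing if absent
idx = indexin(x, wanted)

# Turn the positions into a Bool mask, one entry per element of x
mask = [i !== nothing for i in idx]

# Keep only the elements that were found
selected = x[mask]
```

The mask could then be used wherever a vector of booleans is expected, in the same way as indexin(:x, [1, 3]) .> 0 in the post above.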
@where(df, in.(:x, [["cat1", "cat2"]]))
Notice that ["cat1", "cat2"] is wrapped in another Array (the iterable is the only element of another iterable).
I really like this, thank you.
It can also be written as
@where(df, :x .∈ [["cat1", "cat2"]])
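For anyone wondering why the extra wrapping is needed, here is a small sketch in plain Julia (no DataFramesMeta; x and wanted are made-up names). Broadcasting iterates over both arguments, so the set of allowed values has to be protected from iteration, either with a one-element Array as above or with Ref, which I believe is the more common spelling on Julia 1.x.

```julia
# Hypothetical stand-ins for the column and the allowed values
x = ["cat1", "dog", "cat2", "bird"]
wanted = ["cat1", "cat2"]

# One-element outer Array: broadcasting treats wanted as a single value
mask_wrapped = in.(x, [wanted])

# Ref does the same job without allocating an outer Array
mask_ref = in.(x, Ref(wanted))

# Dotted infix form of the same call
mask_dotted = x .∈ [wanted]

selected = x[mask_ref]
```

Without the wrapping, in.(x, wanted) would try to pair the elements of x with the elements of wanted one by one, which is not a membership test and errors out when the lengths differ.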