How to select multiple items using DataFramesMeta?

question

#1
using DataFrames, DataFramesMeta
x = ["a", "b", "c", "a", "c", "d"]
df = DataFrame(x=x, y = 1:length(x))
z = ["a", "c"]

How do I query the items in z in df using DataFramesMeta?

@where(df, in(z, :x)) didn’t work.

It’s similar to https://github.com/JuliaData/DataFramesMeta.jl/issues/77


#2
@where(df, map(c -> c in z, :x))
# or
@where(df, in.(:x, Ref{Array}(z)))

#3

Thanks! Can you enlighten me as to why you used Ref{Array} in the second example? Much appreciated.


#4

The {Array} part is not necessary. Ref(z) would work just as well.

The reason we use Ref is that when we broadcast a function across the elements of an array, as we are doing above, any argument that is itself broadcastable (an array rather than a scalar) gets iterated element by element too. If the length of that argument doesn’t match the length of the other arguments (and isn’t 1), Julia throws a DimensionMismatch error.

This is not the case with scalars. Ref acts as a way to treat a non-scalar argument like a scalar so we can tell Julia not to try and broadcast the function across it.
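
A minimal sketch of the difference, using plain vectors instead of a DataFrame column so that nothing depends on DataFramesMeta. The names `x` and `z` mirror the question; `err`, `mask`, and `mask_tuple` are just illustrative names:

```julia
x = ["a", "b", "c", "a", "c", "d"]
z = ["a", "c"]

# Without Ref, broadcasting tries to pair up x and z element by element.
# Their lengths (6 and 2) differ, so this throws a DimensionMismatch.
err = try
    in.(x, z)
catch e
    e
end

# Wrapping z in Ref makes broadcasting treat it as a single value,
# so each element of x is tested against the whole vector z.
mask = in.(x, Ref(z))

# Wrapping z in a one-element tuple has the same effect.
mask_tuple = in.(x, (z,))
```

Here `mask` has one Bool per element of `x`, which is the shape a row filter needs.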


#5

Awesome! Thanks :grinning:


#6

This has come up a bit lately, so I updated an issue comment here. A Scalar function has been discussed before as a way to avoid this confusion.


#7

In Julia 1.0 there’s a curried version of in, meaning that for a vector v, in(v) is a function:

julia> in([1,2])(1)
true

julia> in([1,2])(3)
false

and this can be broadcasted:

julia> in([1,2]).(1:3)
3-element BitArray{1}:
  true
  true
 false

So I imagine you could do:

@where(df, in(z).(:x))
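
As a rough check of the same idea on the strings from the question, here is a sketch in plain Julia without the macro. The names `is_in_z`, `mask`, `selected`, and `filtered` are made up for illustration:

```julia
x = ["a", "b", "c", "a", "c", "d"]
z = ["a", "c"]

# in(z) returns a one-argument function that tests membership in z.
is_in_z = in(z)

# Broadcasting that function only iterates over x; z is captured inside
# the function, so no Ref is needed.
mask = is_in_z.(x)

# The mask can index the original vector, or the curried function can be
# passed straight to filter.
selected = x[mask]
filtered = filter(in(z), x)
```

Whether the `@where` form above behaves the same way depends on how the macro rewrites `:x`, so it is worth trying on your DataFramesMeta version.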

#8

Here is the Query.jl way of doing this:

df |> @filter(_.x in z) |> DataFrame