What is Julia's equivalent of R's %in%?

Hello,
I have two dataframes with different columns and headers, but the column Start in the first and the column Start_site in the second are related. I want to select the rows of the first dataframe whose Start value appears in the Start_site column of the second dataframe.
Essentially, I am looking for the Julia equivalent of R's Start %in% Start_site.
I tried join, but it requires the same dataframe structure…
Thanks

Never mind, I found it:

join(df1, df2; on = :Start, kind = :semi, makeunique = false,
                   indicator = nothing, validate = (false, false))

I had to give the key columns the same name in both dataframes.
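
For anyone on a recent DataFrames.jl: as far as I know, join(...; kind = ...) has since been replaced by dedicated functions such as semijoin and antijoin, and on accepts a left => right pair, so the renaming step should not be needed. A minimal sketch, assuming DataFrames.jl 1.x; the data and the gene and score columns are made up for illustration, only Start and Start_site come from the question:

```julia
using DataFrames

# Hypothetical data: only the column names Start and Start_site are from the question.
df1 = DataFrame(Start = [1, 3, 4, 6], gene = ["g1", "g2", "g3", "g4"])
df2 = DataFrame(Start_site = [1, 2, 3, 4, 5], score = [0.1, 0.2, 0.3, 0.4, 0.5])

# Keep the rows of df1 whose Start occurs in df2.Start_site (no renaming).
matched = semijoin(df1, df2, on = :Start => :Start_site)

# The complementary selection: rows of df1 with no match in df2.
unmatched = antijoin(df1, df2, on = :Start => :Start_site)
```

Both results contain only the columns of df1, which is what %in% followed by row indexing would give in R.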

There is also the in function:

start = [1, 3, 4, 6]
start_site = 1:5 
in_start_site = in(start_site) # in(...) returns a function
in_start_site.(start) #  [true, true, true, false]

in(collection) returns a function that tests whether its argument is in collection. You can also write this in one line:

in(start_site).(start)
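
To make the role of each argument explicit, here is a small sketch using only Base Julia. It compares the one-argument (curried) form of in with two equivalent spellings; the variable names a, b, c and d are just for illustration:

```julia
start = [1, 3, 4, 6]
start_site = 1:5

# Curried form: in(start_site) behaves like the function x -> x in start_site.
a = in(start_site).(start)

# Two-argument form: Ref(...) makes broadcasting treat start_site as a single value.
b = in.(start, Ref(start_site))

# A plain comprehension gives the same mask.
c = [s in start_site for s in start]

# Swapping the arguments asks the reverse question:
# which elements of start_site occur in start?
d = in(start).(start_site)
```

The collection you test against goes inside in(...), and the values being tested go in the broadcast call.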


TL;DR: you are probably looking for

x .∈ Ref(Set(y))
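
A short sketch of what the two wrappers in that expression do, using made-up vectors: Ref keeps broadcasting from iterating over the collection, and Set makes each membership test a hash lookup instead of a linear scan, which should matter mostly when y is large.

```julia
x = [1, 3, 4, 6]
y = [5, 4, 3, 2, 1]

lookup = Set(y)            # build the hash set once
mask = x .∈ Ref(lookup)    # same as in.(x, Ref(lookup))

selected = x[mask]         # elements of x that occur in y
```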

Hello, I have a slight variation of this problem. Instead of testing arrays, I would like to select the rows of a dataframe whose column X matches any of the elements of an array. In other words:
if I have an array x = ["a", "b", "c"] and a dataframe df where unique(df[:X]) is ["a", "c", "d", "g"], how can I make the selection? df[df[:X] .== x, :], df[df[:X] .∈ x, :] and df[occursin.(x, df.X), :] did not work…

df[in(x).(df.X), :]
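
A likely reason the earlier attempts failed: .== and .∈ broadcast over both arguments, pairing elements of the column with elements of x, so vectors of different lengths raise a DimensionMismatch. in(x), or wrapping x in Ref, treats x as one collection. A sketch on plain vectors, where X is a stand-in for df.X:

```julia
X = ["a", "c", "d", "g"]   # stand-in for df.X
x = ["a", "b", "c"]

# Broadcasting over both vectors tries to pair a 4-element vector
# with a 3-element one, which throws.
mismatch = try
    X .∈ x
    false
catch err
    err isa DimensionMismatch
end

# Treat x as a single collection instead.
mask1 = in(x).(X)
mask2 = X .∈ Ref(x)
```

Either mask can then be used for row indexing, as in df[mask1, :].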


What would be the negation of this? That is, how do I select the rows of the dataframe whose field X is NOT in x?

julia> x=["a","b","c"]
julia> df = DataFrame(X = ["a","c","d","g"])

julia> df[.!in(x).(df.X),:]

2×1 DataFrame
│ Row │ X      │
│     │ String │
├─────┼────────┤
│ 1   │ d      │
│ 2   │ g      │



or:

join(df1, df2, on=:X, kind=:anti)

See ?join for details.
Another option is filter (use !(row.X in x) for the negation):

filter(df) do row
    row.X in x
end
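
A variant of the same idea, in case it helps: filter also accepts a column => predicate pair, which passes only that column's values to the predicate instead of a whole row. This is a sketch assuming a DataFrames.jl version that supports this form (1.x does, as far as I know), reusing the example data from above:

```julia
using DataFrames

x = ["a", "b", "c"]
df = DataFrame(X = ["a", "c", "d", "g"])

# Pass only column X to the predicate instead of a whole row.
kept = filter(:X => in(x), df)

# !in(x) is the negated predicate, so this keeps the rows whose X is not in x.
dropped = filter(:X => !in(x), df)
```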