Hello,
I have two dataframes with different columns and headers, but the columns Start and Start_site are related. In particular, I want to select the rows of the first dataframe whose Start value matches a Start_site value in the second dataframe.
Essentially, I am looking for the Julia equivalent of R's Start %in% Start_site.
I tried with join, but it requires the same dataframe structure…
Thanks
Never mind, I found it:
join(df1, df2; on = :Start, kind = :semi, makeunique = false,
indicator = nothing, validate = (false, false))
I had to give the columns the same name.
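On a recent DataFrames.jl release (an assumption; the join call above uses the older keyword-based API), renaming the columns should not be necessary, because semijoin accepts a left => right pair in on. A minimal sketch with made-up data, where only the column names Start and Start_site come from the question:

```julia
using DataFrames

# Hypothetical example data; the Gene column is invented for illustration.
df1 = DataFrame(Start = [1, 3, 4, 6], Gene = ["a", "b", "c", "d"])
df2 = DataFrame(Start_site = 1:5)

# Keep the rows of df1 whose Start value appears in df2.Start_site.
# No columns from df2 are added to the result.
matched = semijoin(df1, df2, on = :Start => :Start_site)
```

The result has the same columns as df1, which is what R's %in% filter would give.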
There is also the in function:
start = [1, 3, 4, 6]
start_site = 1:5
in_start_site = in(start_site) # in(...) returns a function
in_start_site.(start) # [true, true, true, false]
in(collection) returns a function that tests whether something is in collection. You can also write this in one line:
in(start_site).(start)
cf
TL;DR: you are probably looking for
x .∈ Ref(Set(y))
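To spell out what the two wrappers in that one-liner are for, here is a small sketch using the same values as the earlier example (the variable names lookup, mask and selected are just illustrative):

```julia
x = [1, 3, 4, 6]
y = 1:5

# Set(y) gives hashed membership tests, which helps when y is large.
# Ref(...) makes broadcasting treat the whole set as a single value,
# so every element of x is tested against all of y.
lookup = Set(y)
mask = x .∈ Ref(lookup)   # same result as in(lookup).(x)

# The Boolean mask can then be used for indexing.
selected = x[mask]
```

For a handful of values the Set makes little difference; it is mainly worth building when the lookup collection is long.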
Hello, I have a slight variation of this problem. Instead of testing arrays, I would like to select the rows of a dataframe whose column X matches any of the elements of an array. In other words:
if I have an array x=["a", "b", "c"] and a dataframe df with unique(df[:X]) = "a", "c", "d", "g", can I make a selection? df[df[:X] .== x, :], df[df[:X] .∈ x, :] and df[occursin.(x, df.X), :] did not work…
df[in(x).(df.X), :]
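A likely reason the dotted attempts fail is that broadcasting pairs the two arrays element by element, and their lengths differ. A sketch on plain vectors, where X stands in for df.X:

```julia
X = ["a", "c", "d", "g"]   # stands in for df.X
x = ["a", "b", "c"]

# X .∈ x tries to pair X[i] with x[i]; the lengths 4 and 3 differ,
# so the broadcast throws a DimensionMismatch.
failed = try
    X .∈ x
    false
catch err
    err isa DimensionMismatch
end

# in(x) fixes the collection, so each X[i] is tested against all of x.
mask = in(x).(X)
```

The resulting mask is a Boolean vector with one entry per row, which is the shape row indexing expects.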
What would be the negation of this? That is, how do I select the rows of the dataframe whose field X is NOT in x?
julia> x=["a","b","c"]
julia> df = DataFrame(X = ["a","c","d","g"])
julia> df[.!in(x).(df.X),:]
2×1 DataFrame
│ Row │ X      │
│     │ String │
├─────┼────────┤
│ 1   │ d      │
│ 2   │ g      │
or:
join(df1, df2, on=:X, kind=:anti)
?join
filter(df) do row
row.X in x
end
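The filter form can also express the negation. A sketch, assuming a DataFrames.jl version that supports the column => function form of filter, and using the same x and df as in the REPL session above:

```julia
using DataFrames

x = ["a", "b", "c"]
df = DataFrame(X = ["a", "c", "d", "g"])

# in(x) is a one-argument function; !in(x) is its negation.
kept    = filter(:X => in(x), df)    # rows whose X is in x
dropped = filter(:X => !in(x), df)   # rows whose X is not in x
```

Passing the column name means the function receives just the value of X for each row rather than the whole row.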