Hi all,
I have two DataFrames, one of which is a subset of the other based on the values in some columns. For example:
```julia
using DataFrames
df1 = DataFrame(a=[1,1,2,2,3,3], b=[1,2,1,2,1,2], c=rand(6))
df2 = DataFrame(a=[1,1,2,2], b=[1,2,1,2], c=rand(4))
```
df2 is a subset of df1 based on shared values in columns a and b:
```
julia> df1
6×3 DataFrame
│ Row │ a     │ b     │ c        │
│     │ Int64 │ Int64 │ Float64  │
├─────┼───────┼───────┼──────────┤
│ 1   │ 1     │ 1     │ 0.448961 │
│ 2   │ 1     │ 2     │ 0.858127 │
│ 3   │ 2     │ 1     │ 0.97272  │
│ 4   │ 2     │ 2     │ 0.655589 │
│ 5   │ 3     │ 1     │ 0.723655 │
│ 6   │ 3     │ 2     │ 0.426203 │

julia> df2
4×3 DataFrame
│ Row │ a     │ b     │ c          │
│     │ Int64 │ Int64 │ Float64    │
├─────┼───────┼───────┼────────────┤
│ 1   │ 1     │ 1     │ 0.498987   │
│ 2   │ 1     │ 2     │ 0.813332   │
│ 3   │ 2     │ 1     │ 0.566679   │
│ 4   │ 2     │ 2     │ 0.00879591 │
```
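One quick way to confirm that subset relationship on `a` and `b` is to collect each row's `(a, b)` pair as a tuple and compare the two sets of keys. This is just a minimal sketch; the names `keys1`, `keys2`, and `is_key_subset` are illustrative, not part of any API.

```julia
using DataFrames

df1 = DataFrame(a=[1,1,2,2,3,3], b=[1,2,1,2,1,2], c=rand(6))
df2 = DataFrame(a=[1,1,2,2], b=[1,2,1,2], c=rand(4))

# Each row's (a, b) pair becomes a tuple; a Set drops duplicate keys.
keys1 = Set(zip(df1.a, df1.b))
keys2 = Set(zip(df2.a, df2.b))

# true when every (a, b) pair in df2 also occurs in df1
is_key_subset = issubset(keys2, keys1)
println(is_key_subset)  # prints true
```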
What I would like to do is extract the subset of df1 that intersects with df2 on columns a and b, keeping df1's values in column c, which yields the following new DataFrame:
```
julia> df3
4×3 DataFrame
│ Row │ a     │ b     │ c        │
│     │ Int64 │ Int64 │ Float64  │
├─────┼───────┼───────┼──────────┤
│ 1   │ 1     │ 1     │ 0.448961 │
│ 2   │ 1     │ 2     │ 0.858127 │
│ 3   │ 2     │ 1     │ 0.97272  │
│ 4   │ 2     │ 2     │ 0.655589 │
```
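For this particular pair of columns, one way to get df3 is to build a set of the `(a, b)` pairs found in df2 and filter df1 against it. A minimal sketch, assuming a reasonably recent DataFrames.jl in which `filter(f, df)` calls `f` on each row as a `DataFrameRow`; the name `keep` is only illustrative:

```julia
using DataFrames

df1 = DataFrame(a=[1,1,2,2,3,3], b=[1,2,1,2,1,2], c=rand(6))
df2 = DataFrame(a=[1,1,2,2], b=[1,2,1,2], c=rand(4))

# The (a, b) key pairs present in df2
keep = Set(zip(df2.a, df2.b))

# Keep the rows of df1 whose (a, b) pair appears in df2;
# column c comes from df1 and row order is preserved.
df3 = filter(row -> (row.a, row.b) in keep, df1)
```

The drawback is that the column names are hard-coded in the lambda, so it does not generalize directly to an arbitrary list of columns.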
Is there a general way to achieve this result given two DataFrames and a vector of columns, e.g. `extract(df1, df2, [:a, :b])`?
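A general version of this is what a semi-join does: keep the rows of the left table that have a match in the right table on the given key columns, and return only the left table's columns. A minimal sketch, assuming DataFrames.jl 0.21 or later, where `semijoin(df1, df2; on=cols)` is available; `extract` itself is a hypothetical wrapper matching the signature above, not a DataFrames function:

```julia
using DataFrames

# Hypothetical helper: rows of `df1` whose values in `cols` also occur in `df2`.
# Only df1's columns are returned, so column c keeps df1's values.
extract(df1::AbstractDataFrame, df2::AbstractDataFrame, cols) =
    semijoin(df1, df2; on=cols)

df1 = DataFrame(a=[1,1,2,2,3,3], b=[1,2,1,2,1,2], c=rand(6))
df2 = DataFrame(a=[1,1,2,2], b=[1,2,1,2], c=rand(4))

df3 = extract(df1, df2, [:a, :b])
```

A semi-join fits better here than `innerjoin`: each matching row of df1 is returned once even if its key is repeated in df2, and df2's `c` is not pulled in (`innerjoin` would throw on the duplicate `c` column unless `makeunique=true`). In older DataFrames releases I believe the same operation was written `join(df1, df2, on=[:a, :b], kind=:semi)`.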