I would like to find the subset of X containing only the rows where the value in column X2 matches one of the values contained in Y7, i.e.
│ X1 │ X2 │ X3 │ X4 │ X5 │
-----------------------------------
│ 1 │ "a" │ "a" │ 1 │ "a" │
│ 2 │ "b" │ "b" │ 2 │ "b" │
│ 1 │ "a" │ "d" │ 4 │ "a" │
Also, I would like to apply a second filter to that subset, keeping only the rows where X1 is equal to a certain value (a value contained in Y2), in this case 1.
Are your X and Y Arrays or DataFrames? Your question isn’t clear to me.
This works:
using DataFrames
# create dummy data
x = DataFrame(:x1 => rand(1:10, 100), :x2 => rand(1:10, 100))
y = DataFrame(:y1 => rand(1:10, 100), :y2 => rand(1:10, 100))
# Select rows where the values in the x1 column of x are the same as the y1
# column of y, and the x2 value is 7 (arbitrary value for illustration)
x[(x.x1 .== y.y1) .& (x.x2 .== 7), :]
This however implicitly assumes that it is meaningful to compare the values of a column in x to the values in the same row of a column in y. In that case it would probably be more natural to simply have x and y as a single DataFrame with all columns in x and y:
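A minimal sketch of that single-DataFrame layout, assuming small fixed values instead of random ones so the result is easy to check by eye. The name `xy` is just illustrative, and `hcat` is used here on the assumption that x and y have the same number of rows and no column names in common:

```julia
using DataFrames

# Small fixed data in place of the random dummy data (illustrative values)
x = DataFrame(:x1 => [1, 2, 3], :x2 => [7, 7, 5])
y = DataFrame(:y1 => [1, 5, 3], :y2 => [4, 5, 6])

# Put the columns of x and y side by side in one DataFrame;
# this only makes sense if row i of x belongs with row i of y
xy = hcat(x, y)

# Same row-wise filter as before, now on the combined table
result = xy[(xy.x1 .== xy.y1) .& (xy.x2 .== 7), :]
```

With these values only the first row satisfies both conditions.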
My suggestion still works in principle. However, upon reading your edited post, it appears that your initial criterion is not that the value of X.X2 should be the same as the value in the same row of Y.Y7, but rather that you want all rows for which the value in X.X2 is contained anywhere in Y.Y7, correct?
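If that reading is right, one way to sketch it is with a broadcast `in` check. The data below is made up for illustration, and `target` is a hypothetical variable standing in for the value taken from Y2 (1 in your example):

```julia
using DataFrames

# Illustrative data only; the real X and Y have more columns
X = DataFrame(:X1 => [1, 2, 1, 3], :X2 => ["a", "b", "a", "c"])
Y = DataFrame(:Y2 => [1, 1, 1], :Y7 => ["a", "b", "z"])

# First filter: keep rows whose X2 value occurs anywhere in Y.Y7.
# Ref(...) makes broadcasting treat Y.Y7 as a single collection
# instead of pairing it element by element with X.X2.
in_y7 = in.(X.X2, Ref(Y.Y7))
subset1 = X[in_y7, :]

# Second filter: keep rows of that subset where X1 equals the target value
target = 1  # assumption: the value of interest from Y2
subset2 = subset1[subset1.X1 .== target, :]
```

Here `subset1` keeps the rows with "a", "b", "a" and `subset2` keeps the two "a" rows where X1 is 1. Unlike the row-wise comparison, this does not require X and Y to have the same number of rows.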