Subset a dataframe by column of another dataframe

I have two dataframes (A) with 6 columns and (B) with 1 column.
and B is subset of second column of A,
now I want subset the dataframe (A) based on the values of B.
how can I do it ?
here is my dataframes.

Blockquote 27,856 rows ร— 6 columns

Column1 Column2 Column3 Column4 Column5 Column6
Int64 Int64 Int64 Int64 Int64 Int64
1 0 409859435 400005850 411102034 2 -9
2 0 409922125 400005850 411657369 2 -9
3 0 411075330 400005356 407723032 2 -9
4 0 412057132 400005972 410308103 2 -9
5 0 404693736 400003797 404050484 2 -9
6 0 404880845 400004013 403616839 2 -9
7 0 405021206 400004013 404052866 2 -9
8 0 405050763 400004026 403764505 2 -9
9 0 405061972 400003797 404413052 2 -9
10 0 405081272 400003904 403792802 2 -9
11 0 407123971 400004752 406110041 2 -9
12 0 408672181 400005052 406400297 2 -9
13 0 408796084 400005390 406845171 2 -9
14 0 408961344 400005390 407647694 2 -9
15 0 408980544 400005390 407436650 2 -9
16 0 409008076 400005417 409134077 2 -9
17 0 409057560 400005390 407197047 2 -9
18 0 409124877 400006160 407355458 2 -9
19 0 409304952 400005375 407188939 2 -9
20 0 409382613 400005356 406239975 2 -9
21 0 409391608 400005939 411022194 2 -9
22 0 409425436 400005356 408688743 2 -9
23 0 409554333 400006150 409357356 2 -9
24 0 409683419 400005356 407326695 2 -9
25 0 409760842 400005464 409945894 2 -9
26 0 409808255 400005464 410317603 2 -9
27 0 409844690 400005393 407347158 2 -9
28 0 410313740 400005390 407681898 2 -9
29 0 410354645 400005183 407289871 2 -9
30 0 410394992 400005393 410606564 2 -9
โ‹ฎ โ‹ฎ โ‹ฎ โ‹ฎ โ‹ฎ โ‹ฎ โ‹ฎ

27,856 rows ร— 1 columns

X.1
Int64
1 409859435
2 409922125
3 411075330
4 412057132
5 404693736
6 404880845
7 405021206
8 405050763
9 405061972
10 405081272
11 407123971
12 408672181
13 408796084
14 408961344
15 408980544
16 409008076
17 409057560
18 409124877
19 409304952
20 409382613
21 409391608
22 409425436
23 409554333
24 409683419
25 409760842
26 409808255
27 409844690
28 410313740
29 410354645
30 410394992
โ‹ฎ

You mean like

df.a[df.b .> 0]

?

I am looking for method like subsetting dataframes in R

subset(A, Column2 %in% B$X.1)

Ah I see.

Setup:

julia> df = DataFrame(a = [1, 2, 3, 4]); df2 = DataFrame(a2 = [1, 2]);

Indexing solution:


julia> df[in.(df.a, Ref(df2.a2)), :]
2ร—1 DataFrame
 Row โ”‚ a     
     โ”‚ Int64 
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚     1
   2 โ”‚     2

filter solution

julia> filter(:a => in(df2.a2), df)

DataFramesMeta solution

julia> @where(df, in.(:a, Ref(df2.a2)))
2ร—1 DataFrame
 Row โ”‚ a     
     โ”‚ Int64 
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚     1
   2 โ”‚     2

Note: Performance is actually better if you convert df2.a2 to a Set first, but you donโ€™t have to worry about this if things are fast enough.

Also I think this requires Julia 1.5 or above.

1 Like

Thanks so much. :star_struck: :star_struck: :star_struck: