How to subset dataframe by multiple rows with same value?

Hi,
I want to subset a dataframe (df) to the rows where HWG has the value 10004.
I tried the command below, but it does not work, and I am wondering why.

df[in([10004]).(df.HWG), :]

28,960 rows × 11 columns (omitted printing of 2 columns)

HWG X X.1 X.2 X.3 X.4 X.5 Herd.born.in GenotypedDate
String Int64 Int64 Int64 Int64 Int64 Int64 String String
1 10004 0 410240170 0 0 2 -9 10004 14.12.2018
2 10004 0 412396339 412301476 409446588 2 -9 10004 01.12.2017
3 10004 0 412442014 0 0 2 -9 10004 14.12.2018
4 10004 0 412409256 0 0 2 -9 10004 14.12.2018
5 10004 0 412419664 0 0 2 -9 10004 14.12.2018
6 10004 0 412442177 0 0 2 -9 10004 14.12.2018
7 10004 0 412556442 0 0 2 -9 10004 14.12.2018
8 10004 0 412502732 0 0 2 -9 10004 14.12.2018
9 10004 0 412490450 0 0 2 -9 10004 14.12.2018
10 10004 0 412788155 0 0 2 -9 10004 20.12.2018
11 10004 0 410103042 0 0 2 -9 10004 14.12.2018
12 10004 0 412532939 0 0 2 -9 10004 14.12.2018
13 10004 0 411394744 0 0 2 -9 10004 14.12.2018
14 10005 0 411203394 412301753 409435638 2 -9 10005 24.08.2018
15 10005 0 410101642 412300972 410144275 2 -9 10005 24.08.2018
16 10005 0 410786267 412301772 411910808 2 -9 10005 24.08.2018
17 10005 0 412551400 412301966 409336495 2 -9 10005 24.08.2018
18 10005 0 408990326 412301218 409435638 2 -9 10005 24.08.2018
19 10005 0 412532262 412301953 409951758 2 -9 10005 24.08.2018
20 10005 0 412524592 412301946 412012310 2 -9 10005 24.08.2018
21 10005 0 408766913 412301860 410744756 2 -9 10630 24.08.2018
22 10005 0 412547148 412301494 408742355 2 -9 10005 24.08.2018
23 10005 0 412524860 412301964 411028826 2 -9 10005 24.08.2018
24 10005 0 412482552 412301949 408884862 2 -9 10005 24.08.2018

There is a type mismatch. HWG is a String column, but [10004] is an Array of Int64, so no element ever matches and the resulting BitArray is all false. in(["10004"]) will give the correct answer.
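A minimal sketch of that mismatch, using a plain vector as a stand-in for the df.HWG column (the vector `hwg` and its values are made up for illustration, and only Base Julia is assumed):

```julia
# Stand-in for the String column df.HWG (illustrative values only)
hwg = ["10004", "10004", "10005"]

# Comparing Strings against an Int64 never matches, so every entry is false
mask_int = in([10004]).(hwg)

# Comparing Strings against a String matches the first two entries
mask_str = in(["10004"]).(hwg)

# Alternative: convert the column to integers first, then compare numerically
mask_parsed = parse.(Int, hwg) .== 10004

println(any(mask_int))            # false: no row is selected
println(mask_str == mask_parsed)  # true: both approaches agree
```

Either quoting the value or parsing the column should work; which one is preferable probably depends on whether HWG is meant to be numeric in the rest of the analysis.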

To test against a single value, df[df.hwg .== "10004", :] should work.
To test against a set of values, df.[df.hwg .∈ (["10004", "10005"],), :] should work.
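The one-element tuple around the set of values is what keeps broadcasting from iterating over the set itself. A small sketch on a plain vector (the names `hwg` and `wanted` are hypothetical, and only Base Julia is assumed):

```julia
# Stand-in for the String column (illustrative values only)
hwg = ["10004", "10005", "10006"]
wanted = ["10004", "10005"]

# Wrapping `wanted` in a one-element tuple makes broadcasting treat it as a
# single value, so each element of `hwg` is tested against the whole set
mask_tuple = hwg .∈ (wanted,)

# Ref(wanted) protects the set from broadcasting in the same way
mask_ref = hwg .∈ Ref(wanted)

# in(wanted) builds a one-argument predicate that can be broadcast directly
mask_in = in(wanted).(hwg)

# Without the wrapper, `hwg .∈ wanted` would try to pair the two vectors
# elementwise and throw a DimensionMismatch (lengths 3 and 2)
println(mask_tuple == mask_ref == mask_in)  # true: all three masks agree
```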

Thank you for helping.

The command for a single value works, but the second one gives an error:

syntax: invalid syntax "df.[(df.hwg .∈ (["10004", "10005"],), :]" around In[86]:1

Looks like you have an extra bracket after the opening square bracket?

I ran this command:

df.[df.hwg .∈ (["10004", "10005"],), :]

and it gives this error. How can I fix it?

syntax: invalid syntax "df.[(df.HWG .∈ (["10004", "10005"],)), :]" around In[59]:1

Ah sorry, I missed the second error in your command: you also have a dot before the opening square bracket. It should be df[ instead of df.[
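Putting the fixes from this thread together, a sketch of both filters on a small made-up dataframe (the data is invented for illustration; the DataFrames.jl package is assumed to be installed, and the column name HWG is taken from the printout in the question):

```julia
using DataFrames

# Tiny made-up dataframe with a String column named HWG, as in the question
df = DataFrame(HWG = ["10004", "10004", "10005", "10006"],
               X   = [0, 0, 0, 0])

# Single value: elementwise comparison against a String
single = df[df.HWG .== "10004", :]

# Several values: note df[ with no dot, and the set wrapped in a tuple
multi = df[df.HWG .∈ (["10004", "10005"],), :]

# Alternative spelling with filter, which applies the predicate to each HWG value
multi2 = filter(:HWG => in(["10004", "10005"]), df)

println(nrow(single))  # 2
println(nrow(multi))   # 3
```

Column names are case-sensitive, so df.hwg and df.HWG refer to different columns.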

Thank you, now it works 🙂