Delete an id from a DataFrame if any value for that id meets a condition

Hi, I want to create a new data frame by deleting some ids from the original data frame based on a condition.
The data frame I have:

df = DataFrame()
df.id = repeat(1:3,3)
df.v1 = 1:9
df.v2 = 20:28
sort!(df,:id)

I want to delete all rows of an id if that id has any value greater than 8 in column v1.
The data frame I want is:

df2 = DataFrame() 
df2.id = repeat(1:2,3)
df2.v1 = [1,4,7,2,5,8]
df2.v2 = [20,23,26,21,24,27]
sort!(df2,:id)

Can somebody help me solve this?
Thanks

I'm not sure I understand the question, it seems you are missing a row in df2 (shouldn't the row where v1 equals 3 be included?)

Anyway, here are two ways:

julia> df[df.v1 .<= 8, :]
8×3 DataFrame
 Row │ id     v1     v2    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25

julia> filter!(:v1 => <=(8), df)
8×3 DataFrame
 Row │ id     v1     v2    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25

The first returns a new data frame and leaves df untouched; the second changes df in place.
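To make the difference between the copying and the in-place form concrete, here is a minimal sketch. It assumes the same data as in the question, built in one constructor call; the names kept, n_before and n_after are only for illustration.

```julia
using DataFrames

# Same data as in the question, built in a single constructor call
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)

# Non-mutating: returns a new data frame, df keeps all of its rows
kept = filter(:v1 => <=(8), df)
n_before = nrow(df)

# Mutating: removes the non-matching row from df itself
filter!(:v1 => <=(8), df)
n_after = nrow(df)
```

Use the non-mutating form when you still need the original rows afterwards.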

Have you worked through

It seems like the type of questions you are asking are well covered by this, e.g. all the row subsetting stuff is here:

(which also discusses subset as an additional option)
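For reference, a small sketch of what the subset variant could look like for the same row-level condition. It assumes the df from the question; ByRow applies the predicate to each element of v1, and kept is just an illustrative name.

```julia
using DataFrames

# Same data as in the question
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)

# subset expects a Bool per row; ByRow lifts the scalar predicate <=(8) to the column
kept = subset(df, :v1 => ByRow(<=(8)))
```

Like filter, subset returns a new data frame; subset! is the in-place counterpart.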


Hi @nilshg, my id 3 has one value in v1 that is greater than 8, so I want to delete the whole of id 3 from the dataset.

Thanks, I will go through this.

Ah, that's the bit I didn't understand, you want something like

julia> df[transform(groupby(df, :id), :v1 => maximum).v1_maximum .< 9, :]
6×2 DataFrame
 Row │ id     v1    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     1      4
   4 │     2      5
   5 │     1      7
   6 │     2      8
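The same idea spelled out step by step, as a sketch rather than the only way to do it. It assumes the original three-column df from the question; the helper column name v1_max and the names with_max and df2 are made up for illustration. The mask lines up with the rows because transform on a grouped data frame returns rows in the order of the parent data frame.

```julia
using DataFrames

# Same data as in the question
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)

# Attach each id's maximum v1 to every row of that id (row order of df is preserved)
with_max = transform(groupby(df, :id), :v1 => maximum => :v1_max)

# Keep rows whose group maximum is at most 8, then drop the helper column
df2 = select(with_max[with_max.v1_max .<= 8, :], Not(:v1_max))

# Optional: sort by id to match the layout asked for in the question
sort!(df2, :id)
```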

FWIW, another one-liner option:

df[df.id .∉ (df.id[findall(>(8), df.v1)],), :]
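Unpacked into steps, the one-liner does roughly the following. This is a sketch that assumes the df from the question; bad_rows, bad_ids and df2 are illustrative names, and Ref plays the same role as the one-element tuple above, namely stopping broadcasting from iterating over the list of ids.

```julia
using DataFrames

# Same data as in the question
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)

# Row indices where v1 exceeds 8
bad_rows = findall(>(8), df.v1)

# Ids that own at least one such row
bad_ids = unique(df.id[bad_rows])

# Keep every row whose id is not among the offending ids
df2 = df[df.id .∉ Ref(bad_ids), :]
```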

@rafael.guerra Thank you
