Delete an id from a DataFrame if a variable for that id contains a particular value

Hi, I want to create a new data frame by deleting some ids from the original data frame based on a condition.
The data frame I have:

using DataFrames

df = DataFrame()
df.id = repeat(1:3, 3)
df.v1 = 1:9
df.v2 = 20:28
sort!(df, :id)

I want to delete all rows of an id if that id has a value greater than 8 in column v1.
The data frame I want is:

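df2 = DataFrame()
# inner = 3 gives [1, 1, 1, 2, 2, 2], so each id lines up with its own v1/v2 values
df2.id = repeat(1:2, inner = 3)
df2.v1 = [1, 4, 7, 2, 5, 8]
df2.v2 = [20, 23, 26, 21, 24, 27]
sort!(df2, :id)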

Can somebody help me solve this?
Thanks

I’m not sure I understand the question, it seems you are missing a row in df2 (shouldn’t the row where v1 equals 3 be included?)

Anyway two ways:

julia> df[df.v1 .<= 8, :]
8×3 DataFrame
 Row │ id     v1     v2    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25

julia> filter!(:v1 => <=(8), df)
8×3 DataFrame
 Row │ id     v1     v2    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25

The second modifies df in place; the first returns a new data frame and leaves df untouched.

Have you worked through

https://github.com/bkamins/Julia-DataFrames-Tutorial

The kinds of questions you are asking seem to be well covered by it, e.g. all the row subsetting stuff is here:

https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/06_rows.ipynb

(which also discusses subset as an additional option)
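For reference, a minimal sketch of the `subset` option mentioned above, assuming DataFrames.jl 1.x and the example data from the question (rebuilt here so the snippet runs on its own; the name `kept` is only illustrative):

```julia
using DataFrames

# Rebuild the example data from the question.
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, [:id, :v1])

# subset keeps the rows for which the condition is true.
# ByRow applies <=(8) to each element of :v1 instead of the whole column.
kept = subset(df, :v1 => ByRow(<=(8)))
```

Like `df[df.v1 .<= 8, :]`, this returns a new data frame and does not modify `df`; `subset!` is the in-place counterpart.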

Hi @nilshg, id 3 has one value in v1 that is greater than 8, so I want to delete all rows of id 3 from the dataset.

Thanks, I will go through this.

Ah, that’s the bit I didn’t understand. You want something like:

julia> df[transform(groupby(df, :id), :v1 => maximum).v1_maximum .< 9, :]
6×2 DataFrame
 Row │ id     v1    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     1      4
   4 │     2      5
   5 │     1      7
   6 │     2      8
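
The one-liner above can be read as three steps. The sketch below spells them out, assuming DataFrames.jl 1.x; it rebuilds df first, because the `filter!` call shown earlier would already have dropped the row with v1 = 9. The names `with_max`, `keep` and `df_new` are only illustrative.

```julia
using DataFrames

# Rebuild the example data from the question.
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, [:id, :v1])

# Step 1: attach each group's maximum of v1 to every row of that group.
# transform keeps the original row order and names the new column v1_maximum.
with_max = transform(groupby(df, :id), :v1 => maximum)

# Step 2: a row is kept only if its whole id group stays below 9.
keep = with_max.v1_maximum .< 9

# Step 3: index the original data frame with that Boolean mask.
df_new = df[keep, :]
```

Because the mask is computed per group, one value above 8 removes every row of that id, which is what the question asks for.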

FWIW, another one-liner option:

df[df.id .∉ (df.id[findall(>(8), df.v1)],), :]
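
The same idea can be split into two steps, which may be easier to read. This is a sketch assuming DataFrames.jl 1.x and the example data from the question; using a `Set` and the names `bad_ids` and `df_new` are my own choices, not part of the one-liner.

```julia
using DataFrames

# Rebuild the example data from the question.
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, [:id, :v1])

# Step 1: collect every id that has at least one v1 value above 8.
bad_ids = Set(df.id[df.v1 .> 8])

# Step 2: keep only the rows whose id is not in that set.
# filter (without !) returns a new data frame and leaves df unchanged.
df_new = filter(:id => id -> !(id in bad_ids), df)
```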

@rafael.guerra Thank you