Delete Id from dataframe if variable of id contains particular value

sai_matcha · February 10, 2022, 11:29am

Hi, I want to create a new data frame by deleting some ids from the original data frame based on a condition.
The data frame I have:

df = DataFrame()
df.id = repeat(1:3,3)
df.v1 = 1:9
df.v2 = 20:28
sort!(df,:id)

I want to delete all rows of an id if that id has any value greater than 8 in column v1.
The data frame I want is:

df2 = DataFrame() 
df2.id = repeat(1:2, inner=3)
df2.v1 = [1,4,7,2,5,8]
df2.v2 = [20,23,26,21,24,27]
sort!(df2,:id)

Can somebody help me solve this?
Thanks

nilshg · February 10, 2022, 12:12pm

I’m not sure I understand the question. It seems you are missing a row in df2 (shouldn’t the row where v1 equals 3 be included?).

Anyway, here are two ways:

julia> df[df.v1 .<= 8, :]
8×3 DataFrame
 Row │ id     v1     v2    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25

julia> filter!(:v1 => <=(8), df)
8×3 DataFrame
 Row │ id     v1     v2    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25

The second changes df in place.

Have you worked through

https://github.com/bkamins/Julia-DataFrames-Tutorial

It seems the kind of questions you are asking is well covered there; for example, all the row subsetting material is here:

https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/06_rows.ipynb

(which also discusses subset as an additional option)
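
For reference, a minimal sketch of the non-mutating options, assuming the example data from the question. The names kept and kept2 are only illustrative. Both subset and filter (without the !) return a new data frame, so df keeps all 9 rows:

```julia
using DataFrames

# Rebuild the example data from the question.
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# subset returns a new data frame; ByRow applies the predicate to each value of v1.
kept = subset(df, :v1 => ByRow(<=(8)))

# filter (without the !) also returns a new data frame and leaves df untouched.
kept2 = filter(:v1 => <=(8), df)
```

Note that this is still a row-wise condition: it drops only the single row where v1 exceeds 8, not every row of that id.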

sai_matcha · February 10, 2022, 12:42pm

Hi @nilshg, my id 3 has one value in v1 that is more than 8, so I want to delete the whole of id 3 from the dataset.

sai_matcha · February 10, 2022, 12:43pm

Thanks, I will go through this.

nilshg · February 10, 2022, 1:50pm

Ah, that’s the bit I didn’t understand. You want something like

julia> df[transform(groupby(df, :id), :v1 => maximum).v1_maximum .< 9, :]
6×2 DataFrame
 Row │ id     v1    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     1      4
   4 │     2      5
   5 │     1      7
   6 │     2      8
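
The one-liner works because transform on a grouped data frame keeps the original row order, so the v1_maximum column lines up with the rows of df and can be used as a mask. An alternative sketch of the same idea, assuming the example data from the question (kept_groups and df_new are illustrative names), filters whole groups instead of building a mask:

```julia
using DataFrames

# Rebuild the example data from the question.
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# Split into one group per id.
gdf = groupby(df, :id)

# Keep only the groups whose largest v1 is at most 8.
# The function receives each group as a sub data frame.
kept_groups = filter(g -> maximum(g.v1) <= 8, gdf)

# Collect the surviving groups back into an ordinary data frame.
df_new = DataFrame(kept_groups)
```

This leaves df unchanged and returns the rows group by group, so the result is ordered by id here.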

rafael.guerra · February 10, 2022, 3:13pm

FWIW, another one-liner option:

df[df.id .∉ (df.id[findall(>(8), df.v1)],), :]
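
Unpacked into two steps, the same idea might look like the sketch below. It assumes the example data from the question; bad_ids and df_new are illustrative names, and using a Set is my own choice for the membership test rather than part of the one-liner above.

```julia
using DataFrames

# Rebuild the example data from the question.
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# Step 1: collect the ids that have at least one v1 above 8.
bad_ids = Set(df.id[df.v1 .> 8])

# Step 2: keep the rows whose id is not in that set; df itself is unchanged.
df_new = filter(:id => id -> id ∉ bad_ids, df)
```

In the one-liner, wrapping the vector of offending ids in a one-element tuple stops broadcasting from iterating over it, so each id is tested against the whole collection.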

sai_matcha · February 10, 2022, 4:11pm

@rafael.guerra Thank you
