Filter rows after a particular value in column

sai_matcha · March 1, 2022, 6:42am

Hi, I have a DataFrame that looks like this:

using DataFrames
df = DataFrame()
df.id = sort!(repeat(1:3,5))
df.a = repeat([0,5,7,15,17],3)
df.diff_a = repeat([0,5,2,8,2],3)

Within each id, I want to drop all subsequent rows once the value of df.diff_a goes above 6.
The DataFrame I want is:

df = DataFrame()
df.id = sort!(repeat(1:3,3))
df.a = repeat([0,5,7],3)
df.diff_a = repeat([0,5,2],3)

That is, I have dropped the row where diff_a exceeded 6 and all rows after it.

I hope that is clear. Thanks

bkamins · March 1, 2022, 7:35am

Could you please clarify? In the requested output you are also dropping rows where :diff_a is less than or equal to 6.

If you want to drop rows where :diff_a is greater than 6, do:

subset(df, :diff_a => ByRow(<=(6)))

or using DataFramesMeta.jl (which should probably be easier for you to use):

@rsubset(df, :diff_a <= 6)
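
For reference, here is a small sketch of what this row-wise filter returns on the example data from the first post. The variable name `rowwise` is only illustrative. The filter looks at each row on its own, so the rows that follow a value above 6 are kept as long as they are themselves at most 6.

```julia
using DataFrames

# Example data from the first post: 3 ids with 5 rows each
df = DataFrame(id = repeat(1:3, inner = 5),
               a = repeat([0, 5, 7, 15, 17], 3),
               diff_a = repeat([0, 5, 2, 8, 2], 3))

# Keep every row whose diff_a is at most 6, judged row by row
rowwise = subset(df, :diff_a => ByRow(<=(6)))

# Only the rows with diff_a == 8 are removed; the trailing 2 in each id stays
println(rowwise.diff_a)
```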

sai_matcha · March 1, 2022, 9:28am

There are 5 rows for each id. For id 1, df.diff_a contains [0, 5, 2, 8, 2]. If a value above 6 occurs in any row of df.diff_a, I want to delete all rows of that id from that value onwards.
Here the value 8 occurs in the 4th row, so I want to delete all rows after the 3rd row.

In simple words: if a number greater than 6 occurs in any row, that row and all rows after it (within the same id) should be dropped, even if the later values are less than 6.

Thanks

nilshg · March 1, 2022, 9:59am

I would probably do it this way which is quite transparent imho, although there might be more efficient solutions:

First work out which row has the first value >6 by id, as well as the observation number of each row within id:

julia> transform!(groupby(df, :id), :diff_a => (x -> findfirst(>(6), x)) => :cutoff, :id => (x -> 1:length(x)) => :n_obs)
15×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs 
     │ Int64  Int64  Int64   Int64   Int64 
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     1     15       8       4      4
   5 │     1     17       2       4      5
   6 │     2      0       0       4      1
   7 │     2      5       5       4      2
   8 │     2      7       2       4      3
   9 │     2     15       8       4      4
  10 │     2     17       2       4      5
  11 │     3      0       0       4      1
  12 │     3      5       5       4      2
  13 │     3      7       2       4      3
  14 │     3     15       8       4      4
  15 │     3     17       2       4      5
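
To see what the two anonymous functions compute inside a single group, here is a minimal sketch on a plain vector, without DataFrames. The names `cutoff`, `n_obs`, `keep` and `kept` are chosen for illustration and mirror the column names above.

```julia
diff_a = [0, 5, 2, 8, 2]           # the diff_a values of one id

# Index of the first element greater than 6 (here the 8 in position 4)
cutoff = findfirst(>(6), diff_a)

# Running observation number within the group
n_obs = 1:length(diff_a)

# Keep only the observations that come strictly before the cutoff
keep = n_obs .< cutoff
kept = diff_a[keep]

println(cutoff)   # 4
println(kept)     # [0, 5, 2]
```

In the transform! call the single cutoff value is repeated for every row of the group, which is why the cutoff column shows 4 throughout.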

Now drop rows where the observation number is equal to or larger than the number of the first >6 observation:

julia> df[df.n_obs .< df.cutoff, :]
9×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs 
     │ Int64  Int64  Int64   Int64   Int64 
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     2      0       0       4      1
   5 │     2      5       5       4      2
   6 │     2      7       2       4      3
   7 │     3      0       0       4      1
   8 │     3      5       5       4      2
   9 │     3      7       2       4      3

You might have to adapt this for groups where there is no value larger than 6, in which case findfirst will return nothing.
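
One possible adaptation, as a sketch rather than the only way to do it: wrap findfirst in Base's something, which returns its first argument that is not nothing, and fall back to length(x) + 1 so that every row of such a group passes the n_obs .< cutoff test. The helper name `first_above` and the modified data for id 3 are assumptions made for this illustration.

```julia
using DataFrames

# Same layout as the example, but id 3 never has a diff_a above 6
df = DataFrame(id = repeat(1:3, inner = 5),
               a = repeat([0, 5, 7, 15, 17], 3),
               diff_a = [0, 5, 2, 8, 2,
                         0, 5, 2, 8, 2,
                         0, 1, 2, 3, 4])

# Hypothetical helper: position of the first value above `limit`,
# or length(x) + 1 when no value exceeds it
first_above(x, limit) = something(findfirst(>(limit), x), length(x) + 1)

transform!(groupby(df, :id),
           :diff_a => (x -> first_above(x, 6)) => :cutoff,
           :id => (x -> 1:length(x)) => :n_obs)

# Groups without a value above 6 get cutoff 6, so all 5 of their rows are kept
result = df[df.n_obs .< df.cutoff, :]
println(result)
```

With this data, ids 1 and 2 keep their first three rows and id 3 keeps all five.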

sai_matcha · March 1, 2022, 10:21am

The code helped me get to a solution, but I do not completely understand how it works. Thank you very much.

sai_matcha · April 26, 2022, 2:55am

Hi, how do I handle the case where findfirst returns nothing?
