Filter rows after a perticular value in column

Hi , I have a dafataframe that looks like

df = DataFrame()
df.id = sort!(repeat(1:3,5))
df.a = repeat([0,5,7,15,17],3)
df.diff_a = repeat([0,5,2,8,2],3)

I want to drop all coming rows whenever ‘df.diff_a’ value goes more than 6 .
Data Frame i want is

df = DataFrame()
df.id = sort!(repeat(1:3,3))
df.a = repeat([0,5,7],3)
df.diff_a = repeat([0,5,2],3)

I have dropped all the next rows whenever diff_a crossed value 6.

Hope i am clear. Thanks

Could you please clarify. In the requested output you are also dropping rows when :diff_a is less or equal than 6.

If you want to drop rows greater than 6 do:

subset(df, :diff_a => ByRow(<=(6)))

or using DataFramesMeta.jl (which should be probably easier for you to use):

@rsubset(df, :diff_a <= 6)
1 Like

There are 5 rows in my each id . id1 contains [ 0,5,2,8,2] in the df.diff_a. if any value occurs above 6 in any row of df.diff_a , i want to delete all the rows of that ID after that value.
Here value 8 is occuring at 4th row. so i want to delete all rows after 3rd row .

in simple words, if any number greater than 6 occurs in any rows, that row and rows after that row should be filtered or dropped. even if next rows are less than 6.

Thanks

I would probably do it this way which is quite transparent imho, although there might be more efficient solutions:

First work out which row has the first value >6 by id, as well as the observation number of each row within id:

julia> transform!(groupby(df, :id), :diff_a => (x -> findfirst(>(6), x)) => :cutoff, :id => (x -> 1:length(x)) => :n_obs)
15×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs 
     │ Int64  Int64  Int64   Int64   Int64 
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     1     15       8       4      4
   5 │     1     17       2       4      5
   6 │     2      0       0       4      1
   7 │     2      5       5       4      2
   8 │     2      7       2       4      3
   9 │     2     15       8       4      4
  10 │     2     17       2       4      5
  11 │     3      0       0       4      1
  12 │     3      5       5       4      2
  13 │     3      7       2       4      3
  14 │     3     15       8       4      4
  15 │     3     17       2       4      5

Now drop rows where the observation number is equal to or larger than the number of the first >6 observation:

julia> df[df.n_obs .< df.cutoff, :]
9×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs 
     │ Int64  Int64  Int64   Int64   Int64 
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     2      0       0       4      1
   5 │     2      5       5       4      2
   6 │     2      7       2       4      3
   7 │     3      0       0       4      1
   8 │     3      5       5       4      2
   9 │     3      7       2       4      3

You might have to adapt this for groups where there is no value larger than 6, in which case findfirst will return nothing.

1 Like

The code helped me to get solution. But im not completely understood how it is working. Thank you very much .

Hi, How to handle when returns nothing ?