Filter rows after a perticular value in column

Hi , I have a dafataframe that looks like

df = DataFrame()
df.id = sort!(repeat(1:3,5))
df.a = repeat([0,5,7,15,17],3)
df.diff_a = repeat([0,5,2,8,2],3)

I want to drop all coming rows whenever ‘df.diff_a’ value goes more than 6 .
Data Frame i want is

df = DataFrame()
df.id = sort!(repeat(1:3,3))
df.a = repeat([0,5,7],3)
df.diff_a = repeat([0,5,2],3)

I have dropped all the next rows whenever diff_a crossed value 6.

Hope i am clear. Thanks

Could you please clarify. In the requested output you are also dropping rows when :diff_a is less or equal than 6.

If you want to drop rows greater than 6 do:

subset(df, :diff_a => ByRow(<=(6)))

or using DataFramesMeta.jl (which should be probably easier for you to use):

@rsubset(df, :diff_a <= 6)

There are 5 rows in my each id . id1 contains [ 0,5,2,8,2] in the df.diff_a. if any value occurs above 6 in any row of df.diff_a , i want to delete all the rows of that ID after that value.
Here value 8 is occuring at 4th row. so i want to delete all rows after 3rd row .

in simple words, if any number greater than 6 occurs in any rows, that row and rows after that row should be filtered or dropped. even if next rows are less than 6.

Thanks

I would probably do it this way which is quite transparent imho, although there might be more efficient solutions:

First work out which row has the first value >6 by id, as well as the observation number of each row within id:

julia> transform!(groupby(df, :id), :diff_a => (x -> findfirst(>(6), x)) => :cutoff, :id => (x -> 1:length(x)) => :n_obs)
15×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs 
     │ Int64  Int64  Int64   Int64   Int64 
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     1     15       8       4      4
   5 │     1     17       2       4      5
   6 │     2      0       0       4      1
   7 │     2      5       5       4      2
   8 │     2      7       2       4      3
   9 │     2     15       8       4      4
  10 │     2     17       2       4      5
  11 │     3      0       0       4      1
  12 │     3      5       5       4      2
  13 │     3      7       2       4      3
  14 │     3     15       8       4      4
  15 │     3     17       2       4      5

Now drop rows where the observation number is equal to or larger than the number of the first >6 observation:

julia> df[df.n_obs .< df.cutoff, :]
9×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs 
     │ Int64  Int64  Int64   Int64   Int64 
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     2      0       0       4      1
   5 │     2      5       5       4      2
   6 │     2      7       2       4      3
   7 │     3      0       0       4      1
   8 │     3      5       5       4      2
   9 │     3      7       2       4      3

You might have to adapt this for groups where there is no value larger than 6, in which case findfirst will return nothing.

The code helped me to get solution. But im not completely understood how it is working. Thank you very much .

Hi, How to handle when returns nothing ?