Hi, I have a DataFrame that looks like this:
df = DataFrame()
df.id = sort!(repeat(1:3,5))
df.a = repeat([0,5,7,15,17],3)
df.diff_a = repeat([0,5,2,8,2],3)
Within each id, I want to drop every row from the point where the df.diff_a value exceeds 6.
The DataFrame I want is:
df = DataFrame()
df.id = sort!(repeat(1:3,3))
df.a = repeat([0,5,7],3)
df.diff_a = repeat([0,5,2],3)
I have dropped the row where diff_a exceeded 6 and all the rows after it.
I hope that is clear. Thanks.
Could you please clarify? In the requested output you are also dropping rows where :diff_a is less than or equal to 6.
If you want to drop rows where :diff_a is greater than 6, do:
subset(df, :diff_a => ByRow(<=(6)))
or, using DataFramesMeta.jl (which should probably be easier for you to use):
@rsubset(df, :diff_a <= 6)
There are 5 rows for each id. For id 1, df.diff_a contains [0, 5, 2, 8, 2]. If any value above 6 occurs in df.diff_a, I want to delete all rows of that id from that value onward.
Here the value 8 occurs in the 4th row, so I want to delete all rows after the 3rd row.
In simple words, if a number greater than 6 occurs in any row, that row and all rows after it should be filtered out or dropped, even if the later values are less than 6.
Thanks
nilshg
March 1, 2022, 9:59am
I would probably do it this way, which is quite transparent imho, although there might be more efficient solutions:
First work out which row has the first value >6 by id, as well as the observation number of each row within id:
julia> transform!(groupby(df, :id), :diff_a => (x -> findfirst(>(6), x)) => :cutoff, :id => (x -> 1:length(x)) => :n_obs)
15×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs
     │ Int64  Int64  Int64   Int64   Int64
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     1     15       8       4      4
   5 │     1     17       2       4      5
   6 │     2      0       0       4      1
   7 │     2      5       5       4      2
   8 │     2      7       2       4      3
   9 │     2     15       8       4      4
  10 │     2     17       2       4      5
  11 │     3      0       0       4      1
  12 │     3      5       5       4      2
  13 │     3      7       2       4      3
  14 │     3     15       8       4      4
  15 │     3     17       2       4      5
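To see what the two anonymous functions compute, here is a minimal sketch that runs the same steps on the values of a single group using only Base Julia. The variable names `keep` and `kept_values` are made up for illustration.

```julia
diff_a = [0, 5, 2, 8, 2]          # diff_a values of one id from the example

cutoff = findfirst(>(6), diff_a)  # index of the first element greater than 6
n_obs = 1:length(diff_a)          # row number within the group

keep = n_obs .< cutoff            # true only for rows before the cutoff
kept_values = diff_a[keep]        # values that survive the filter
```

In the grouped `transform!`, the scalar returned by `findfirst` is repeated for every row of the group, which is why the whole `cutoff` column shows the same number within each id.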
Now drop rows where the observation number is equal to or larger than the number of the first >6 observation:
julia> df[df.n_obs .< df.cutoff, :]
9×5 DataFrame
 Row │ id     a      diff_a  cutoff  n_obs
     │ Int64  Int64  Int64   Int64   Int64
─────┼─────────────────────────────────────
   1 │     1      0       0       4      1
   2 │     1      5       5       4      2
   3 │     1      7       2       4      3
   4 │     2      0       0       4      1
   5 │     2      5       5       4      2
   6 │     2      7       2       4      3
   7 │     3      0       0       4      1
   8 │     3      5       5       4      2
   9 │     3      7       2       4      3
You might have to adapt this for groups where there is no value larger than 6, in which case findfirst will return nothing.
The code helped me get to a solution, but I don't completely understand how it works. Thank you very much.
Hi, how should I handle the case where findfirst returns nothing?
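One possible way to handle this, as a sketch rather than a definitive answer: wrap the `findfirst` call in Base's `something`, which returns its first argument that is not `nothing`. Falling back to `length(x) + 1` puts the cutoff past the last row, so a group with no value above 6 keeps all of its rows. The small `df` below is made-up test data in which id 2 never exceeds 6. The second part shows a different technique that was not used above: a cumulative mask passed to `subset`, which avoids the helper columns altogether.

```julia
using DataFrames

# Made-up data: id 1 exceeds 6 in its second row, id 2 never does
df = DataFrame(id = [1, 1, 1, 2, 2, 2], diff_a = [0, 8, 2, 0, 5, 2])

# Same approach as above, but with a fallback cutoff when findfirst returns nothing
transform!(groupby(df, :id),
    :diff_a => (x -> something(findfirst(>(6), x), length(x) + 1)) => :cutoff,
    :id => (x -> 1:length(x)) => :n_obs)

result = df[df.n_obs .< df.cutoff, :]

# Alternative: keep a row only while no value above 6 has been seen so far in its group
masked = subset(groupby(df, :id), :diff_a => (x -> cumsum(x .> 6) .== 0))
```

Without the fallback, the `cutoff` column would mix integers and `nothing`, and the comparison `df.n_obs .< df.cutoff` would throw an error for the groups where `cutoff` is `nothing`.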