Hi , I have a dafataframe that looks like
df = DataFrame()
df.id = sort!(repeat(1:3,5))
df.a = repeat([0,5,7,15,17],3)
df.diff_a = repeat([0,5,2,8,2],3)
I want to drop all coming rows whenever ‘df.diff_a’ value goes more than 6 .
Data Frame i want is
df = DataFrame()
df.id = sort!(repeat(1:3,3))
df.a = repeat([0,5,7],3)
df.diff_a = repeat([0,5,2],3)
I have dropped all the next rows whenever diff_a crossed value 6.
Hope i am clear. Thanks
Could you please clarify. In the requested output you are also dropping rows when :diff_a
is less or equal than 6
.
If you want to drop rows greater than 6 do:
subset(df, :diff_a => ByRow(<=(6)))
or using DataFramesMeta.jl (which should be probably easier for you to use):
@rsubset(df, :diff_a <= 6)
1 Like
There are 5 rows in my each id . id1 contains [ 0,5,2,8,2] in the df.diff_a. if any value occurs above 6 in any row of df.diff_a , i want to delete all the rows of that ID after that value.
Here value 8 is occuring at 4th row. so i want to delete all rows after 3rd row .
in simple words, if any number greater than 6 occurs in any rows, that row and rows after that row should be filtered or dropped. even if next rows are less than 6.
Thanks
nilshg
March 1, 2022, 9:59am
4
I would probably do it this way which is quite transparent imho, although there might be more efficient solutions:
First work out which row has the first value >6 by id, as well as the observation number of each row within id:
julia> transform!(groupby(df, :id), :diff_a => (x -> findfirst(>(6), x)) => :cutoff, :id => (x -> 1:length(x)) => :n_obs)
15×5 DataFrame
Row │ id a diff_a cutoff n_obs
│ Int64 Int64 Int64 Int64 Int64
─────┼─────────────────────────────────────
1 │ 1 0 0 4 1
2 │ 1 5 5 4 2
3 │ 1 7 2 4 3
4 │ 1 15 8 4 4
5 │ 1 17 2 4 5
6 │ 2 0 0 4 1
7 │ 2 5 5 4 2
8 │ 2 7 2 4 3
9 │ 2 15 8 4 4
10 │ 2 17 2 4 5
11 │ 3 0 0 4 1
12 │ 3 5 5 4 2
13 │ 3 7 2 4 3
14 │ 3 15 8 4 4
15 │ 3 17 2 4 5
Now drop rows where the observation number is equal to or larger than the number of the first >6 observation:
julia> df[df.n_obs .< df.cutoff, :]
9×5 DataFrame
Row │ id a diff_a cutoff n_obs
│ Int64 Int64 Int64 Int64 Int64
─────┼─────────────────────────────────────
1 │ 1 0 0 4 1
2 │ 1 5 5 4 2
3 │ 1 7 2 4 3
4 │ 2 0 0 4 1
5 │ 2 5 5 4 2
6 │ 2 7 2 4 3
7 │ 3 0 0 4 1
8 │ 3 5 5 4 2
9 │ 3 7 2 4 3
You might have to adapt this for groups where there is no value larger than 6, in which case findfirst
will return nothing
.
1 Like
The code helped me to get solution. But im not completely understood how it is working. Thank you very much .
Hi, How to handle when returns nothing ?