Hi, I want to create a new DataFrame by deleting some ids from the original DataFrame based on a condition.
The DataFrame I have:
using DataFrames
df = DataFrame()
df.id = repeat(1:3,3)
df.v1 = 1:9
df.v2 = 20:28
sort!(df,:id)
I want to delete all rows of an id if that id has any value greater than 8 in column v1.
The DataFrame I want is:
df2 = DataFrame()
df2.id = repeat(1:2, inner = 3)
df2.v1 = [1,4,7,2,5,8]
df2.v2 = [20,23,26,21,24,27]
sort!(df2,:id)
Can somebody help me solve this?
Thanks
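The condition applies per id rather than per row, so a useful first step is to look at the maximum of v1 within each id. A minimal sketch, assuming DataFrames.jl 1.x; `per_id` and `drop_ids` are illustrative names, not part of the question:

```julia
using DataFrames

# Rebuild the example data from the question
df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# Largest v1 within each id; sort = true fixes the group order to 1, 2, 3
per_id = combine(groupby(df, :id; sort = true), :v1 => maximum)

# ids whose maximum exceeds 8 are the ones to drop entirely
drop_ids = per_id.id[per_id.v1_maximum .> 8]
println(drop_ids)
```

With this data only id 3 crosses the threshold, because its rows have v1 values 3, 6 and 9.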
nilshg
I'm not sure I understand the question; it seems you are missing a row in df2 (shouldn't the row where v1 equals 3 be included?).
Anyway, here are two ways:
julia> df[df.v1 .<= 8, :]
8×3 DataFrame
 Row │ id     v1     v2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25
julia> filter!(:v1 => <=(8), df)
8×3 DataFrame
 Row │ id     v1     v2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1     20
   2 │     1      4     23
   3 │     1      7     26
   4 │     2      2     21
   5 │     2      5     24
   6 │     2      8     27
   7 │     3      3     22
   8 │     3      6     25
The second one modifies df in place.
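To make the difference between the two calls concrete, here is a small sketch (assuming DataFrames.jl 1.x; `kept` is an illustrative name) that runs the non-mutating `filter` first and then `filter!`:

```julia
using DataFrames

df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# filter returns a new DataFrame and leaves df untouched
kept = filter(:v1 => <=(8), df)
println(nrow(kept), " rows kept, df still has ", nrow(df), " rows")

# filter! removes the rows from df itself
filter!(:v1 => <=(8), df)
println("df now has ", nrow(df), " rows")
```

Because `filter!` changes `df`, any later code that expects the original nine rows has to rebuild the data first.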
Have you worked through this tutorial?
https://github.com/bkamins/Julia-DataFrames-Tutorial
It seems like the kind of questions you are asking are well covered by it; for example, all the row subsetting material is here:
https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/06_rows.ipynb
(which also discusses subset as an additional option).
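Since the original goal is to drop an id as a whole, `subset` can also be applied to a grouped data frame, so the condition sees every row of one id at once. A sketch assuming DataFrames.jl 1.x; the anonymous condition is illustrative and returns one Bool per row of the group:

```julia
using DataFrames

df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# The condition receives the whole v1 column of one id and returns a vector
# that is all true if that id's maximum is at most 8, and all false otherwise
df2 = subset(groupby(df, :id), :v1 => (v -> fill(maximum(v) <= 8, length(v))))
println(df2)
```

By default `subset` on a grouped data frame returns an ordinary DataFrame, with the kept rows in the order of the parent.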
Hi @nilshg, my id 3 has one value in v1 that is greater than 8, so I want to delete the whole of id 3 from the dataset.
Thanks, I will go through this.
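With that clarification (drop an id entirely as soon as any of its v1 values exceeds 8), `filter` on a grouped data frame is another option: the predicate is evaluated once per group, and whole groups are kept or discarded. A minimal sketch, assuming DataFrames.jl 1.x; `gdf` and `kept_groups` are illustrative names:

```julia
using DataFrames

df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# sort = true keeps the groups in id order 1, 2, 3
gdf = groupby(df, :id; sort = true)

# For a GroupedDataFrame the function gets the full v1 vector of each id,
# so all(...) decides for the whole group at once
kept_groups = filter(:v1 => (v -> all(<=(8), v)), gdf)

# Turn the remaining groups back into an ordinary DataFrame
df2 = DataFrame(kept_groups)
println(df2)
```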
nilshg
Ah, that's the bit I didn't understand. You want something like:
julia> df[transform(groupby(df, :id), :v1 => maximum).v1_maximum .< 9, :]
6×2 DataFrame
 Row │ id     v1
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     1      4
   4 │     2      5
   5 │     1      7
   6 │     2      8
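The one-liner above can be unpacked into steps. This sketch (assuming DataFrames.jl 1.x; `with_max` and `keep` are illustrative names) rebuilds the question's three-column data and applies the same group-maximum mask:

```julia
using DataFrames

df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# transform on a GroupedDataFrame keeps the original rows and their order,
# attaching each id's maximum of v1 as a new column named v1_maximum
with_max = transform(groupby(df, :id), :v1 => maximum)

# Row mask: true where the row's id has no v1 value of 9 or more
keep = with_max.v1_maximum .< 9

df2 = df[keep, :]
println(df2)
```

For integer data `.< 9` is the same test as `.<= 8`; the mask lines up with `df` because `transform` preserves row order.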
FWIW, another one-liner option:
df[df.id .∉ (df.id[findall(>(8), df.v1)],), :]
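The same one-liner, split into steps. A sketch assuming DataFrames.jl 1.x; collecting the offending ids into a `Set` (the name `bad_ids` is illustrative) is an extra choice not in the original line:

```julia
using DataFrames

df = DataFrame(id = repeat(1:3, 3), v1 = 1:9, v2 = 20:28)
sort!(df, :id)

# ids that have at least one v1 value above 8 (duplicates collapse in the Set)
bad_ids = Set(df.id[findall(>(8), df.v1)])

# Wrapping bad_ids in a one-element tuple makes broadcasting treat it as a
# single collection, so each element of df.id is tested against the whole set
keep = df.id .∉ (bad_ids,)

df2 = df[keep, :]
println(df2)
```

A plain vector of ids would give the same result; the `Set` just makes each membership test cheap when there are many ids.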