Dataframe delete duplicate with condition

Joshua_Bowles · September 24, 2019, 11:23pm

I’m trying to get from this df

df = DataFrame(A = ["jj","jj","ab","ac","ac"], B = [0, 1, 0, 0, 1], C = ["M", "M", "F", "M", "M"])
#==
5×3 DataFrame
│ Row │ A      │ B     │ C      │
│     │ String │ Int64 │ String │
├─────┼────────┼───────┼────────┤
│ 1   │ jj     │ 0     │ M      │
│ 2   │ jj     │ 1     │ M      │
│ 3   │ ab     │ 0     │ F      │
│ 4   │ ac     │ 0     │ M      │
│ 5   │ ac     │ 1     │ M      │
==#

to this df2

df2 = DataFrame(A = ["jj","ab","ac"], B = [1, 0, 1], C = ["M", "F", "M"])
#==
3×3 DataFrame
│ Row │ A      │ B     │ C      │
│     │ String │ Int64 │ String │
├─────┼────────┼───────┼────────┤
│ 1   │ jj     │ 1     │ M      │
│ 2   │ ab     │ 0     │ F      │
│ 3   │ ac     │ 1     │ M      │
==#

Basically, I need to dedup the dataframe on column A: where a value of A occurs more than once, drop the rows with B == 0 and keep the row with B == 1; values of A that occur only once are kept as they are. I thought I could loop over a grouped data frame and filter each group, but I was not aware that you cannot delete rows from a sub dataframe (I get this error: ERROR: ArgumentError: SubDataFrame does not support deleting row).

for g in groupby(df, :A)
    if size(g, 1) > 1
        filter!(row->row[:B] == 1, g)
    end
end

Mattriks · September 25, 2019, 1:30am

by(df, :A) do sbdf
    (size(sbdf, 1)>1) ? sbdf[sbdf.B.==1,:] : sbdf
end
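
The snippet above uses `by`, which was later deprecated in DataFrames.jl in favor of `combine(groupby(...))`. A minimal sketch of the same idea in that newer style, assuming a DataFrames.jl 1.x release (the name `result` is just for illustration):

```julia
using DataFrames

df = DataFrame(A = ["jj", "jj", "ab", "ac", "ac"],
               B = [0, 1, 0, 0, 1],
               C = ["M", "M", "F", "M", "M"])

# For each group of A: if the group has more than one row, keep only the
# rows with B == 1; otherwise return the single-row group unchanged.
result = combine(groupby(df, :A)) do sdf
    nrow(sdf) > 1 ? sdf[sdf.B .== 1, :] : sdf
end
```

With the example data, `result` should contain the three rows of the desired `df2`, in order of first appearance of each value of A.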

For a large dataframe, there are potentially faster solutions; see the Split-apply-combine section of the DataFrames.jl documentation.
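
One possible alternative for larger tables, offered only as a sketch and not benchmarked, is to skip the per-group function call and build a single Boolean mask over the whole data frame. The names `counts`, `keep`, and `result` are made up for this example:

```julia
using DataFrames

df = DataFrame(A = ["jj", "jj", "ab", "ac", "ac"],
               B = [0, 1, 0, 0, 1],
               C = ["M", "M", "F", "M", "M"])

# Count how many rows share each value of A.
counts = Dict{String,Int}()
for a in df.A
    counts[a] = get(counts, a, 0) + 1
end

# Keep a row if its value of A is unique, or if it is a B == 1 row
# of a duplicated value of A.
keep = [counts[a] == 1 || b == 1 for (a, b) in zip(df.A, df.B)]

# Row indexing with a Bool vector returns a new DataFrame; df is untouched.
result = df[keep, :]
```

This makes one pass to count and one pass to build the mask. Whether it actually beats the grouped version depends on the data and would need to be measured.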

Joshua_Bowles · September 25, 2019, 1:36am

nice. thank you
