Remove all entries that occur more than once

Mo-Gul · February 18, 2022, 6:48am

How can I drop all rows of a data frame whose value in one column occurs more than once? For example, when I have

using DataFrames
firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])

I would like the remaining dataframe to only have the row with the name “Anton”. Unfortunately I can’t use

unique(firstnames, :name)

because it returns the first two rows.

jar1 · February 18, 2022, 7:33am

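Yeah, Base.unique returns the distinct values (one copy of each) rather than only the values that occur exactly once. You can count the rows per name and keep the groups of size 1: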

julia> filter(x -> x.nrow == 1, combine(groupby(firstnames, :name), nrow, :gender))
1×3 DataFrame
 Row │ name    nrow   gender 
     │ String  Int64  String 
─────┼───────────────────────
   1 │ Anton       1  M
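
To make the distinct-versus-unique distinction concrete, here is a minimal Base-only sketch on a plain vector. The names `namelist`, `counts`, `distinct` and `singletons` are made up for illustration and are not part of the thread.

```julia
# Base-only sketch: distinct values versus values that occur exactly once.
# All variable names here are illustrative assumptions.
namelist = ["Anton", "Jordan", "Jordan"]

# Count how often each value occurs.
counts = Dict{String,Int}()
for n in namelist
    counts[n] = get(counts, n, 0) + 1
end

# unique keeps one copy of every value, so "Jordan" survives.
distinct = unique(namelist)

# Keeping only values with a count of 1 drops "Jordan" entirely.
singletons = [n for n in namelist if counts[n] == 1]
```

The groupby/combine/filter call above does the same counting per group, just on a data frame.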

nilshg · February 18, 2022, 7:34am

One way:

julia> x = filter(:nrow => ==(1), combine(groupby(firstnames, :name), nrow))
1×2 DataFrame
 Row │ name    nrow  
     │ String  Int64 
─────┼───────────────
   1 │ Anton       1

and then subset firstnames to the names that are left in x:

julia> firstnames[in(x.name).(firstnames.name), :]
1×2 DataFrame
 Row │ name    gender 
     │ String  String 
─────┼────────────────
   1 │ Anton   M
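
The indexing expression above may be easier to read once the Boolean mask is pulled out. A small sketch on plain vectors, where `keep`, `column`, `mask` and `selected` are illustrative names only:

```julia
# in(keep) is a one-argument function that tests membership in keep;
# the trailing dot broadcasts it over every element of column.
keep = ["Anton"]
column = ["Anton", "Jordan", "Jordan"]

mask = in(keep).(column)   # one Bool per element of column
selected = column[mask]    # logical indexing keeps the true positions
```

In the data frame version, the same mask selects rows and the `:` keeps all columns.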

Mo-Gul · February 18, 2022, 6:03pm

Awesome, many thanks for the answers. Both are very similar, but as a beginner I can follow jar1’s answer a bit more easily.

Building on that solution, I replaced filter with subset so that I can write the whole thing as a chain:

using DataFrames
using Chain

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"]);
@chain firstnames begin
    groupby(:name)                    # one group per distinct name
    combine(:gender, nrow => :nrow)   # keep gender, add the group size as :nrow
    subset!(:nrow => ByRow(==(1)))    # keep rows whose name occurs exactly once
    select!(Not(:nrow))               # drop the helper column again
end

also resulting in

1×2 DataFrame
 Row │ name    gender 
     │ String  String 
─────┼────────────────
   1 │ Anton   M
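
As a possible alternative to the count-then-drop pipeline, the groups themselves can be filtered by size. This is only a sketch, assuming a DataFrames.jl version in which `filter` accepts a `GroupedDataFrame` and the `ungroup` keyword; `singles` is an illustrative name.

```julia
using DataFrames

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])

# Keep only the groups that contain exactly one row.
# ungroup = true returns a plain DataFrame instead of a GroupedDataFrame.
singles = filter(g -> nrow(g) == 1, groupby(firstnames, :name); ungroup = true)
```

This variant never creates the helper `:nrow` column, so there is nothing to remove afterwards.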
