Mo-Gul
How can I drop all elements of a list that occur more than once? For example, when I have

```julia
using DataFrames
firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])
```

I would like the resulting DataFrame to contain only the row with the name “Anton”. Unfortunately, I can’t use

```julia
unique(firstnames, :name)
```

because it returns the first two rows (one row per distinct name).
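For reference, more recent DataFrames.jl releases (1.5 and later, as far as I know) give `unique` a `keep` keyword, and `keep = :noduplicates` keeps only the rows whose key occurs exactly once. This is a minimal sketch that assumes such a version is installed. On older versions the keyword is not accepted, so check `?unique` in your own session first:

```julia
using DataFrames

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])

# default behaviour (keep = :first): one row per distinct :name
distinct = unique(firstnames, :name)

# assumes DataFrames.jl >= 1.5: keep = :noduplicates drops every row
# whose :name value appears more than once
singles = unique(firstnames, :name; keep = :noduplicates)
```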
jar1
Yeah, `Base.unique` returns distinct values rather than unique ones.
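The distinction can be seen on a plain `Vector` using only Base Julia. `unique` keeps one copy of every value, while keeping only the values that occur exactly once requires counting occurrences first. This is a small sketch, and the variable names are illustrative:

```julia
people = ["Anton", "Jordan", "Jordan"]

# distinct values: one copy of each value, in order of first appearance
distinct = unique(people)

# count how often each value occurs (one pass over the data)
counts = Dict{String,Int}()
for p in people
    counts[p] = get(counts, p, 0) + 1
end

# "unique" in the sense of the question: values that occur exactly once
singletons = filter(p -> counts[p] == 1, people)
```

Here `distinct` is `["Anton", "Jordan"]`, while `singletons` is `["Anton"]`. Counting into a `Dict` keeps this linear, instead of calling `count` once per element.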
```julia
julia> filter(x -> x.nrow == 1, combine(groupby(firstnames, :name), nrow, :gender))
1×3 DataFrame
 Row │ name    nrow   gender
     │ String  Int64  String
─────┼───────────────────────
   1 │ Anton       1  M
```
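It can help to look at the intermediate table to see why this works. Calling `combine` with both `nrow` and `:gender` returns one row per original row, and each group's size is repeated on every row of that group. `filter` then keeps only the groups of size one. The sketch below also drops the helper `nrow` column at the end. That `select(..., Not(:nrow))` step is an addition and is not part of the answer above:

```julia
using DataFrames

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])

# one row per original row; the scalar nrow of each group is repeated
# alongside that group's :gender values
counted = combine(groupby(firstnames, :name), nrow, :gender)

# keep rows from groups of size 1, then drop the helper column
result = select(filter(x -> x.nrow == 1, counted), Not(:nrow))
```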
nilshg
One way:
```julia
julia> x = filter(:nrow => ==(1), combine(groupby(firstnames, :name), nrow))
1×2 DataFrame
 Row │ name    nrow
     │ String  Int64
─────┼───────────────
   1 │ Anton       1
```
and then subset `firstnames` with this:
```julia
julia> firstnames[in(x.name).(firstnames.name), :]
1×2 DataFrame
 Row │ name    gender
     │ String  String
─────┼────────────────
   1 │ Anton   M
```
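The indexing step relies on the curried form of `in`. `in(x.name)` returns a predicate that asks whether a value is contained in `x.name`, and the `.` broadcasts that predicate over every name to produce a `Bool` mask. The same subsetting can also be written as a semijoin, which keeps the rows of `firstnames` that have a matching `:name` in `x` and adds no columns from `x`. The sketch below compares the two. The `semijoin` variant is a swapped-in alternative and is not part of the answer:

```julia
using DataFrames

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])
x = filter(:nrow => ==(1), combine(groupby(firstnames, :name), nrow))

# Bool mask: in(x.name) is a predicate, broadcast over each entry of firstnames.name
mask = in(x.name).(firstnames.name)
by_mask = firstnames[mask, :]

# semijoin keeps the rows of firstnames whose :name has a match in x,
# and returns only the columns of firstnames
by_join = semijoin(firstnames, x, on = :name)
```

For large tables, `in(Set(x.name))` avoids a linear scan of `x.name` on every lookup.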
Mo-Gul
Awesome. Many thanks for the answers. Both are very similar, but as a beginner I can follow jar1’s answer a bit more easily.
Knowing that solution, I replaced `filter` with `subset` so I can do:
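```julia
using DataFrames
using Chain

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"]);

@chain firstnames begin
    groupby(:name)                   # one group per distinct name
    combine(:gender, nrow => :nrow)  # keep :gender, add each group's row count
    subset!(:nrow => ByRow(==(1)))   # keep only names that occur once
    select!(Not(:nrow))              # drop the helper column
end
```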
also resulting in:

```
1×2 DataFrame
 Row │ name    gender
     │ String  String
─────┼────────────────
   1 │ Anton   M
```
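One limitation of the `combine(:gender, ...)` step is that every column you want to keep has to be listed. A possible variant swaps `combine` for `transform`, which attaches the group size to every original row and so keeps all columns and the original row order. It is sketched here without `@chain`. Note that `subset`, unlike `filter` with a `column => predicate` pair, passes whole columns to the function, which is why the predicate is wrapped in `ByRow`. The helper column name `:n` is arbitrary:

```julia
using DataFrames

firstnames = DataFrame(name = ["Anton", "Jordan", "Jordan"], gender = ["M", "M", "F"])

# transform on a GroupedDataFrame returns one row per original row, in the
# original order, with each group's row count broadcast into column :n
counted = transform(groupby(firstnames, :name), nrow => :n)

# keep rows whose name occurs once, then drop the helper column
result = select(subset(counted, :n => ByRow(==(1))), Not(:n))
```

The same steps fit into the `@chain` block above as `transform(nrow => :n)`, `subset(:n => ByRow(==(1)))` and `select(Not(:n))`.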