Using nonunique() with multiple dataframe columns

kbot · July 12, 2021, 5:49pm

Hi,

I’d like to identify rows with duplicate values across multiple columns (i.e. the combination of colA, colC and colD) with something like

findall(nonunique(df, CONDITION))

For one column, CONDITION is easy, e.g. :colA.

How can I do this for multiple, non-contiguous columns?

Thanks for any help,

pdeffebach · July 12, 2021, 5:53pm

Pass a vector of column names as the second argument:

julia> df = DataFrame(a = [1, 2, 1], b = [4, 5, 4])
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     1      4

julia> nonunique(df, :a)
3-element Vector{Bool}:
 0
 0
 1

julia> nonunique(df, [:a, :b])
3-element Vector{Bool}:
 0
 0
 1
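Applied to the original question, a minimal sketch might look like the following. The data is made up for illustration; only the column names colA, colC and colD come from the question, and colB is a hypothetical extra column added to show that the selected columns need not be contiguous.

```julia
using DataFrames

# Hypothetical example data; colB is deliberately left out of the check.
df = DataFrame(colA = [1, 2, 1, 2],
               colB = [10, 20, 30, 40],
               colC = ["x", "y", "x", "y"],
               colD = [true, false, true, true])

# Non-contiguous columns are fine: just list the names you care about.
cols = [:colA, :colC, :colD]

# nonunique returns a Bool vector; findall turns it into row indices.
dup_rows = findall(nonunique(df, cols))

# Select the flagged rows (all columns) for inspection.
dups = df[dup_rows, :]
```

As in the output above, the first occurrence of a combination is not flagged, only the later repeats. Here only row 3 repeats an earlier (colA, colC, colD) combination, so `dup_rows` contains just that index.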

kbot · July 13, 2021, 7:54am

Thanks, simple!
