Find unique row in DataFrame

babaq · May 17, 2018, 12:53am

Hi All,

I want to find the unique rows in a DataFrame and count how many times each one is repeated. I can get that with:

t=DataFrame(a=rand(1:5,20),b=rand([:x,:y,:z],20))
ut=unique(t)

Int[countnz([t[j,:]==ut[i,:] for j in 1:size(t,1)]) for i in 1:size(ut,1)]

but I am wondering if there is a cleaner way, such as

Int[countnz(t.==r) for r in t]
or
Int[countnz(t.==r) for r in eachrow(t)]

Right now neither of these works, but broadcasting each row and comparing seemed like the natural approach to me.

Any thoughts?

pdeffebach · May 17, 2018, 1:43am

You can do this with a grouped operation:

# you can group with a vector of symbols, so use all names in the DataFrame
by(t, names(t)) do d # do-block syntax passes each group `d` to an anonymous function
       DataFrame(m = length(d[:a])) # just choose any column; they all have the same length within a group
end

With DataFramesMeta and Lazy, which is closest to R’s chaining imo, you can do

using DataFramesMeta, Lazy
t = @> t begin
    @by(names(t), n_copies = length(:a))
end
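
For readers on a current release: `by` was removed in DataFrames.jl 1.0, and the same split-apply-combine count is usually written with `groupby` and `combine`. The following is a minimal sketch, assuming DataFrames.jl 1.x; the small fixed table and the output column name `n` are only illustrative and do not come from the thread.

```julia
using DataFrames

# Small fixed table so the result is easy to check by eye (illustrative data).
t = DataFrame(a = [1, 2, 1, 3, 2, 1], b = [:x, :y, :x, :z, :y, :z])

# Group on every column, then count the rows in each group.
# `nrow => :n` stores the group size in a new column named `n`.
counts = combine(groupby(t, names(t)), nrow => :n)
```

`counts` has one row per unique combination of `a` and `b`, and the `n` column sums to `nrow(t)`.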

babaq · May 17, 2018, 4:25am

@pdeffebach thanks for the tip, split-apply-combine is more elegant.

babaq · May 17, 2018, 5:44am

@pdeffebach
btw, how do I get the row indices of each group in a grouped DataFrame, so that I can index back into the original ungrouped DataFrame? Something like this:

[find([t[j,:]==ut[i,:] for j in 1:size(t,1)]) for i in 1:size(ut,1)]

pdeffebach · May 17, 2018, 1:50pm

Comparing rows across DataFrames is less straightforward than I thought, but it might be easier on master, I’m not sure.

Here’s the DataFramesMeta way:

t[:rownum] = [i for i in 1:nrow(t)]
ut = @> t begin
     @by([:a,:b], places = [[i for i in :rownum]])
end

babaq · May 17, 2018, 9:29pm

@pdeffebach thanks, I did it this way:

cols=names(t)
t[:I] = 1:nrow(t)
by(t, cols, g->DataFrame(n=size(g,1), i=[g[:I]]))
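
The same idea (tag each row with its position, group, then collect the positions per group) can be sketched for newer versions as follows. This assumes DataFrames.jl 1.x, where `by` is no longer available; the sample data and the column names `I`, `n`, and `i` are illustrative.

```julia
using DataFrames

# Illustrative data, not from the thread.
t = DataFrame(a = [1, 2, 1, 3, 2, 1], b = [:x, :y, :x, :z, :y, :z])

cols = names(t)               # grouping columns, captured before adding the index
t.I = collect(1:nrow(t))      # position of each row in the original table

# For each group: its size, and the original row positions as a single vector.
# Wrapping in `Ref` keeps the vector as one cell instead of one row per element.
g = combine(groupby(t, cols),
            nrow => :n,
            :I => (x -> Ref(collect(x))) => :i)
```

Each entry of `g.i` can then be used to index back into the original table, for example `t[g.i[1], :]`.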
