DataFrames unique - keep last occurrence

Iulian.Cioarca · April 21, 2021, 10:31am

I have the following dataframe, where :a and :b are some conditions and :c is an output:

 df = DataFrame(a=[1,1,2,2,1], b=[1,2,1,2,1],c=randn(5))

As you can see, the last row has the same conditions as the first row (a=1, b=1). How can I filter this dataframe to remove duplicates based on columns :a and :b, but keep the last occurrence? The unique function keeps the first occurrence. I tried to reverse! the dataframe and then apply unique, but reverse! is not implemented for a DataFrame.

pdeffebach · April 21, 2021, 10:45am

You can do sort(df, rev = true) instead of reverse; see ?sort. You can also do it in place with sort!.

bkamins · April 21, 2021, 10:46am

Or combine(groupby(df, [:a, :b]), last). Instead of last you can use any row selector, so this is a more general approach.
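A minimal sketch of this approach, assuming a recent DataFrames.jl release (1.x) and using fixed values in :c in place of randn(5) so the result is easy to check:

```julia
using DataFrames

# Same layout as in the question, with fixed :c values (an assumption for illustration)
df = DataFrame(a=[1, 1, 2, 2, 1], b=[1, 2, 1, 2, 1], c=[10, 20, 30, 40, 50])

# One group per distinct (a, b) pair; `last` picks the final row of each group
dedup = combine(groupby(df, [:a, :b]), last)
```

Here the (a=1, b=1) group keeps the row with c = 50 rather than the one with c = 10. Swapping last for first should select the same rows that unique(df, [:a, :b]) keeps.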

bkamins · April 21, 2021, 10:48am

Rather than using sort, I think it is better to write df[reverse(axes(df, 1)), :] in this approach (df in general might not be sortable).
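The reversal approach could look like the following sketch, with fixed :c values (an assumption) instead of randn(5). The final reversal is optional and only restores the original relative row order:

```julia
using DataFrames

# Fixed :c values are an assumption, used here so the outcome is predictable
df = DataFrame(a=[1, 1, 2, 2, 1], b=[1, 2, 1, 2, 1], c=[10, 20, 30, 40, 50])

# Reverse the row order so the last occurrence of each (a, b) pair comes first
rev = df[reverse(axes(df, 1)), :]

# unique keeps the first occurrence, which is now the last one from df
kept = unique(rev, [:a, :b])

# Optional: flip back so the surviving rows appear in their original order
result = kept[reverse(axes(kept, 1)), :]
```

Because this only indexes rows by position, it does not require the columns to be sortable.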

Iulian.Cioarca · April 21, 2021, 11:09am

Thanks to both of you for the quick replies! This helps a lot! I only had to make one modification:
combine(groupby(df, [:a, :b]), last)

rafael.guerra · April 21, 2021, 2:25pm

Gents, sorry for my ignorance, but the solution raises an error in Julia 1.6.0 on Win10 with DataFrames v0.21.8:

julia> combine(groupby(df, [:a, :b]), last)
ERROR: MethodError: no method matching combine(::GroupedDataFrame{DataFrame}, ::typeof(last))
Closest candidates are:
  combine(::AbstractDataFrame, ::Any...) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\abstractdataframe\selection.jl:537
  combine(::GroupedDataFrame; f...) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:557
  combine(::Any, ::AbstractDataFrame) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\abstractdataframe\selection.jl:540
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

bkamins · April 21, 2021, 2:33pm

You need to update your DataFrames.jl version.
