DataFrames unique - keep last occurrence

I have the following dataframe, where :a and :b are some conditions and :c is an output:

 using DataFrames
 df = DataFrame(a=[1,1,2,2,1], b=[1,2,1,2,1], c=randn(5))

As you can see, the last row has the same conditions as the first row (a=1, b=1). How can I filter this dataframe to remove duplicates based on columns :a and :b, but keep the last occurrence? The unique function keeps the first occurrence. I tried to call reverse! on the dataframe and then apply unique, but reverse! is not implemented for dataframes.

You can do sort(df, rev = true) instead of reverse; see ?sort. sort! does the same in place.

Or use combine(groupby(df, [:a, :b]), last). Instead of last you can pass any row selector, so this is a more general approach.
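
A minimal sketch of the groupby approach, assuming a recent DataFrames.jl release. The fixed values in column c are an illustrative stand-in for randn(5) so the result is easy to check by eye; the name last_rows is made up for this example.

```julia
using DataFrames

# Same :a and :b as in the question; :c is fixed instead of random
# (an assumption made only to keep the example reproducible).
df = DataFrame(a=[1, 1, 2, 2, 1], b=[1, 2, 1, 2, 1], c=[0.1, 0.2, 0.3, 0.4, 0.5])

# Split the rows into groups by (:a, :b), then keep the last row of each group.
last_rows = combine(groupby(df, [:a, :b]), last)

# The (a=1, b=1) group contains rows 1 and 5, so the value from row 5 survives.
row_11 = last_rows[(last_rows.a .== 1) .& (last_rows.b .== 1), :]
```

Swapping last for first gives the same rows that unique(df, [:a, :b]) would keep.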

Rather than using sort, I think it is better to write df[reverse(axes(df, 1)), :] in this approach, since df in general might not be sortable.
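
A sketch of the reverse-then-unique idea, again with fixed values in column c as an illustrative stand-in for randn(5). Note that sort orders rows by their values rather than by position, so it only reproduces a positional reversal when the frame happens to be sorted already; indexing with reversed row indices has no such requirement. The names reversed, deduped and restored are made up for this example.

```julia
using DataFrames

df = DataFrame(a=[1, 1, 2, 2, 1], b=[1, 2, 1, 2, 1], c=[0.1, 0.2, 0.3, 0.4, 0.5])

# Reverse the row order by indexing with the reversed row axis (copies the data).
reversed = df[reverse(axes(df, 1)), :]

# unique keeps the first occurrence, which is now the last one of the original frame.
deduped = unique(reversed, [:a, :b])

# Optional: flip back so the surviving rows keep their original relative order.
restored = deduped[reverse(axes(deduped, 1)), :]
```

Newer DataFrames.jl releases may also accept a keep keyword on unique (for example keep=:last); check ?unique in your installed version before relying on it.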

Thanks to both of you for the quick replies! This helps a lot! I had to do only one modification:
combine(groupby(df, [:a, :b]), last)

Gents, sorry for my ignorance, but the solution throws an error in Julia 1.6.0 on Windows 10 with DataFrames v0.21.8:

julia> combine(groupby(df, [:a, :b]), last)
ERROR: MethodError: no method matching combine(::GroupedDataFrame{DataFrame}, ::typeof(last))
Closest candidates are:
  combine(::AbstractDataFrame, ::Any...) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\abstractdataframe\selection.jl:537
  combine(::GroupedDataFrame; f...) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:557
  combine(::Any, ::AbstractDataFrame) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\abstractdataframe\selection.jl:540
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

You need to update your DataFrames.jl version.
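
One way to do that from the REPL is through the Pkg standard library, sketched below. Whether the update actually moves DataFrames forward depends on the compatibility bounds of the other packages in your environment, so treat this as a starting point rather than a guaranteed fix.

```julia
using Pkg

# Show which DataFrames version the active environment currently resolves to.
Pkg.status("DataFrames")

# Ask Pkg to upgrade DataFrames to the newest version the environment allows.
Pkg.update("DataFrames")

# Check the version again; restart Julia before retrying the combine call.
Pkg.status("DataFrames")
```

If the version does not change, another installed package is probably holding DataFrames back.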
