I have the following dataframe, where :a and :b are some conditions and :c is an output:
df = DataFrame(a=[1,1,2,2,1], b=[1,2,1,2,1],c=randn(5))
As you can see, the last row has the same conditions as the first row (a=1, b=1). How can I filter this dataframe to remove duplicates based on columns :a and :b, but keep the last occurrence? The unique function keeps the first occurrence. I tried to reverse! the dataframe and then apply unique, but this method is not implemented.
You can do sort(df, rev = true) instead of reverse. See ?sort. You can also do it in place with sort!
Or combine(groupby(df, [:a, :b]), last) (then, instead of last, you can choose any row selector, so this is a more general approach).
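For reference, here is a minimal sketch of the groupby approach. It assumes a DataFrames.jl release recent enough to accept a function as the second argument of combine on a grouped data frame, and it uses fixed values in :c (an assumption, in place of randn) so the result is easy to check. The names last_rows and first_rows are only illustrative.

```julia
using DataFrames

# Fixed values in :c instead of randn(5), so the kept rows are easy to identify.
df = DataFrame(a=[1, 1, 2, 2, 1], b=[1, 2, 1, 2, 1], c=[0.1, 0.2, 0.3, 0.4, 0.5])

# One row per (a, b) group, taking the last row of each group.
last_rows = combine(groupby(df, [:a, :b]), last)

# Swapping the row selector changes which row is kept, e.g. the first one.
first_rows = combine(groupby(df, [:a, :b]), first)
```

With this data, the (a=1, b=1) group should keep c = 0.5 in last_rows and c = 0.1 in first_rows. The order of the groups in the result is best not relied on.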
Rather than using sort, I think it is better to write df[reverse(axes(df, 1)), :] in this approach (df in general might not be sortable).
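Put together with unique, the reversed-index idea could look like the sketch below. The fixed values in :c and the names reversed and deduped are assumptions made for illustration.

```julia
using DataFrames

# Fixed values in :c instead of randn(5), so the kept rows are easy to identify.
df = DataFrame(a=[1, 1, 2, 2, 1], b=[1, 2, 1, 2, 1], c=[0.1, 0.2, 0.3, 0.4, 0.5])

# Reverse the row order by indexing; this needs no ordering on the column values.
reversed = df[reverse(axes(df, 1)), :]

# unique keeps the first occurrence, which is now the last occurrence from df.
deduped = unique(reversed, [:a, :b])
```

Note that the rows of deduped come out in reverse order relative to df; applying the same index reversal to deduped would restore the original relative order if that matters.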
Thanks to both of you for the quick replies! This helps a lot! I had to do only one modification:
combine(groupby(df, [:a, :b]), last)
Gents, sorry for my ignorance, but the solution raises an error in Julia 1.6.0 on Win10 with DataFrames v0.21.8:
julia> combine(groupby(df, [:a, :b]), last)
ERROR: MethodError: no method matching combine(::GroupedDataFrame{DataFrame}, ::typeof(last))
Closest candidates are:
combine(::AbstractDataFrame, ::Any...) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\abstractdataframe\selection.jl:537
combine(::GroupedDataFrame; f...) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\groupeddataframe\splitapplycombine.jl:557
combine(::Any, ::AbstractDataFrame) at C:\Users\jrafa\.julia\packages\DataFrames\GtZ1l\src\abstractdataframe\selection.jl:540
...
Stacktrace:
[1] top-level scope
@ REPL[5]:1
You need to update your DataFrames.jl version.