The way I read the question, the answer from jling does not seem right, but maybe I misunderstood. Perhaps `df3` in my code below is what the original poster wants:
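```julia
using Random
using DataFrames

Random.seed!(2)
df = DataFrame(rand(16, 3), :auto)   # DataFrames 1.x needs :auto (or names) when building from a matrix
rename!(df, [:A, :B, :Z])
df.Z .= round.(df.Z)                 # Z becomes a grouping column with values 0.0 / 1.0

# maximum of A within each group of Z
df2 = combine(groupby(df, :Z), :A => maximum => :A)
# join back on (A, Z) to recover the B from the row where that maximum occurs
df3 = leftjoin(df2, df, on = [:A, :Z])
disallowmissing!(df3)

# for comparison: maximum of A, but simply the first B of each group
df4 = combine(groupby(df, :Z), :A => maximum => :A, :B => first => :B)
isequal(df4, df3)
```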
I think I read the question the same way you did. However, I was surprised that pandas would return the values of the other columns from the row where the maximum occurs when using the aggregation OP gave, and it looks like it doesn't:
```
>>> df = pd.DataFrame({"a": [8, 2, 3, 1, 9, 3], "b": [11, 12, 13, 14, 15, 16], "c": ['a', 'a', 'a', 'b', 'b', 'b']})
>>> df
   a   b  c
0  8  11  a
1  2  12  a
2  3  13  a
3  1  14  b
4  9  15  b
5  3  16  b
>>> df.groupby(["c"])["a","b"].agg({"a":"max"})
<stdin>:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
   a
   a   b
c
a  8  13
b  9  16
```
So here it looks like you just get the last value for column `b`, irrespective of where `maximum(a)` is actually located. In that case, this could be replicated by using `:B => last => :B` in DataFrames.jl.
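As a sketch of that equivalence, here is the pandas data from the session above rebuilt in DataFrames.jl (assuming DataFrames 1.x), aggregated with `maximum` for `a` and `last` for `b`:

```julia
using DataFrames

# Same data as the pandas session above.
df = DataFrame(a = [8, 2, 3, 1, 9, 3],
               b = [11, 12, 13, 14, 15, 16],
               c = ["a", "a", "a", "b", "b", "b"])

# Maximum of a per group, but simply the last b of each group
# (not the b from the row where a is maximal).
res = combine(groupby(df, :c), :a => maximum => :a, :b => last => :b)
```

This gives `a = [8, 9]` and `b = [13, 16]` for groups `"a"` and `"b"`, matching the pandas output, whereas the `b` values from the rows holding the maxima would be `11` and `15`.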
Isn't there another way without using a join?
For example, getting the index of the maximum of A (instead of the maximum itself) and then using that index to retrieve both the maximum of A and the corresponding B.
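A minimal sketch of that index-based idea, assuming DataFrames 1.x: within each group, `argmax` gives the position of the maximum of A, and that same position is used to pick B. The helper name `rowofmax` is hypothetical, and returning a `NamedTuple` with `=> AsTable` is just one way to produce several columns from one function; this is not necessarily the solution referred to below.

```julia
using Random
using DataFrames

Random.seed!(2)
df = DataFrame(rand(16, 3), [:A, :B, :Z])
df.Z .= round.(df.Z)

# Hypothetical helper: locate the maximum of `a` and return the values
# of both columns from that same row of the group.
function rowofmax(a, b)
    i = argmax(a)               # first position of the maximum within the group
    return (A = a[i], B = b[i])
end

# AsTable spreads the returned NamedTuple into columns A and B.
df5 = combine(groupby(df, :Z), [:A, :B] => rowofmax => AsTable)
```

Unlike the join, this returns exactly one row per group even if the maximum of A is tied; `argmax` then picks the first occurrence.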
@bkamins also gave a great solution on Slack which I’ll add here so it doesn’t get swallowed by the Slack memory hole (as I’m sure I’ll be looking for this at some point in the near future):