Inconsistent results when using split-apply-combine for DataFrames

I’m looking to add a column to my DataFrame combo2 that represents the ranking of DK_points by each Pos (position) group with the following code.

sort!(combo2, [:Pos, :DK_points], rev=[true, true])
dk_rank = []

for pos in groupby(combo2, :Pos)
    append!(dk_rank, 1:size(pos, 1))
end
insertcols!(combo2, 6, DK_rank=dk_rank)

However, one particular value for Pos, “QB”, gives unusual results with the ranking starting at 14 rather than the desired 1.

by(combo2, :Pos, y->DataFrame(minRank = minimum(y[:,:DK_rank])))
	Pos	minRank
	String	Int64
1	QB	14
2	RB	1
3	WR	1
4	TE	1
5	DST	1

I'm very confused about why this would happen.

Also, I'm wondering whether there is a better way to add a ranking for multiple columns in a data set, computed within each group of another column.

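First, here is an easier way to do it:

by(combo2, :Pos, y -> (DK_rank=axes(y, 1),))

if your data frame is already sorted, or

by(combo2, :Pos, y -> (DK_rank=sortperm(y, :DK_points, rev=true),))

if it is not sorted yet. (Note the trailing comma: without it, the parentheses hold an assignment rather than a named tuple, and the result column is called x1 instead of DK_rank.)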
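
One caveat worth checking on unsorted data: `sortperm` returns the permutation that sorts the values (for each rank, the row that holds it), not the rank of each row. The two coincide only when the data is already sorted. A minimal sketch in plain Julia, using a made-up `points` vector, shows how `invperm` converts one into the other:

```julia
# Hypothetical scores for one position group, in unsorted row order.
points = [20.0, 40.0, 10.0, 30.0]

# Permutation that sorts descending: perm[k] is the row holding rank k.
perm = sortperm(points, rev=true)

# Inverting it gives ranks[i], the rank of row i (1 = highest score).
ranks = invperm(perm)
```

Here `perm` and `ranks` differ, so using `sortperm` directly as a rank column would mislabel rows whenever the group is not already in descending order.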

The problem you are seeing is most likely related to sorting, though I am not sure. Could you post the result of the following operation (or share your source data frame)?

by(combo2, :Pos, y->DataFrame(len=nrow(y), rankrange=extrema(y[:,:DK_rank]), isok=issorted(y[:,:DK_rank])))

Additionally, I have just checked your code on some random data and could not reproduce the problem.

	Pos	len	rankrange	isok
	String	Int64	Tuple…	Bool
1	QB	39	(14, 52)	true
2	RB	53	(1, 56)	false
3	WR	56	(1, 39)	false
4	TE	52	(1, 53)	false
5	DST	33	(1, 33)	true

It appears that the append order is out of whack, but I'm not sure why.

Can you send me the data privately if you cannot share it openly? (As I said, I have checked your code on random data and it works fine.) Also, please confirm whether the two methods I suggested work correctly on your data.

If I check for “sortedness” on DK_points alone, it seems the sorting indeed didn’t work correctly.

sort!(combo2, [:Pos, :DK_points], rev=[true, true])
by(combo2, :Pos, y->DataFrame(len=nrow(y), isok=issorted(y[:,:DK_points])))
	Pos	len	isok
	String	Int64	Bool
1	QB	39	false
2	RB	53	false
3	WR	56	false
4	TE	52	false
5	DST	33	false
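
Note that `issorted` tests for ascending order unless told otherwise, so a column sorted with `rev=true` reports `false` even when the sort succeeded. The all-false `isok` column above may therefore only reflect the direction of the check. A small illustration in plain Julia (the values are made up, not taken from the data in this thread):

```julia
# Illustrative values in descending order, as sort!(...; rev=true) would leave them.
desc = [40.0, 30.0, 20.0, 10.0]

ascending_check = issorted(desc)             # default order is ascending
descending_check = issorted(desc, rev=true)  # matches the sort direction
```

Passing `rev=true` to `issorted` inside the `by` call would be the matching check for a descending sort.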

With both of the methods you provided, if I insert the rank column back into the data frame using

m = by(combo2, :Pos, y -> (DK_rank=sortperm(y, :DK_points, rev=true)))[:, :x1]
insertcols!(combo2, 6, DK_rank=m)

I get the same result as before.

Please make sure that your combo2 data frame is sorted. In your post, the :Pos column does not appear to be sorted in reverse order. The order should look like this:

5×2 DataFrame
│ Row │ Pos    │ minRank │
│     │ String │ Int64   │
│ 1   │ WR     │ 1       │
│ 2   │ TE     │ 1       │
│ 3   │ RB     │ 1       │
│ 4   │ QB     │ 1       │
│ 5   │ DST    │ 1       │
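
More generally, this kind of mismatch can arise whenever per-group results are concatenated in group order and then attached to rows that are stored in a different order. Below is a sketch that sidesteps the issue by writing the ranks back per group. It assumes a recent DataFrames.jl release (1.0 or later, where `by` is no longer available and `transform!` on a grouped data frame keeps the parent's row order), and the sample data is made up:

```julia
using DataFrames

# Made-up sample with the same column names as in the thread.
df = DataFrame(Pos = ["QB", "WR", "QB", "WR", "RB"],
               DK_points = [18.0, 25.0, 22.0, 12.0, 9.0])

# Rank DK_points within each Pos group (1 = highest). transform! stores each
# value in the row it was computed from, so df does not need to be sorted first.
transform!(groupby(df, :Pos),
           :DK_points => (x -> invperm(sortperm(x, rev=true))) => :DK_rank)
```

Further columns can be ranked in the same call by adding more `source => function => target` pairs.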