I tried to solve the problem proposed here using the function sort() with the by and lt keywords.
In particular I tried with the following dataframe and expression:
using DataFrames
df = DataFrame(NAME = ["A1","A1","A2", "A1","A2","A2","A2","A2","A3","A3","A4","A5","A6","A6","A6","A6","A6","A6","A6"],
               CAT = ["INF","NF", "XX" , "INH","AP","CF","UTL","GT","CP","MP","AP","BE","NF","CF","PP","AZ","PY","OP","APX"])
dft = combine(groupby(df, "NAME"),
              [:CAT => (t -> sort!(t, by=last, lt=(x,y) -> (y=='F' || y=='P') || x<=y)) => :CAT,
               :CAT => (t -> string.("CAT", 1:length(t))) => :mm])
What I’ve observed is that if I run combine() twice, I get a different ordering of the dataframe.
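One thing that might contribute to the difference between runs is that sort! works in place. The sketch below uses only Base and assumes (this is an assumption about combine, not something checked in the post) that each group's column reaches the function as a view into the original column. The names col and grp are hypothetical stand-ins.

```julia
# Stand-in for one group's column.
# Assumption: combine hands each group to the function as a view like `grp`.
col = ["AZ", "NF", "CF"]
grp = view(col, 1:3)

sorted_copy = sort(grp; by = last)   # non-mutating: works on a copy
println(col)                         # `col` is still in its original order

sort!(grp; by = last)                # in place: writes through the view
println(col)                         # `col` itself has been reordered
```

If that assumption holds, the second call to combine() starts from an already reordered column, so using sort instead of sort! would keep the input unchanged between runs.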
This seems to depend on the particular lt function I used. Maybe I should also think more about what happens when the first argument of the lt function is 'F' or 'P'.
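A quick way to probe the lt function is to check whether it behaves like a strict "less than". The sketch below copies the comparison from the expression above into a function called lt_post (a hypothetical name) and tests two properties a sorting comparison is generally expected to satisfy. The helper names irreflexive and asymmetric are also made up for illustration.

```julia
# The comparison from the post, applied to the last character of each string.
lt_post(x, y) = (y == 'F' || y == 'P') || x <= y

# A strict "less than" should never hold for an element against itself ...
irreflexive(lt, xs) = all(!lt(x, x) for x in xs)
# ... and should never hold in both directions for the same pair.
asymmetric(lt, xs) = all(!(lt(x, y) && lt(y, x)) for x in xs, y in xs)

chars = ['F', 'P', 'H', 'X', 'L', 'T']
println(irreflexive(lt_post, chars))  # false: x <= x is always true
println(asymmetric(lt_post, chars))   # false: 'F' vs 'P' is true both ways
```

Both checks fail: x<=y makes every element "less than" itself, and whenever both arguments are 'F' or 'P' the comparison is true in either direction. With such a function the result of a sort is not well defined, so it can differ between runs or algorithms.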
I also wonder whether this behavior could depend on the sorting algorithm that is used implicitly.
I have not yet been able to find the possible values of the alg parameter, which might give some details useful for understanding what is happening.
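For reference, Base exports a few algorithm objects that can be passed as alg, among them InsertionSort, MergeSort and QuickSort. The sketch below simply runs the same sort with each of them on a small hypothetical vector v, using the default lt. According to the Julia documentation, InsertionSort and MergeSort are stable while QuickSort is not, so elements with equal keys may come out in a different order depending on the choice.

```julia
# Small sample with two strings ending in 'F', so stability can matter.
v = ["INF", "NF", "INH"]

# Try the same sort with each algorithm exported by Base.
for alg in (InsertionSort, MergeSort, QuickSort)
    println(alg, " => ", sort(v; by = last, alg = alg))
end
```

With a well-behaved lt, every algorithm returns a correctly sorted result and only the order of ties can vary.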
Can anyone help me understand how things are going?
Using a custom isless function as follows seems to work correctly:
# 'F' and 'P' are never less than anything; every other character is less than them.
function cIsLess(x, y)
    if x in "FP"
        return false
    elseif y in "FP"
        return true
    else
        return x < y
    end
end
dft = combine(groupby(df, "NAME"),
              [:CAT => (t -> sort!(t, by=last, lt=cIsLess)) => :CAT,
               :CAT => (t -> string.("CAT", 1:length(t))) => :mm])
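To see what this comparison does without involving DataFrames, here is a small sketch that applies it to the CAT values of group A6 from the dataframe above. It repeats the definition of cIsLess so that it runs on its own, and it passes alg=MergeSort explicitly (my choice, not part of the original expression) so that ties between 'F' and 'P' keep their original order.

```julia
# Same comparison as in the post, repeated so the block is self-contained.
function cIsLess(x, y)
    if x in "FP"
        return false
    elseif y in "FP"
        return true
    else
        return x < y
    end
end

# CAT values of group "A6" from the example dataframe.
a6 = ["NF", "CF", "PP", "AZ", "PY", "OP", "APX"]

# Stable sort on the last character: non-F/P endings first, F/P endings last.
sorted_a6 = sort(a6; by = last, lt = cIsLess, alg = MergeSort)
println(sorted_a6)  # ["APX", "PY", "AZ", "NF", "CF", "PP", "OP"]

# Sorting the result again leaves it unchanged.
println(sort(sorted_a6; by = last, lt = cIsLess, alg = MergeSort) == sorted_a6)  # true
```

Unlike the first version, cIsLess never returns true for an element compared with itself, and it treats 'F' and 'P' as equivalent, which is why repeated sorting gives the same result.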