I was curious about performance. Since it’s not a tough problem, I’ve created a custom groupby function and compared it with Query.jl. Just sharing results:
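The actual function isn't shown here. As a rough idea of what a custom groupby can look like in plain Julia, here is a minimal sketch. The name `groupindices` is hypothetical. It maps each distinct value to the indices where that value occurs, using a `Dict` keyed on the array's own element type:

```julia
# Hypothetical helper (not the function from this thread): map each distinct
# value in A to the vector of indices where it occurs.
function groupindices(A::AbstractVector{T}) where {T}
    groups = Dict{T,Vector{Int}}()   # concrete key type when eltype(A) is concrete
    for (i, x) in pairs(A)           # pairs yields index => value
        push!(get!(() -> Int[], groups, x), i)
    end
    return groups
end

A = [1, 2, 3, 4, 1, 5, 2, 4, 4]
g = groupindices(A)
g[4]  # [4, 8, 9]
```

One plausible reason such a loop is fast is that, for a clean `Vector{Int}`, the dictionary's key type is concrete. That lets the compiler specialise the hashing and comparisons.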
Need to use BenchmarkTools for proper benchmarking.
Tried NormalizedQuantiles. It returns a Dict, which is nice, but it's much slower than the other approaches and does not scale well to larger arrays. See the updated gist for benchmark details.
Your custom function is quite fast. I think most of my performance loss comes from having to expect dirty data, e.g.
A = [1,2,NaN,3,4,1,"5,0",5.0,2,4,4]
where anything that is not of type "a number in general" is treated as NA (not available, missing, …, like NA in R).
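Here is a rough sketch of grouping with NA handling, built on the dirty example array. It is not the code discussed in this thread. The names `isvalid_value` and `groupindices_na` are hypothetical. It also assumes that NaN counts as NA alongside non-numbers such as the string "5,0". Valid values are normalised to `Float64`, so `5.0` and an integer `5` would share a group.

```julia
# Hypothetical predicate. ASSUMPTION: only real, non-NaN numbers are valid values;
# everything else ("5,0", NaN, missing, nothing, ...) is treated as NA.
isvalid_value(x) = x isa Real && !isnan(x)

# Hypothetical helper: group valid values by index and collect NA indices separately.
function groupindices_na(A::AbstractVector)
    groups = Dict{Float64,Vector{Int}}()  # valid values, normalised to Float64
    na = Int[]                            # indices of NA entries
    for (i, x) in pairs(A)
        if isvalid_value(x)
            push!(get!(() -> Int[], groups, Float64(x)), i)
        else
            push!(na, i)
        end
    end
    return groups, na
end

A = [1, 2, NaN, 3, 4, 1, "5,0", 5.0, 2, 4, 4]   # a Vector{Any}
groups, na = groupindices_na(A)
# groups[4.0] == [5, 10, 11], na == [3, 7]
```

Because NA entries go to their own list instead of being special-cased at the end of the loop, an array whose last value is NA, or whose values are all NA, needs no extra handling. The per-element type check on a `Vector{Any}` costs something, though. That is one plausible source of the slowdown with dirty data.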
However, while comparing against Query.jl and your function, I found a bug in my code (triggered when the last value in the array is NA or when all values are NA); it is now fixed.
My next step is to analyse your code in depth to see whether I can improve mine.
@groupby seems to be the most versatile, as it also groups strings, e.g. A = ["a","a","b","a","b","c"], or any other mix of types.
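Grouping strings or mixed types needs no special handling if the keys are stored in a `Dict{Any,...}`. Julia dictionaries compare keys with `isequal` and `hash`. Below is a small counting sketch; the name `countvalues` is hypothetical. One consequence worth knowing: `isequal(1, 1.0)` is true, so `1` and `1.0` end up in the same group. I have not checked whether Query.jl's `@groupby` follows the same rule.

```julia
# Hypothetical helper: count occurrences of each distinct value, for any mix of types.
# Dict compares keys with isequal and hash, so any hashable value can be a key.
function countvalues(A::AbstractVector)
    counts = Dict{Any,Int}()
    for x in A
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

S = ["a", "a", "b", "a", "b", "c"]
cs = countvalues(S)      # "a" => 3, "b" => 2, "c" => 1

M = Any[1, 1.0, "1", :one]
cm = countvalues(M)      # 1 and 1.0 are isequal, so they are counted together
```

The flexibility comes at a price: with `Any` keys every hash and comparison is dynamically dispatched, which is typically slower than a concretely typed dictionary.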