I have a continuous variable that I would like to discretize based on its sample quantiles (e.g. top 1%, top 1% to 10%, etc.). My first idea was the following:
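using Statistics, StatsBase, CategoricalArrays  # quantile, ecdf, and categorical respectively
x = sort(100 .* rand(100))  # sorted only so that equal values end up next to each other
# ECDF of the seven quantile breakpoints; at a point it returns (number of breakpoints <= point) / 7
xf = ecdf(quantile(x, [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99]))
xc = categorical(round.(xf.(x), digits = 5))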
I round the values in the hope of getting rid of floating-point inaccuracy, but this seems a bit hacky. Is there a better way?
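
For what it's worth, the same bins can be sketched with integer indices instead of rounded ECDF values, which sidesteps the rounding. This is only a sketch, assuming Base's `searchsortedlast` and the `Statistics` standard library; `probs`, `q`, and `bins` are names I made up for illustration.

```julia
using Statistics  # standard library; provides `quantile`

x = sort(100 .* rand(100))
probs = [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99]
q = quantile(x, probs)  # breakpoints, nondecreasing because `probs` is sorted

# For each value, count the breakpoints that are <= it.
# `searchsortedlast` returns 0 when the value is below every breakpoint,
# so the bin index is an integer in 0:length(q) and needs no rounding.
bins = [searchsortedlast(q, xi) for xi in x]
```

Here `bins[i] / length(q)` should match `xf(x[i])`, since the ECDF of the breakpoints at a point is the share of breakpoints less than or equal to it.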