Got a reasonably fast solution here using SortingLab.jl, see
I am not sure how fast it is compared to Matlab. Perhaps @complexfilter can do a test and share the results.