I want to sort elements by frequency. For example, let

```
x = ["a", "d", "d", "c", "c", "c", "f", "b", "b" ,"b", "b"]
```

I want to have output sorted as

```
x = ["b", "b", "b", "b", "c", "c", "c", "d", "d", "a", "f"]
```

where we have 4 `b`, 3 `c`, 2 `d`, 1 `a` and 1 `f`.

You can use `StatsBase`:

```
using StatsBase

function to_sort_x(x)
    counts = countmap(x)
    sorted_x = sort(x, by=x -> counts[x], rev=true)
    return sorted_x
end

sorted_x = to_sort_x(x)
```
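One thing to watch for with the snippet above: it sorts on the count alone, and since Julia's default `sort` is stable, values that share a count keep their original relative order. An interleaved input such as `["a", "b", "a", "b"]` would therefore come back unchanged rather than grouped. Below is a minimal Base-only sketch that breaks ties by first appearance. The helper names `count_occurrences` and `sort_by_frequency` are made up for illustration, and the hand-rolled `Dict` is just a stand-in for `countmap`.

```julia
# Count occurrences with a plain Dict (a stand-in for StatsBase.countmap).
function count_occurrences(x)
    counts = Dict{eltype(x),Int}()
    for item in x
        counts[item] = get(counts, item, 0) + 1
    end
    return counts
end

# Sort by descending frequency. Ties are broken by the index of the first
# appearance, so equal values stay grouped even if the input interleaves them.
function sort_by_frequency(x)
    counts = count_occurrences(x)
    first_seen = Dict{eltype(x),Int}()
    for (i, item) in enumerate(x)
        get!(first_seen, item, i)  # only stores the first index seen
    end
    return sort(x; by = item -> (-counts[item], first_seen[item]))
end

x = ["a", "d", "d", "c", "c", "c", "f", "b", "b", "b", "b"]
sorted_x = sort_by_frequency(x)
```

For the example `x` this should give the same ordering as asked for in the question, with `"a"` ahead of `"f"` because it appears first in the input.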

If you’re using `DataFrames`, you can also do this:

```
using DataFrames
x = ["a", "d", "d", "c", "c", "c", "f", "b", "b" ,"b", "b"]
df = DataFrame(x = x)
gdf = groupby(df, :x)
transform!(gdf, nrow => :count)
sort!(df, :count, rev=true)
sorted_x = copy(df.x)
```


Also using StatsBase, but with `rle` and its inverse instead of `countmap`:

```
using StatsBase

function sort_by_count(x)
    # rle only merges runs of adjacent equal values
    v, l = rle(x)
    idxs = sortperm(l; rev=true)
    return @views inverse_rle(v[idxs], l[idxs])
end
```

Some minimal benchmarking suggests that this performs a few times faster than the `countmap`-based method.


@digital_carver Nice! I think you need to sort the incoming `x` first; otherwise the run-length encoding won’t aggregate non-adjacent values.
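A minimal sketch of that suggestion, assuming the elements of `x` can be ordered with `sort` (the function name `sort_by_count_sorted` is made up here to keep it distinct from the version above):

```julia
using StatsBase

function sort_by_count_sorted(x)
    # Sorting first makes equal values adjacent, so rle yields one run per value.
    v, l = rle(sort(x))
    # Order the runs by descending length, then expand them back out.
    idxs = sortperm(l; rev=true)
    return inverse_rle(v[idxs], l[idxs])
end

x = ["a", "d", "d", "c", "c", "c", "f", "b", "b", "b", "b"]
sorted_x = sort_by_count_sorted(x)
```

Note that the extra `sort` adds its own cost, so the earlier timing comparison may not carry over unchanged. Values with equal counts end up in sorted order rather than in order of first appearance.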