Sort elements by frequency in julia

I want to sort elements by frequency . For example let

x = ["a", "d", "d", "c", "c", "c", "f", "b", "b" ,"b", "b"]

I want to have output sorted as

x = ["b", "b" ,"b", "b","c", "c", "c, "d", "d", "a", "f"]

where we have 4 b , 3 c , 2d , 1a and 1f.

You can use StatsBase:

using StatsBase

function to_sort_x(x)
    counts   = countmap(x)
    sorted_x = sort(x, by=x -> counts[x], rev=true)
    return sorted_x
end

sorted_x = to_sort_x(x)

If you’re using DataFrames, you can also do this

using DataFrames

x = ["a", "d", "d", "c", "c", "c", "f", "b", "b" ,"b", "b"]

df = DataFrame(x = x)

gdf = groupby(df, :x)
transform!(gdf, nrow => :count)
sort!(df, :count, rev=true)

sorted_x = copy(df.x)
2 Likes

Also using StatsBase, but with rle and its inverse instead of countmap:

function sort_by_count(x)
    v, l = rle(x)
    idxs = sortperm(l; rev=true)
    return @views inverse_rle(v[idxs], l[idxs])
end

Some minimal benchmarking suggests that this performs a few times faster than the countmap based method.

2 Likes

@digital_carver

Nice! I think you need to sort the incoming x otherwise
the run length encoding won’t aggregate non-adjacent values.

Linking related thread,