Sort vector by frequency


I am trying to sort a vector of strings by the frequency of its values.

For instance:

x = ["a", "b", "b", "c", "c", "c"]

Since there are 3 "c", 2 "b" and 1 "a", I would like to get a vector of uniques in that order:

["c", "b", "a"]

Currently, I’ve managed to count the existing values:

using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)

This returns a Dict mapping each element to its count, but I am stuck trying to transform this into a sorted vector of uniques…

I believe one could extract the keys and values as two columns of a dataframe, sort this dataframe by the number and then extract the column of values, but it seems a bit inefficient…


You can obtain the keys and values from a Dict and order the keys by their counts like so:

keys1 = collect(keys(string_count))
# rev = true puts the most frequent value first
sortperm_vals = sortperm(collect(values(string_count)), rev = true)

strings_sorted_by_freq = keys1[sortperm_vals]


Since countmap returns a Dict, which does not guarantee any particular order, you can also sort explicitly:

collect the Dict into an Array of Pairs, sort it by an anonymous function returning the count (descending), then take the first element of each Pair.

Like this:

string_count = StatsBase.countmap(x)

sortedvals = first.(sort(collect(string_count), by = e -> e[2], rev = true))
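
If you would rather not depend on StatsBase, the same idea (count, collect the pairs, sort by count, take the first element) can be sketched with Base only. The function name unique_by_frequency is made up for this example, and the order among values with equal counts is whatever the Dict happens to iterate in, so treat it as a rough sketch rather than a finished implementation:

```julia
# Hypothetical helper (name is an assumption, not part of any package):
# return the unique values of `x`, most frequent first.
function unique_by_frequency(x)
    # Count occurrences with a plain Dict
    counts = Dict{eltype(x), Int}()
    for item in x
        counts[item] = get(counts, item, 0) + 1
    end
    # Collect into a Vector of Pairs (value => count) and sort by count, descending
    sorted_pairs = sort(collect(counts), by = last, rev = true)
    # Keep only the values
    return first.(sorted_pairs)
end

x = ["a", "b", "b", "c", "c", "c"]
unique_by_frequency(x)  # ["c", "b", "a"]
```

Ties are not broken in any defined way here; if that matters, you would need a secondary sort key.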


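Slightly cleaner, using the Dict sorting from OrderedCollections.jl. Note byvalue = true: without it, by and rev are applied to the keys rather than to the counts:

using StatsBase
using OrderedCollections

x = ["a", "b", "b", "c", "c", "c"]
string_count = countmap(x)
# Sort the entries by count, most frequent first, then take the keys
collect(keys(sort!(OrderedDict(string_count), byvalue = true, rev = true)))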


Or using FreqTables.jl, which here gives the counts in ascending order:

julia> sort(freqtable(x))
3-element Named Array{Int64,1}
Dim1  │ 
a     │ 1
b     │ 2
c     │ 3