Sort vector by frequency

I am trying to sort a vector of strings by the frequency of its values.

For instance:

x = ["a", "b", "b", "c", "c", "c"]

Since there are 3 "c", 2 "b" and 1 "a", I would like to get a vector of uniques in that order:

["c", "b", "a"]

Currently, I’ve managed to count the existing values:

using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
Dict("c"=>3,"b"=>2,"a"=>1)

This returns a dict with frequencies (well, the number of each element), but I am stuck at trying to transform this into a sorted vector of uniques…

I believe one could extract the keys and values as two columns of a dataframe, sort this dataframe by the number and then extract the column of values, but it seems a bit inefficient…


You can obtain the keys and values from a Dict like so:

keys1 = collect(keys(string_count))
# rev=true so that the most frequent value comes first
sortperm_vals = sortperm(collect(values(string_count)), rev=true)

strings_sorted_by_freq = keys1[sortperm_vals]

collect(keys(StatsBase.countmap(x)))

but countmap returns a Dict, which does not guarantee any order, so to sort explicitly you can:

collect the Dict into an Array of Pairs, sort it by an anonymous function returning the count (descending), then take the first element of each Pair.

Like this:

string_count = StatsBase.countmap(x)

sortedvals = first.(sort(collect(string_count), by = e -> e[2], rev=true))
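
For reference, the collect / sort / first steps above can also be sketched with Base only, in case you would rather not depend on StatsBase. The helper name uniques_by_frequency is made up for this illustration and is not part of any package; it counts by hand instead of calling countmap.

```julia
# Hypothetical helper (not from StatsBase): unique values of x,
# ordered from most frequent to least frequent.
function uniques_by_frequency(x)
    # Count occurrences by hand; this plays the role of countmap
    counts = Dict{eltype(x), Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    # Dict -> Vector of Pairs, sorted by the count (the second element), descending
    sorted_pairs = sort(collect(counts), by = last, rev = true)
    # Keep only the value from each value => count Pair
    return first.(sorted_pairs)
end

x = ["a", "b", "b", "c", "c", "c"]
uniques_by_frequency(x)
```

Note that values with equal counts come out in whatever order the Dict happens to iterate them, so add a secondary sort key if you need ties broken deterministically.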

Slightly cleaner:

using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
keys(sort(string_count, by = last, rev=true))

Or using FreqTables.jl:

julia> using FreqTables

julia> sort(freqtable(x))
3-element Named Array{Int64,1}
Dim1  │ 
──────┼──
a     │ 1
b     │ 2
c     │ 3

How do I get the name “a” (rather than its count) from this array?

julia> z = sort(freqtable(x))
3-element Named Vector{Int64}
Dim1  │
──────┼──
a     │ 1
b     │ 2
c     │ 3

julia> z[1]
1

@Rafael_Brus, try this:

ft = sort(freqtable(x))
names(ft,1)[1]