I am trying to sort a vector of strings by the frequency of its values.
For instance:
x = ["a", "b", "b", "c", "c", "c"]
Since there are 3 "c", 2 "b" and 1 "a", I would like to get a vector of uniques in that order:
["c", "b", "a"]
Currently, I’ve managed to count the existing values:
using StatsBase
x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
Dict("c"=>3,"b"=>2,"a"=>1)
This returns a Dict with frequencies (well, the count of each element), but I am stuck trying to transform this into a sorted vector of uniques…
I believe one could extract the keys and values as two columns of a dataframe, sort this dataframe by the number and then extract the column of values, but it seems a bit inefficient…
Here is a solution without dependencies. Credits to @rvasil for noting the keyword arguments to sort.
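# Count how many times each distinct value occurs in V.
function count_unique(V::AbstractVector{T}) where T
    U = unique(V)
    l = length(U)
    # Start every distinct value at zero; zeros(Int, l) avoids building Float64 zeros first.
    counts = Dict{T,Int}(zip(U, zeros(Int, l)))
    for v in V
        counts[v] += 1
    end
    return counts
end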
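# Return the distinct values of V, most frequent first.
function frequency_sort(V::AbstractVector)
    counts = count_unique(V)
    # collect(counts) yields value => count pairs; sort them by count, descending.
    sorted = sort(collect(counts); by=last, rev=true)
    return first.(sorted)
end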
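For what it's worth, the same idea can probably be written in a single pass, without calling unique first. This is only a sketch: the name frequency_sort_compact is made up for illustration, and it assumes you do not care how ties are ordered (values with equal counts come out in whatever order the Dict iterates its keys).

```julia
# Hypothetical one-pass variant of the approach above (name is illustrative only).
function frequency_sort_compact(V::AbstractVector{T}) where T
    counts = Dict{T,Int}()
    for v in V
        # get returns 0 for values not seen yet, so no pre-filling is needed.
        counts[v] = get(counts, v, 0) + 1
    end
    # Sort the distinct values by their count, most frequent first.
    return sort!(collect(keys(counts)); by = k -> counts[k], rev = true)
end

x = ["a", "b", "b", "c", "c", "c"]
result = frequency_sort_compact(x)   # ["c", "b", "a"]
```

If you already have the Dict from countmap, the last line of the function should apply to it directly, since it only needs the keys and a lookup of their counts.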