# Sort vector by frequency

I am trying to sort a vector of strings by the frequency of tis values.

For instance:

``````x = ["a", "b", "b", "c", "c", "c"]
``````

Since there are 3 `"c"`, 2 `"b"` and 1 `"a"`, I would like to get a vector of uniques in that order:

``````["c", "b", "a"]
``````

Currently, I’ve managed to count the existing values:

``````using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
Dict("c"=>3,"b"=>2,"a"=>1)
``````

This returns a dict with frequencies (well, the number of each element), but I am stuck at trying to transform this into a sorted vector of uniques…

I believe one could extract the keys and values as two columns of a dataframe, sort this dataframe by the `number` and then extract the column of `values`, but it seems a bit inefficient…

2 Likes

You can obtain the `keys` and `values` from a `Dict` like so

``````keys1 = [k for k in keys(string_count)]
sortperm_vals = sortperm([v for v in values(string_count)])

strings_sorted_by_freq = keys1[sortperm_vals]
``````
1 Like
``````collect(keys(StatsBase.countmap(x)))
``````

but maybe `countmap` does not guarantee the order (?), so to sort explicitly, you can:

`collect` from Dict to Array of Pairs, `sort` by anon function returning count descending, get `first.` element from each Pair returned.

Like this:

``````string_count = StatsBase.countmap(x)

sortedvals =first.(sort(collect(string_count), by = e -> e, rev=true))
``````
2 Likes

Slightly cleaner:

``````using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
keys(sort(string_count, by = last, rev=true))
``````
6 Likes

Or using FreqTables.jl:

``````julia> sort(freqtable(x))
3-element Named Array{Int64,1}
Dim1  │
──────┼──
a     │ 1
b     │ 2
c     │ 3
``````
3 Likes

How do I get “a” from an array?

``````julia> z = sort(freqtable(x))
3-element Named Vector{Int64}
Dim1  │
──────┼──
a     │ 1
b     │ 2
c     │ 3

julia> z
1
``````

@Rafael_Brus, try this:

``````ft = sort(freqtable(x))
names(ft,1)
``````
1 Like

Here is a solution without dependencies. Credits to @rvasil for noting the keyword arguments to `sort`.

``````function count_unique(V::AbstractVector{T}) where T
U = unique(V)
l = length(U)
counts = Dict{T,Int}(zip(U, zeros(l)))
for v in V
counts[v] += 1
end
return counts
end

function frequency_sort(V::AbstractVector)
counts = count_unique(V)
sorted = sort(collect(counts); by=last, rev=true)
return first.(sorted)
end
``````

Benchmarks (Julia 1.8-rc1):

``````julia> using BenchmarkTools

julia> @btime frequency_sort(rand(1:100, 1_000));
30.355 μs (28 allocations: 24.55 KiB)

julia> @btime frequency_sort(rand(1:100, 100_000));
2.374 ms (29 allocations: 797.91 KiB)

julia>  @btime frequency_sort(rand(1:1000, 100_000));
2.423 ms (38 allocations: 942.39 KiB)
``````
1 Like

Cleaner perhaps, but note that it will not get the correct results for input:

``````x = ["a", "b", "b", "c", "c", "c", "d"]
``````