Sort vector by frequency

DominiqueMakowski · September 14, 2018, 10:14am

I am trying to sort a vector of strings by the frequency of its values.

For instance:

x = ["a", "b", "b", "c", "c", "c"]

Since there are 3 "c", 2 "b" and 1 "a", I would like to get a vector of uniques in that order:

["c", "b", "a"]

Currently, I’ve managed to count the existing values:

using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
Dict("c"=>3,"b"=>2,"a"=>1)

This returns a dict with frequencies (well, the number of each element), but I am stuck at trying to transform this into a sorted vector of uniques…

I believe one could extract the keys and values as two columns of a dataframe, sort this dataframe by the number and then extract the column of values, but it seems a bit inefficient…

xiaodai · September 14, 2018, 10:30am

You can obtain the keys and values from a Dict like so:

keys1 = collect(keys(string_count))
# rev=true puts the most frequent value first
sortperm_vals = sortperm(collect(values(string_count)), rev=true)

strings_sorted_by_freq = keys1[sortperm_vals]
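
A minimal sketch of the sortperm approach, assuming the counts are already in a Dict (written out by hand here so it runs without StatsBase). It relies on two facts: keys and values iterate a Dict in the same order, and sortperm sorts ascending unless rev=true is passed. The names ks, vs and perm are illustrative only.

```julia
# Counts for x = ["a", "b", "b", "c", "c", "c"], written out by hand.
string_count = Dict("c" => 3, "b" => 2, "a" => 1)

# keys() and values() traverse the Dict in the same order,
# so ks[i] is the value whose count is vs[i].
ks = collect(keys(string_count))
vs = collect(values(string_count))

# Permutation that sorts the counts from largest to smallest.
perm = sortperm(vs; rev=true)

# Apply the same permutation to the keys.
strings_sorted_by_freq = ks[perm]
```

For this input the result is ["c", "b", "a"], because all three counts are distinct.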

rvasil · September 14, 2018, 10:43am

collect(keys(StatsBase.countmap(x)))

However, countmap returns a Dict, which does not guarantee any order, so to sort explicitly you can:

collect the Dict into an Array of Pairs, sort it with an anonymous function that returns the count (descending), then take the first element of each Pair.

Like this:

string_count = StatsBase.countmap(x)

sortedvals = first.(sort(collect(string_count), by = e -> e[2], rev=true))
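
To make the one-liner easier to follow, here is the same pipeline split into its three steps. This is only a sketch: the Dict is written out by hand instead of calling countmap, and pair_list and sorted_pairs are illustrative names.

```julia
# Counts for x = ["a", "b", "b", "c", "c", "c"], written out by hand
# so the sketch runs without StatsBase.
string_count = Dict("a" => 1, "b" => 2, "c" => 3)

# Step 1: Dict -> Vector of value => count Pairs (in arbitrary order).
pair_list = collect(string_count)

# Step 2: sort the Pairs by their second element (the count), largest first.
sorted_pairs = sort(pair_list; by = e -> e[2], rev = true)

# Step 3: broadcast first over the Pairs to keep only the values.
sortedvals = first.(sorted_pairs)
```

Here sorted_pairs is ["c" => 3, "b" => 2, "a" => 1] and sortedvals is ["c", "b", "a"].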

yakir12 · September 14, 2018, 10:55am

Slightly cleaner:

using StatsBase

x = ["a", "b", "b", "c", "c", "c"]
string_count = StatsBase.countmap(x)
keys(sort(string_count, by = last, rev=true))

nalimilan · September 14, 2018, 4:21pm

Or using FreqTables.jl:

julia> using FreqTables

julia> sort(freqtable(x))
3-element Named Array{Int64,1}
Dim1  │ 
──────┼──
a     │ 1
b     │ 2
c     │ 3

Rafael_Brus · August 28, 2021, 7:02pm

How do I get “a” from an array?

julia> z = sort(freqtable(x))
3-element Named Vector{Int64}
Dim1  │
──────┼──
a     │ 1
b     │ 2
c     │ 3

julia> z[1]
1

rafael.guerra · August 28, 2021, 10:57pm

@Rafael_Brus, try this:

ft = sort(freqtable(x))
names(ft,1)[1]

rikh · July 7, 2022, 3:35pm

Here is a solution without dependencies. Credits to @rvasil for noting the keyword arguments to sort.

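function count_unique(V::AbstractVector{T}) where T
    U = unique(V)
    l = length(U)
    # Start every unique value at an integer count of zero.
    counts = Dict{T,Int}(zip(U, zeros(Int, l)))
    # Increment the count once per occurrence.
    for v in V
        counts[v] += 1
    end
    return counts
end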

function frequency_sort(V::AbstractVector)
    counts = count_unique(V)
    sorted = sort(collect(counts); by=last, rev=true)
    return first.(sorted)
end

Benchmarks (Julia 1.8-rc1):

julia> using BenchmarkTools

julia> @btime frequency_sort(rand(1:100, 1_000));
  30.355 μs (28 allocations: 24.55 KiB)

julia> @btime frequency_sort(rand(1:100, 100_000));
  2.374 ms (29 allocations: 797.91 KiB)

julia> @btime frequency_sort(rand(1:1000, 100_000));
  2.423 ms (38 allocations: 942.39 KiB)

rafael.guerra · July 7, 2022, 5:42pm

Cleaner perhaps, but note that it will not get the correct results for input:

x = ["a", "b", "b", "c", "c", "c", "d"]
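
This input contains a tie: "a" and "d" each occur once. A Dict does not guarantee iteration order, so any approach that sorts only on the count can return tied values in either order. The sketch below is one possible way to make the result deterministic, assuming the values themselves can be compared with isless; frequency_sort_tiebreak is a hypothetical name, not something from the thread or from StatsBase.

```julia
# Hypothetical helper: unique values sorted by descending count,
# with ties broken by the values' own ascending order.
function frequency_sort_tiebreak(xs::AbstractVector{T}) where T
    counts = Dict{T,Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    # Sort key is (-count, value): tuples compare element by element,
    # so higher counts come first and equal counts fall back to the value.
    return sort!(collect(keys(counts)); by = k -> (-counts[k], k))
end

x = ["a", "b", "b", "c", "c", "c", "d"]
result = frequency_sort_tiebreak(x)
```

With this tie-break, result is ["c", "b", "a", "d"] regardless of how the Dict happens to order its keys.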
