Minimum Mode Problem: minimum number - maximum repetition

Hey guys, is there a performant way to solve this problem in julia?

Given a vector [1…n] return the minimum number that has the highest repetition

examples:

[1, 3, 3, 2, 3, 4] → should return 3 because it’s the most repeated number with the lowest value

[1, 2, 2, 3, 4, 4] → should return 2 because it’s the most repeated number with the lowest value

The best solution very likely depends on more information about the type and size of the data. But this is one way to do it which is not too bad:

julia> function findn(x)
           y = zeros(Int,maximum(x))
           for i in x
                y[i] += 1
           end
           m = maximum(y)
           return findfirst(isequal(m),y)
       end
findn (generic function with 1 method)

julia> x = [1, 3, 3, 2, 3, 4];

julia> findn(x)
3

julia> x = [1, 2, 2, 3, 4, 4];

julia> findn(x)
2


This is probably a good strategy if: 1) the elements of x are integers (that’s required); 2) The maximum value of the elements is not large.

1 Like

Somewhat dependency-heavy way:

julia> using StatsBase, DataStructures

julia> cm = sort!(OrderedDict(countmap([1, 2, 2, 3, 4, 4] )); byvalue = true)
OrderedDict{Int64, Int64} with 4 entries:
  3 => 1
  1 => 1
  4 => 2
  2 => 2

julia> last(cm.keys)
2
1 Like
function minMode(c)
    
    minVal = -1
    i = 0
    
    i = i+1
    for row in eachrow(c)
    row = collect(row)
    temp = filter(x -> x!=0, row)
    count = counter(temp)
    sortedCollection = sort(collect(count), by=x->x[2], rev=true)
    minVal = sortedCollection[1][1]
    repetitions = sortedCollection[1][2]
    for (key, value) in sortedCollection
        if(value == repetitions)
            if(key < minVal)
                minVal = key
            end
        end 
        end
    end
    
    return minVal
end

Thanks guys, this is my solution, adopted inside the CDLP algorithm using DataStructures module (for the counter). Do you see performance bottlenecks?

yes, so much allocation haha. Btw what’s wrong using my code above?

3 Likes

absolutely nothing jling! I thought nobody was going to answer my question and a couple of days ago i came up with my own solution but since I’m still new to language I knew this wasn’t the best in term of performance and now you confirmed it. I’m going to test both of your solutions :laughing:

If you can preallocate an array with a size large enough (with at least a number of elements equal to the maximum value of the vector), probably this is close to the fastest you can get:

julia> function findn(x,y=zeros(Int,maximum(x)))
           m = 0
           fill!(y,0)
           @inbounds for i in x
                y[i] += 1
                m = max(y[i],m)
           end
           return findfirst(isequal(m),y)
       end
findn (generic function with 2 methods)

julia> x = [1, 3, 3, 2, 3, 4];

julia> y = zeros(Int,maximum(x))

julia> @btime findn($x,$y)
  12.373 ns (0 allocations: 0 bytes)
3

If you don’t want to preallocate that array, you can use that same function with:

julia> @btime findn($x)
  36.959 ns (1 allocation: 112 bytes)
3

If maximum(x) is too large for an array be allocated with that size, than I think something like this is ok:

julia> function findn(x)
           u = sort!(unique(x))
           y = zeros(Int,length(u))
           m = 0
           @inbounds for i in x
                iy = searchsortedfirst(u,i)
                y[iy] += 1
                m = max(y[iy],m)
           end
           return findfirst(isequal(m),y)
       end
findn (generic function with 2 methods)

julia> @btime findn($x)
  263.385 ns (7 allocations: 720 bytes)
3

Edit: actually this last option is not fast for larger arrays compared to @jling suggestion (likely because searchsortedfirst becomes slower than fetching dictionary elements for large arrays), but if you can afford allocating an array with the size maximum(x) the other option is much faster:

julia> using StatsBase, DataStructures

julia> f(x) = last(sort!(OrderedDict(countmap(x)); byvalue = true).keys) #jling
f (generic function with 1 method)

julia> function findn(x,y=Vector{Int}(undef,maximum(x)))
           m = 0
           fill!(y,0)
           @inbounds for i in x
                y[i] += 1
                m = max(y[i],m)
           end
           return findfirst(isequal(m),y)
       end
findn (generic function with 2 methods)

julia> x = rand(1:100,1000);

julia> using BenchmarkTools

julia> @btime f($x)
  8.240 μs (36 allocations: 25.59 KiB)
23

julia> @btime findn($x)
  908.548 ns (1 allocation: 896 bytes)
23

julia> y = zeros(Int,100);

julia> @btime findn($x,$y)
  581.709 ns (0 allocations: 0 bytes)
23


2 Likes

If we have an iterator generating (repeat length, value)-tuples in A, we can easily obtain a maxrep_minval(A) function using the lexicographic ordering of tuples.

Definition of the iterator RepVal(A) generating (repeat length, value)-tuples in A:

function repeatlength(A, i)
    k = 1
    @inbounds while i + k ≤ lastindex(A) && isequal(A[i+k], A[i])
        k += 1
    end
    k
end

"""
    RepVal(A)
Iterator generating the all (repeat length, value)-tuples in `A`.
"""
struct RepVal{T} A::T end
Base.IteratorSize(::Type{RepVal{T}}) where T = Base.SizeUnknown()
Base.eltype(x::RepVal) = Tuple{Int, eltype(x.A)} # (rep.len., value)

Base.iterate(x::RepVal) = iterate(x, firstindex(x.A))
function Base.iterate(x::RepVal, i::Int)
    i > lastindex(x.A) && return nothing
    k = repeatlength(x.A, i)
    (k, x.A[i]), i + k
end

Input:

A = [1, 2, 3, 3, 3, 1, 1, 1, NaN, NaN, NaN, 2, 2, 3]
@show A
RepVal(A) |> collect

Output:

A = [1.0, 2.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0, NaN, NaN, NaN, 2.0, 2.0, 3.0]

7-element Vector{Tuple{Int64, Float64}}:
 (1, 1.0)
 (1, 2.0)
 (3, 3.0)
 (3, 1.0)
 (3, NaN)
 (2, 2.0)
 (1, 3.0)

Input:

maxrep_maxval(A) = maximum(RepVal(A))

maxrep_maxval(A)

Output:

(3, NaN)

Input:

negrep((k, v)) = (-k, v)
maxrep_minval(A) = negrep(minimum(negrep, RepVal(A)))

maxrep_minval(A)

Output:

(3, 1.0)

The iterator RepVal(A) itself is very useful. We would do well to try to write types and functions that can be used as generic components.

For details of how to create an iterator, see Interfaces · The Julia Language.

Postscript 1: My interpretation of the problem is different from others. But this is also supposed to be helpful, so I’ll leave it at that. :sweat_smile:

Postscript 2: My minmode(X) function:

using StatsBase
swap_negrep((val, rep)) = (-rep, val)
inv_swap_negrep((rep, val)) = (val, -rep)
minmode(X) = inv_swap_negrep(minimum(swap_negrep, countmap(X)))

This uses the lexicographic ordering of tuples, the same trick as above.

2 Likes