Is there a way to optimize the counter function below? I tried using LoopVectorization, but it failed with ERROR: LoadError: Don't know how to handle expression: if i == element_of_interest (presumably because @turbo cannot vectorize a loop body containing a branch).
using StatsBase, BenchmarkTools

arr = rand(1:5, 100)
given = 4
@btime countmap($arr)
# 684.667 ns (6 allocations: 1.50 KiB)
function counter(x, element_of_interest)
    count = 0
    for i in x
        if i == element_of_interest
            count += 1
        end
    end
    return count
end
@btime counter($arr, $given)
# 17.635 ns (0 allocations: 0 bytes)
using LoopVectorization

function counter_turbo(x, element_of_interest)
    count = 0
    @turbo for i in eachindex(x)
        count += x[i] == element_of_interest  # branchless: add the Bool directly
    end
    return count
end
@btime counter_turbo($arr, $given)
# 17.635 ns (0 allocations: 0 bytes)
The timings of counter and counter_turbo are eerily the same. Checking @code_native of counter shows that the compiler has already auto-vectorized the plain loop.
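As an aside, Base's count with a fixed predicate expresses the same branchless loop without any extra package; a minimal comparison under the same setup (timings will of course vary by machine):

using BenchmarkTools

arr = rand(1:5, 100)
given = 4

# ==(given) builds a Base.Fix2 predicate; count applies it over arr
@btime count(==($given), $arr)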
Speaking of things learned, I want to note here (as a reminder to myself) an algorithm discovered (by Dan?) in the DataFrames code, during a recent discussion about the efficiency of the groupby() function.
Applying it to the countmap case gives excellent results. How, if at all, can one find an optimal value for ts? One way to probe this empirically is sketched after the code below.
function countmap3(v, ts)
    m, M = extrema(v)
    if (M - m) / length(v) > ts
        # value range is wide relative to the input: fall back to Dict-based countmap
        countmap(v)
    else
        # value range is narrow: count into a dense array indexed by value
        res = fill(0, M - m + 1)
        for i in eachindex(v)
            res[v[i]-m+1] += 1
        end
        return Pair.(m:M, res)
    end
end
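A quick usage check of countmap3 (the threshold values here are arbitrary illustrations, not recommendations):

using StatsBase  # for the countmap fallback

v = rand(1:5, 100)
countmap3(v, 0.5)   # (M-m)/length(v) = 0.04 <= 0.5: dense-array path
countmap3(v, 0.01)  # 0.04 > 0.01: falls back to countmap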
Perhaps the strategy used to decide when a sparse array is preferable to a dense one could apply in this case too?
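One hypothetical way to look for a crossover threshold empirically is to benchmark the two branches of countmap3 directly for increasingly wide value ranges; everything below is an assumption-laden sketch (dense_count is my name for the extracted dense branch, and the crossover it reveals is machine- and input-dependent):

using BenchmarkTools, StatsBase

# The dense-array branch of countmap3, extracted so it can be timed in isolation
function dense_count(v, m, M)
    res = fill(0, M - m + 1)
    for i in eachindex(v)
        res[v[i]-m+1] += 1
    end
    return Pair.(m:M, res)
end

n = 10_000
for span in (10, 100, 1_000, 100_000)
    v = rand(1:span, n)
    m, M = extrema(v)
    t_dense = @belapsed dense_count($v, $m, $M)
    t_dict = @belapsed countmap($v)
    println("(M-m)/n = ", round((M - m) / n, digits = 3),
            "  dense: ", t_dense, " s  countmap: ", t_dict, " s")
end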
Calculating the extrema requires a full scan through v, which isn't cheap to begin with, so there are tradeoffs. countmap deserves serious optimization effort, as it is ubiquitous.
It's a good method if you know an upper bound on the range of values in advance, for example when the values are of type Int8, or represent some bounded physical or logical quantity.
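A minimal sketch of that special case, assuming Int8 values (countmap_int8 is a hypothetical name, not an API from any package): the full range is known from the element type, so the extrema scan disappears entirely.

# Count occurrences of each Int8 value without computing extrema:
# all 256 possible values index directly into a fixed-size table.
function countmap_int8(v::AbstractVector{Int8})
    res = zeros(Int, 256)
    for x in v
        res[Int(x)+129] += 1   # map -128..127 to 1..256
    end
    return Pair.(Int8(-128):Int8(127), res)
end

countmap_int8(rand(Int8, 1000))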