Is there a way to optimize the counter function below? I tried using LoopVectorization, but it failed with ERROR: LoadError: Don't know how to handle expression: if i == element_of_interest (presumably because @turbo cannot vectorize a loop body containing a branch).
using StatsBase, BenchmarkTools

arr = rand(1:5, 100)
given = 4
@btime countmap($arr)
# 684.667 ns (6 allocations: 1.50 KiB)
function counter(x, element_of_interest)
    count = 0
    for i in x
        if i == element_of_interest
            count += 1
        end
    end
    return count
end
@btime counter($arr, $given)
# 17.635 ns (0 allocations: 0 bytes)
using LoopVectorization

function counter_turbo(x, element_of_interest)
    count = 0
    @turbo for i in eachindex(x)
        count += x[i] == element_of_interest  # branchless: add the Bool directly
    end
    return count
end
@btime counter_turbo($arr, $given)
# 17.635 ns (0 allocations: 0 bytes)
The timings of counter and counter_turbo are eerily the same. Checking @code_native of counter shows that the compiler has already auto-vectorized the plain loop.
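As an aside, Base's count with a fixed predicate expresses the same branchless loop without any extra package; a minimal comparison under the same setup (timings will of course vary by machine):

using BenchmarkTools

arr = rand(1:5, 100)
given = 4

# ==(given) builds a Base.Fix2 predicate; count applies it over arr
@btime count(==($given), $arr)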
Speaking of things learned, I want to note here (as a reminder to myself) an algorithm discovered (by Dan?) in the DataFrames code, during a recent discussion about the efficiency of the groupby() function.
Applying it to the countmap case gives excellent results. How, if at all, can one find an optimal value for ts? One way to probe this empirically is sketched after the code below.
function countmap3(v, ts)
    m, M = extrema(v)
    if (M - m) / length(v) > ts
        # value range is wide relative to the input: fall back to Dict-based countmap
        countmap(v)
    else
        # value range is narrow: count into a dense array indexed by value
        res = fill(0, M - m + 1)
        for i in eachindex(v)
            res[v[i]-m+1] += 1
        end
        return Pair.(m:M, res)
    end
end
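A quick usage check of countmap3 (the threshold values here are arbitrary illustrations, not recommendations):

using StatsBase  # for the countmap fallback

v = rand(1:5, 100)
countmap3(v, 0.5)   # (M-m)/length(v) = 0.04 <= 0.5: dense-array path
countmap3(v, 0.01)  # 0.04 > 0.01: falls back to countmap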
Perhaps the strategy used to decide when a sparse array is preferable to a dense one could apply in this case too?
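One hypothetical way to look for a crossover threshold empirically is to benchmark the two branches of countmap3 directly for increasingly wide value ranges; everything below is an assumption-laden sketch (dense_count is my name for the extracted dense branch, and the crossover it reveals is machine- and input-dependent):

using BenchmarkTools, StatsBase

# The dense-array branch of countmap3, extracted so it can be timed in isolation
function dense_count(v, m, M)
    res = fill(0, M - m + 1)
    for i in eachindex(v)
        res[v[i]-m+1] += 1
    end
    return Pair.(m:M, res)
end

n = 10_000
for span in (10, 100, 1_000, 100_000)
    v = rand(1:span, n)
    m, M = extrema(v)
    t_dense = @belapsed dense_count($v, $m, $M)
    t_dict = @belapsed countmap($v)
    println("(M-m)/n = ", round((M - m) / n, digits = 3),
            "  dense: ", t_dense, " s  countmap: ", t_dict, " s")
end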
Calculating the extrema requires a full scan through v, which isn't cheap to begin with, so there are tradeoffs. countmap deserves serious optimization effort, as it is ubiquitous.
It's a good method if you know an upper bound on the range of values in advance, for example when the values are of type Int8, or represent some bounded physical or logical quantity.
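A minimal sketch of that special case, assuming Int8 values (countmap_int8 is a hypothetical name, not an API from any package): the full range is known from the element type, so the extrema scan disappears entirely.

# Count occurrences of each Int8 value without computing extrema:
# all 256 possible values index directly into a fixed-size table.
function countmap_int8(v::AbstractVector{Int8})
    res = zeros(Int, 256)
    for x in v
        res[Int(x)+129] += 1   # map -128..127 to 1..256
    end
    return Pair.(Int8(-128):Int8(127), res)
end

countmap_int8(rand(Int8, 1000))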