I have a use case where an integer array contains only values between 1 and some maxValue
, and I would like the histogram of the array values. I found that counts(array, maxValue)
was too slow, and as best as I could understand, it scans the array maxValue
times to count each possible value separately, instead of scanning the array once.
I thought then to implement this by hand with
using LoopVectorization
function mycounts(array, maxValue)
count = zeros(Int64, maxValue)
@turbo for i in eachindex(array)
count[array[i]] += 1;
end
return count;
end
However, I found that this produces the wrong results. For example mycounts(ones(Int64, 1024)
gives 32
instead of 1024
, and in fact increasing the size of the array, I always get 1-32th of the correct value. So perhaps this might have to do with the loop unrolling factor of @turbo
? I can confirm that replacing it with @inbounds @simd
, I get the correct results. Is there some nuance to the safe operation of @turbo
relative to @simd
that I am missing, or is this an actual bug that I should report?
(Since I have maxValue
of order 100, the speed savings with @simd
are already significant; yet the (wrong) implementation with @turbo
gains another x5 performance boost compared to @simd
which would be nice to have.)