Unique! and count

Hi,
I’m trying to count how often each unique value occurs in an array.
As an example, given the following array:
data = ['a', 'b', 'a', 'c']
I want to get: unique_array = ['a', 'b', 'c'] and count_array = [2, 1, 1]
In Python I can do it like this: unique_array, count_array = np.unique(data, return_counts=True)
In Julia I can get the unique values like this: unique_array = unique!(data)
But to count, I use: count(i -> i == 'a', data). I wonder if there are other solutions for the case where I don’t know the values in data ('a', 'b', 'c') in advance.

Something like count_array = [count(==(x), data) for x in unique_array] should work, though this loops over the data once per unique value, so if you have a lot of data to crunch it might be worth looking at something smarter.
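
A minimal sketch of that comprehension on the example array. It assumes the unique values come from the non-mutating unique rather than unique!, since unique!(data) removes the duplicates from data itself before anything is counted:

```julia
data = ['a', 'b', 'a', 'c']

# unique (no bang) leaves data untouched. unique!(data) would shrink data
# in place, so every count taken afterwards would come out as 1.
unique_array = unique(data)

# One full pass over data per unique value.
count_array = [count(==(x), data) for x in unique_array]

println(unique_array)  # ['a', 'b', 'c']
println(count_array)   # [2, 1, 1]
```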

Sounds like you want countmap in the StatsBase.jl package.
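
A short sketch of how that might look, assuming StatsBase is installed. countmap returns a Dict from value to count, so the keys come back in no guaranteed order; sorting them is an addition here to get a stable result, not something countmap does:

```julia
using StatsBase  # third-party package, assumed to be installed

data = ['a', 'b', 'a', 'c']

# Dict mapping each distinct value to its number of occurrences.
cm = countmap(data)

# Pull keys and counts out as two aligned arrays, sorted by key.
unique_array = sort(collect(keys(cm)))
count_array = [cm[k] for k in unique_array]

println(unique_array)  # ['a', 'b', 'c']
println(count_array)   # [2, 1, 1]
```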

Or

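function uniquecount(data)
    # Start every distinct value at a count of zero.
    unique_array = unique(data)
    counts = Dict(unique_array .=> 0)
    # Single pass over the data, bumping the matching counter.
    for c in data
        counts[c] += 1
    end
    # keys and values iterate in the same (Dict-internal) order,
    # which is not necessarily the order of first appearance.
    return collect(keys(counts)), collect(values(counts))
end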
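
A variant of the same idea that skips the separate unique call and fills the dictionary in one pass. This is only a sketch, and the name countdict is made up for illustration:

```julia
# Hypothetical helper: count occurrences with a single pass and no unique().
function countdict(data)
    counts = Dict{eltype(data), Int}()
    for x in data
        # get returns the current count, or 0 if x has not been seen yet.
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

counts = countdict(['a', 'b', 'a', 'c'])
println(counts['a'])  # 2
```

The result is a Dict, so counts['b'] looks up a single count directly; collect(keys(counts)) and collect(values(counts)) give the two arrays if those are needed.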

With input data = rand('a':'z', 1000), StatsBase’s countmap() (including collecting the keys and values) seems to be about 25% faster than the count() comprehension, and ~3x faster than uniquecount().
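
One way such a comparison could be set up, as a rough sketch rather than the exact measurement behind the numbers above. It assumes StatsBase is installed; the wrapper names comprehension and viacountmap are made up here, and the timing uses only @elapsed from Base, so results will vary between machines and runs:

```julia
using StatsBase  # assumed installed, provides countmap

# Dictionary-based function from earlier in the thread.
function uniquecount(data)
    counts = Dict(unique(data) .=> 0)
    for c in data
        counts[c] += 1
    end
    return collect(keys(counts)), collect(values(counts))
end

# Hypothetical wrappers so all three approaches return (values, counts).
comprehension(data) = (u = unique(data); (u, [count(==(x), data) for x in u]))
viacountmap(data) = (cm = countmap(data); (collect(keys(cm)), collect(values(cm))))

data = rand('a':'z', 1000)

# Call each function once first so compilation time is not measured.
for f in (comprehension, viacountmap, uniquecount)
    f(data)
end

# Crude timing: total seconds for 10_000 calls of each function.
for f in (comprehension, viacountmap, uniquecount)
    t = @elapsed for _ in 1:10_000
        f(data)
    end
    println(rpad(string(f), 16), t, " s")
end
```

For more careful measurements, a dedicated benchmarking package would be the better tool; this loop only gives a ballpark.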

Thanks, I got it