Hi,
I'm trying to get the unique values in an array together with how many times each one occurs.
For example, given the following array:
data = ['a', 'b', 'a', 'c']
I want to get unique_array = ['a', 'b', 'c'] and count_array = [2, 1, 1].
In Python I can do this: unique_array, count_array = np.unique(data, return_counts=True)
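In Julia I can get the unique values with unique_array = unique(data) (unique!(data) also works, but it overwrites data, which breaks the counting step afterwards),
but for the counts I use count(i -> i == 'a', data). Are there other solutions for the case where I don't know the values in data (a, b, c) in advance?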
Something like count_array = [count(==(x), data) for x in unique_array]
should work, though this loops over the whole array once per unique value, so if you have a lot of data to crunch it might be worth looking at something smarter.
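A minimal runnable version of that suggestion, using the example data from the question. Note that unique keeps values in order of first appearance and leaves data untouched, so data is still intact for the counting step.

```julia
# Example input from the question (Chars, since single quotes denote Char in Julia)
data = ['a', 'b', 'a', 'c']

# unique returns a new array in first-occurrence order; data is not modified
unique_array = unique(data)

# One count(...) call per unique value, so data is scanned length(unique_array) times
count_array = [count(==(x), data) for x in unique_array]

println(unique_array)  # ['a', 'b', 'c']
println(count_array)   # [2, 1, 1]
```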
Sounds like you want countmap in the StatsBase.jl package.
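countmap(data) returns a Dict that maps each distinct value to its count. For reference, here is a minimal Base-only sketch of the same single-pass idea, assuming you'd rather avoid the dependency; countvalues is a hypothetical helper name, not part of StatsBase or Base. Because Dict iteration order is unspecified, the sketch sorts the keys afterwards so the result lines up with the sorted output of np.unique.

```julia
# Hypothetical helper (not a StatsBase/Base function): single pass over data,
# building a value => count Dict similar in shape to what countmap returns.
function countvalues(data)
    counts = Dict{eltype(data), Int}()
    for x in data
        # get(dict, key, default) yields 0 the first time a value is seen
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

data = ['a', 'b', 'a', 'c']
counts = countvalues(data)

# Dict order is unspecified, so sort the keys to mimic np.unique's sorted output
unique_array = sort!(collect(keys(counts)))
count_array = [counts[k] for k in unique_array]

println(unique_array)  # ['a', 'b', 'c']
println(count_array)   # [2, 1, 1]
```

With StatsBase installed, counts = countmap(data) should be usable in place of countvalues(data) for the last two steps.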
Or
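```julia
function uniquecount(data)
    unique_array = unique(data)
    # Start every unique value at a count of zero
    counts = Dict(unique_array .=> 0)
    # Single pass over data, incrementing the matching count
    for c in data
        counts[c] += 1
    end
    return keys(counts), values(counts)
end
```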
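A usage sketch for uniquecount, with the function repeated so the snippet runs on its own. keys and values of the same unmodified Dict iterate in matching order, so the two collected arrays stay aligned; the sortperm step is an addition, not part of the original reply, that reorders both arrays to match np.unique's sorted output.

```julia
# Same approach as uniquecount above: pre-fill zero counts, then one pass over data
function uniquecount(data)
    unique_array = unique(data)
    counts = Dict(unique_array .=> 0)
    for c in data
        counts[c] += 1
    end
    return keys(counts), values(counts)
end

data = ['a', 'b', 'a', 'c']
ks, vs = uniquecount(data)

# keys and values of the same Dict iterate in matching order,
# so these arrays are aligned (in Dict order, not sorted)
u = collect(ks)
c = collect(vs)

# Added step: reorder both arrays by value to match np.unique's sorted output
perm = sortperm(u)
unique_array, count_array = u[perm], c[perm]

println(unique_array)  # ['a', 'b', 'c']
println(count_array)   # [2, 1, 1]
```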
For input data = rand('a':'z', 1000), StatsBase's countmap() (including collecting keys and values) seems to be 25% faster than the count() comprehension and ~3x faster than uniquecount().
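To check a comparison like this on your own machine, here is a rough Base-only sketch. It assumes the same rand('a':'z', 1000) input, first checks that the count() comprehension and uniquecount() agree, and then times each with @elapsed. countmap is left out so the snippet runs without StatsBase, and comprehension_counts and sorted_pairs are hypothetical helper names. No timings are shown because they depend on the machine; BenchmarkTools.jl's @btime gives more reliable numbers than a single @elapsed loop.

```julia
using Random

# Hypothetical helper wrapping the count() comprehension from this thread
function comprehension_counts(data)
    u = unique(data)
    return u, [count(==(x), data) for x in u]
end

# uniquecount as posted in this thread (unused enumerate index dropped)
function uniquecount(data)
    unique_array = unique(data)
    counts = Dict(unique_array .=> 0)
    for c in data
        counts[c] += 1
    end
    return keys(counts), values(counts)
end

# Hypothetical helper: sorted (value, count) tuples so results can be compared
sorted_pairs(ks, vs) = sort!(collect(zip(ks, vs)))

Random.seed!(42)                 # fixed seed so reruns use the same input
data = rand('a':'z', 1000)       # same kind of input as in the post

a = sorted_pairs(comprehension_counts(data)...)
b = sorted_pairs(uniquecount(data)...)
@assert a == b                   # the two methods must agree before timing matters

# Warm up (compile) each function first, then time 1000 calls.
# @elapsed is only a rough measure of wall-clock time.
for f in (comprehension_counts, uniquecount)
    f(data)
    t = @elapsed for _ in 1:1000
        f(data)
    end
    println(rpad(string(f), 22), round(t * 1e6 / 1000; digits = 2), " μs per call")
end
```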
Thanks, I got it