Sorting common elements into bins


#1

Hi all,

I was just wondering whether any tools already exist for sorting the common elements of vectors or sets into bins. For example, for x = [1,2,2,1,3,3,3,3,1], I'd like a routine that returns that there are 3 ones, 2 twos, and 4 threes.
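For reference, the routine described above can be sketched without any package by tallying values in a plain Dict. This is only a minimal illustration; the helper name `count_elements` is hypothetical and not part of any library.

```julia
# Hypothetical helper: count how many times each distinct element occurs.
# Works for any iterable, e.g. vectors or sets.
function count_elements(xs)
    tally = Dict{eltype(xs), Int}()
    for v in xs
        # get(dict, key, default) returns 0 the first time a value is seen
        tally[v] = get(tally, v, 0) + 1
    end
    return tally
end

x = [1, 2, 2, 1, 3, 3, 3, 3, 1]
c = count_elements(x)

# Dict order is unspecified, so print the keys in sorted order
for k in sort(collect(keys(c)))
    println(k, " => ", c[k])
end
```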

Cheers,

Colin


#2

Have a look at fit and Histogram in StatsBase.jl:

http://juliastats.github.io/StatsBase.jl/stable/empirical.html#Histograms-1

Example:

julia> using StatsBase

julia> x = [1,2,2,1,3,3,3,3,1];

julia> fit(Histogram, x, closed=:left, nbins=3).weights
3-element Array{Int64,1}:
 3
 2
 4
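Because `nbins` is only the approximate number of bins StatsBase uses when it chooses the edges itself, one option is to pass the edges explicitly so that each integer lands in its own bin. A minimal sketch, assuming integer data like `x`; the half-integer edges are an illustrative choice, not something from the original post.

```julia
using StatsBase

x = [1, 2, 2, 1, 3, 3, 3, 3, 1]

# Explicit bin edges centred on each integer:
# [0.5, 1.5), [1.5, 2.5), [2.5, 3.5)
edges = 0.5:1.0:3.5
h = fit(Histogram, x, edges; closed = :left)

# Counts per bin, in edge order
println(h.weights)
```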

#3

using StatsBase
x = [1,2,2,1,3,3,3,3,1]
countmap(x)  # Dict mapping each distinct value to its count: 1 => 3, 2 => 2, 3 => 4
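Since `countmap` returns a Dict, its iteration order is unspecified. A small sketch of getting the counts back in value order, using only `countmap` and Base functions:

```julia
using StatsBase

x = [1, 2, 2, 1, 3, 3, 3, 3, 1]
cm = countmap(x)   # Dict{Int64, Int64}: value => number of occurrences

# Pairs sort by key first, so this lists the values in ascending order
for (value, n) in sort(collect(cm))
    println(value, ": ", n)
end
```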

#4

Even better, I learned something 🙂


#5

You can also use the FreqTables package (its freqtable function), which is more convenient if you need the result as an array. Finally, StatsBase also has the (poorly named) counts function for integer values in a small range.
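A short sketch of StatsBase's `counts` on the example data. It returns a plain vector of counts, one entry per integer in the range. FreqTables is left out here only to keep the example to a single package.

```julia
using StatsBase

x = [1, 2, 2, 1, 3, 3, 3, 3, 1]

# Without a range argument, counts uses minimum(x):maximum(x), here 1:3
c = counts(x)

# With an explicit range, levels that never occur get a count of zero
c5 = counts(x, 1:5)

println(c)
println(c5)
```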


#6

Brilliant, that was exactly what I was looking for. Thanks.

Colin


#7

Good to know, thank you.

Cheers,

Colin


#8

If you have lots of these and they are all non-negative and smaller than 127, then converting them to UInt8 lets countmap use a fast algorithm to count them.
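A minimal sketch of the conversion, assuming the values are known to be non-negative integers that fit in UInt8. Which internal algorithm `countmap` then uses is up to StatsBase; the sketch only shows the calls.

```julia
using StatsBase

x = [1, 2, 2, 1, 3, 3, 3, 3, 1]

# UInt8.(x) converts element-wise and throws an InexactError if any value
# is negative or above 255, so out-of-range data fails loudly
small = UInt8.(x)
cm = countmap(small)   # Dict{UInt8, Int64}

# Keys are now UInt8, e.g. 0x03 for the value 3
println(cm[0x03])
```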


#9

Interesting. In my current use case I can’t guarantee values below 127, but that is useful to know.

Cheers,

Colin


#10

Actually, there are fast algorithms for all integer types, and they are especially fast for Int8, UInt8, Int16, and UInt16.
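To illustrate why small integer types are cheap to count, here is a hypothetical counting-array sketch for Int16: with only 2^16 possible values, a fixed-size array of tallies indexed by value can replace hashing. The helper `dense_count` is an illustration of the general idea, not a description of how StatsBase implements countmap.

```julia
# Hypothetical helper (not StatsBase code): count Int16 values with a
# dense array of tallies instead of a hash table.
function dense_count(xs::AbstractVector{Int16})
    offset = 1 - Int(typemin(Int16))      # maps -32768 to array index 1
    tally = zeros(Int, 2^16)              # one slot per possible Int16 value
    for v in xs
        tally[Int(v) + offset] += 1
    end
    # Keep only the values that actually occurred
    return Dict(Int16(i - offset) => n for (i, n) in enumerate(tally) if n > 0)
end

x = Int16[1, 2, 2, 1, 3, 3, 3, 3, 1]
println(dense_count(x))
```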


#11

Well, 16-bit integers are definitely enough. I’ll look into it.

Cheers and thanks,

Colin