Frequency counts on a square lattice

I have a square array, c, with integer entries that I expect to repeat. The entries are indices representing categorical data, and I would like a frequency count for each distinct one. What would be the most efficient way to do this? I know that I could flatten the array and use DataFrames.jl, but I have to do this many times, so I’m concerned about introducing unnecessary overhead through those conversions (square array to flat array to data frame).

Something like this?


julia> using StatsBase

julia> M = rand(1:10, 10, 10)
10×10 Array{Int64,2}:
 8  8  6   7   5  9   5  10  5   7
 5  1  7   3  10  9   8   4  8   2
 2  2  3   9   2  7   9   4  8   7
 4  6  8   3   6  2  10   5  3   6
 8  7  7   6   3  8   1   4  6   6
 6  3  5   5   9  6   7   1  7   5
 1  4  7   9   5  8   4   2  5   1
 6  8  6   7   3  5   1   2  8  10
 6  2  9   7   3  6   7   2  6   2
 5  9  9  10   4  2   6   7  9   1

julia> StatsBase.countmap(vec(M))
Dict{Int64,Int64} with 10 entries:
  7  => 14
  4  => 7
  9  => 10
  10 => 5
  2  => 11
  3  => 8
  5  => 12
  8  => 11
  6  => 15
  1  => 7

Works for me.

Note that if you know in advance that you have a limited set of entries, e.g. values in 1:10, then you can do much better than countmap just by allocating an array of counts and incrementing it as you iterate through your data. For your example above, I get a speedup of more than a factor of 5:

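julia> function countmap10(M)
           # One slot per possible value; assumes every entry of M is in 1:10.
           counts = zeros(Int, 10)
           for x in M
               counts[x] += 1   # the value itself is the index
           end
           return counts
       end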

julia> using BenchmarkTools

julia> @btime StatsBase.countmap(vec($M))
  628.174 ns (8 allocations: 1.70 KiB)
Dict{Int64,Int64} with 10 entries:
  7  => 14
  4  => 7
  9  => 10
  10 => 5
  2  => 11
  3  => 8
  5  => 12
  8  => 11
  6  => 15
  1  => 7

julia> @btime countmap10($M)
  115.560 ns (1 allocation: 160 bytes)
10-element Array{Int64,1}:
  7
 11
  8
  7
 12
 15
 14
 11
 10
  5

Should countmap be able to take an optional AbstractArray giving the set of possible values?

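For illustration, here is a rough sketch of what counting against a supplied set of possible values could look like. The name counts_for_levels and its signature are made up for this example and are not part of StatsBase; it returns the counts in the same order as the supplied values, and silently skips anything not in the set (that choice is an assumption, throwing an error would be just as reasonable).

```julia
# Hypothetical helper, not part of StatsBase: count occurrences of each value
# in `levels`, returning the counts in the same order as `levels`.
function counts_for_levels(A::AbstractArray, levels::AbstractVector)
    # Map each allowed value to its slot in the output vector.
    index_of = Dict(v => i for (i, v) in enumerate(levels))
    counts = zeros(Int, length(levels))
    for x in A
        i = get(index_of, x, 0)
        # Values outside `levels` are skipped (an assumption of this sketch).
        i == 0 || (counts[i] += 1)
    end
    return counts
end

M = [1 2 2; 3 3 3; 5 5 1]
counts_for_levels(M, [1, 2, 3, 4, 5])   # returns [2, 2, 3, 0, 2]
```

The dictionary lookup makes this slower than indexing directly with the value, so when the possible values are a contiguous range starting at 1, the plain array approach above is likely still the better choice.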

It seems like there should be a countmap!(counts, array) function that takes any counts object supporting getindex/setindex! (e.g. a Dict or an array or some other data structure).

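A minimal sketch of that idea, assuming nothing beyond Base. The name count_into! is hypothetical and only stands in for the proposed function; the two methods differ only in how a not-yet-seen value is handled.

```julia
# Hypothetical helper, not an existing StatsBase function.
# Dictionary version: keys that are not present yet start from zero.
function count_into!(counts::AbstractDict, A)
    for x in A
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

# Array version: assumes every x in A is a valid index into `counts`,
# and that `counts` was initialised by the caller (e.g. with zeros).
function count_into!(counts::AbstractArray, A)
    for x in A
        counts[x] += 1
    end
    return counts
end

M = [1 2 2; 3 3 3]

tally = zeros(Int, 3)
count_into!(tally, M)   # tally is now [1, 2, 3]
count_into!(tally, M)   # accumulates in place: [2, 4, 6]

d = Dict{Int,Int}()
count_into!(d, M)       # d now maps 1 => 1, 2 => 2, 3 => 3
```

Because the counts container is passed in, it can be reused across many arrays without reallocating, which matters if the counting is done many times.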

If you have too much data to fit into memory, I really recommend OnlineStats.jl’s CountMap :). https://github.com/joshday/OnlineStats.jl

https://joshday.github.io/OnlineStats.jl/latest/api/#OnlineStatsBase.CountMap
