Count various patterns simultaneously

I was just wondering if there is a nicer way to do practically this:

reduce(
    (d, c) -> (d[c] = get(d, c, 0) + 1; d), "ACCACD", init=Dict{Char, Int}()
)

that is, to do a non-binary count like

s = "ACCACD"
Dict(
    c => count(==(c), s) for c in unique(s)
    # or equally: `count(c, s)`
)

while only iterating once.

This is simply a group + count operation. So:

julia> using SplitApplyCombine

julia> groupcount(s)
3-element Dictionaries.Dictionary{Char, Int64}
 'A' │ 2
 'C' │ 3
 'D' │ 1
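If you would rather not add a dependency, the same single-pass group + count can be sketched with Base alone. This is only an illustration: `countchars` is a made-up helper name, not something from SplitApplyCombine or Base.

```julia
# Hypothetical helper: count every character of `s` in one pass over the string.
function countchars(s::AbstractString)
    counts = Dict{Char, Int}()
    for c in s
        # Start missing keys at 0, then increment.
        counts[c] = get(counts, c, 0) + 1
    end
    return counts
end

counts = countchars("ACCACD")
# counts holds 'A' => 2, 'C' => 3, 'D' => 1 (Dict iteration order is not guaranteed)
```

It does the same work as the `reduce` version in the question, just written as an explicit loop.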

Oh, that seems like a simple and nice solution, I hadn’t heard of that package. Thanks!

For the example provided, countmap() in StatsBase seems to be >200 times faster than groupcount() in SplitApplyCombine:

using BenchmarkTools, SplitApplyCombine, StatsBase
s = "ACCACD"
countmap(s)

@btime countmap($s)    #  110 ns (4 allocations: 480 bytes)
@btime groupcount($s)  # 26.7 μs (85 allocations: 11.59 KiB)

Strange, I see much faster performance from groupcount than you do:

julia> @btime groupcount($s)
  366.199 ns (8 allocations: 704 bytes)

with StatsBase at 150 ns on my laptop.

No clue what is going on. I’ve repeated it several times with similar results. For the record:

Win11 Julia 1.7
StatsBase v0.33.13
SplitApplyCombine v1.2.0