Count various patterns simultaneously

ludwig-austermann · December 14, 2021, 10:33am

I was wondering whether there is a nicer way to do essentially this:

reduce(
    (d, c) -> (d[c] = get(d, c, 0) + 1; d), "ACCACD", init=Dict{Char, Int}()
)

That is, I want a per-character (non-binary) count like

s = "ACCACD"
Dict(
    c => count(==(c), s) for c in unique(s)
    # or equally: `count(c, s)`
)

while only iterating once.
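
For comparison, the same single-pass count can be written as a plain loop in Base Julia. This is only a minimal sketch; the function name `countchars` is made up for illustration and does not come from any package.

```julia
# Count occurrences of each element in one pass over the input.
# `countchars` is a hypothetical helper name, not a library function.
function countchars(s)
    counts = Dict{eltype(s), Int}()
    for c in s
        # get returns the current count, or 0 if c has not been seen yet
        counts[c] = get(counts, c, 0) + 1
    end
    return counts
end

counts = countchars("ACCACD")
# counts holds 'A' => 2, 'C' => 3, 'D' => 1 (Dict ordering is unspecified)
```

Unlike the `count`-per-distinct-character version, this visits each character exactly once, at the cost of being more verbose than a one-liner.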

aplavin · December 14, 2021, 10:46am

This is simply a group + count operation. So:

julia> using SplitApplyCombine

julia> groupcount(s)
3-element Dictionaries.Dictionary{Char, Any}
 'A' │ 2
 'C' │ 3
 'D' │ 1
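
To spell out what "group + count" means without the package, here is a rough Base-only sketch. The helper name `group_then_count` is hypothetical and is not how SplitApplyCombine implements `groupcount`; it just shows the two conceptual steps.

```julia
# Step 1: group equal elements together. Step 2: count the size of each group.
# `group_then_count` is an illustrative name, not part of any library.
function group_then_count(xs)
    T = eltype(xs)
    groups = Dict{T, Vector{T}}()
    for x in xs
        # get! returns the existing group for x, or creates an empty one first
        push!(get!(() -> T[], groups, x), x)
    end
    # Reduce every group to its length
    return Dict(k => length(v) for (k, v) in groups)
end

result = group_then_count("ACCACD")
# result holds 'A' => 2, 'C' => 3, 'D' => 1 (Dict ordering is unspecified)
```

Materializing the groups is wasteful when only the counts are needed, which is why a dedicated counting function can skip the intermediate vectors.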

ludwig-austermann · December 14, 2021, 11:00am

Oh, that seems like a simple and nice solution, I hadn’t heard of that package. Thanks!

rafael.guerra · December 14, 2021, 11:42am

For the example provided, countmap() in StatsBase seems to be >200 times faster than groupcount() in SplitApplyCombine:

using BenchmarkTools, SplitApplyCombine, StatsBase  # @btime, groupcount, countmap
s = "ACCACD"
countmap(s)

@btime countmap($s)    #  110 ns (4 allocations: 480 bytes)
@btime groupcount($s)  # 26.7 μs (85 allocations: 11.59 KiB)

aplavin · December 14, 2021, 1:25pm

Strange, I see much faster performance from groupcount than you do:

julia> @btime groupcount($s)
  366.199 ns (8 allocations: 704 bytes)

with StatsBase's countmap at 150 ns on my laptop.

rafael.guerra · December 14, 2021, 1:34pm

No clue what is going on. I’ve repeated it several times with similar results. For the record:

Win11 Julia 1.7
StatsBase v0.33.13
SplitApplyCombine v1.2.0
