I was wondering whether there is a nicer way to do essentially this:
reduce(
(d, c) -> (d[c] = get(d, c, 0) + 1; d), "ACCACD", init=Dict{Char, Int}()
)
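The function above mutates and returns its accumulator, and the Julia docs state that `reduce` does not guarantee a particular associativity (and expect `init` to be a neutral element). A left fold therefore expresses the intent more precisely. Below is a minimal sketch of the same single-pass count using `foldl` with a `do` block, using Base Julia only; `counts` is just an illustrative variable name:

```julia
# Same single-pass count as the reduce version, but foldl
# guarantees strict left-to-right evaluation.
counts = foldl("ACCACD"; init = Dict{Char, Int}()) do d, c
    d[c] = get(d, c, 0) + 1  # increment this character's count (0 if unseen)
    d                        # hand the accumulator to the next step
end
```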
that is, a non-binary count (how many times each character occurs), equivalent to
s = "ACCACD"
Dict(
c => count(==(c), s) for c in unique(s)
# or equally: `count(c, s)`
)
while iterating over the input only once.
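An explicit loop is another single-pass option that needs no packages. This is a minimal sketch assuming any iterable input; `tally` is a hypothetical helper name, not an existing function in Base or any package:

```julia
# `tally` is a hypothetical helper: counts each distinct element of `itr`
# in a single pass over the input.
function tally(itr)
    counts = Dict{eltype(itr), Int}()      # Dict{Char, Int} for a String
    for x in itr
        counts[x] = get(counts, x, 0) + 1  # start at 0 for unseen keys
    end
    return counts
end

tally("ACCACD")
```

Using `eltype(itr)` as the key type keeps the dictionary concretely typed for strings and typed arrays.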
aplavin
This is simply a group + count operation. So:
julia> using SplitApplyCombine
julia> groupcount(s)
3-element Dictionaries.Dictionary{Char, Any}
'A' │ 2
'C' │ 3
'D' │ 1
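To make the "group + count" description concrete, here is a Base-only sketch that first groups equal characters and then replaces each group by its length. It only illustrates the concept and is not meant to reflect how SplitApplyCombine implements `groupcount`. It also materializes the intermediate groups, which a direct counter avoids:

```julia
s = "ACCACD"

# Step 1 ("group"): collect the elements that share each key.
groups = Dict{Char, Vector{Char}}()
for c in s
    # get! inserts an empty vector the first time a key is seen
    push!(get!(() -> Char[], groups, c), c)
end

# Step 2 ("count"): replace each group by its size.
counts = Dict(k => length(v) for (k, v) in groups)
```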
Oh, that seems like a simple and nice solution; I hadn't heard of that package. Thanks!
For the example provided, countmap() in StatsBase seems to be over 200 times faster than groupcount() in SplitApplyCombine:
using StatsBase, SplitApplyCombine, BenchmarkTools  # @btime comes from BenchmarkTools
s = "ACCACD"
countmap(s)
@btime countmap($s) # 110 ns (4 allocations: 480 bytes)
@btime groupcount($s) # 26.7 μs (85 allocations: 11.59 KiB)
aplavin
Strange, I see much faster performance of groupcount than you do:
julia> @btime groupcount($s)
366.199 ns (8 allocations: 704 bytes)
while StatsBase's countmap takes about 150 ns on my laptop.
I have no idea what is going on; I've repeated it several times with similar results. For the record:
Win11
Julia 1.7
StatsBase v0.33.13
SplitApplyCombine v1.2.0