I was expecting count to be highly optimized, but it seems to be 10x slower than the following code. Why?
using BenchmarkTools
function contar()
c = 0
for i in "?#?##?#?#???####??##"
c += i == '#'
end
return c
end;
@btime contar() # 10x faster than `count`
@btime count('#', "?#?##?#?#???####??##")
You can take a look at how count(::Char, ::String) is implemented by doing
@edit count('#', "?#?##?#?#???####??##")
In this case, the method is written much more generically, with a single method for characters, strings and regexes as possible patterns. It also takes care of potentially overlapping byte patterns, which your method doesn't do - it just assumes that's not what the user wanted.
I don't know why it was written that way, but note that the existing count is faster when you also have non-ASCII UTF-8 characters in a long string:
julia> using Random
julia> s = randstring("#?π", 1000000)
"#π##???????ππππ#??#?π?#?#?########π#?π?π#?π???##??πππ??π?π?#?π???#π#ππ?ππ###π??#π?π?##ππ#?π?#ππ?πππ#?#πππ#?#??##???#???#π?ππ?##π#?#π##π?#?#π?ππ#??π##??????##???#π#?#?#πππ##?##?π?#π??π#π??π##π#?π##π#?#??π###ππ#π#???π#?π#??π######π?#????##????π?π##??" ⋯ 2000632 bytes ⋯ "π#π#π?#??π#?#π?#???ππ#?#ππ?π??#?##π?ππ#ππ##π????##π#?#?π#?π#?π#ππ#?#?#ππ?π#ππ#??#π???π###?ππ#π?π#?ππ?π??πππ#####π#???ππ?ππ#?#?πππ?π?ππ#?π?π?π#?###??###?π?π##??#πππ##π#π?##?πππ?πππ?π????π?ππ#π#?π##?###πππ##?π##????π####?π?π?#π??π#?#π?#?ππ?π###?π??π"
julia> @btime count('#', $s)
4.593 ms (0 allocations: 0 bytes)
333686
julia> @btime contar2('#', $s)
5.650 ms (0 allocations: 0 bytes)
333686
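(`contar2` isn't defined in this post; presumably it is the loop from the OP generalized to take the pattern and string as arguments, something along these lines:)

```julia
# Hypothetical reconstruction of `contar2`: the OP's loop,
# generalized over the pattern character and the string.
function contar2(pattern::Char, s::AbstractString)
    c = 0
    for i in s
        c += i == pattern   # Bool adds as 0 or 1
    end
    return c
end
```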
Perhaps this method could be specialized just for characters?
So this should at least suggest that there's a tradeoff happening here. At the very least, for arbitrary strings, count should be a pretty good implementation, with room for optimization depending on the string in question (which is common for these kinds of operations).
Using @profile, most of the time is spent doing findnext and nextind. These functions are each called once for each match. nextind is slow because it requires decoding the underlying bytes and findnext seems to be optimised for sparse matches, which is why the example above with only a couple of matches was much faster.
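Schematically, the generic implementation loops like this (a simplified sketch of the mechanism, not the actual Base code):

```julia
# Simplified sketch: `findnext` locates each match, and `nextind`
# (which must decode the underlying UTF-8 bytes) steps past it.
function count_sketch(pattern::Char, s::AbstractString)
    n = 0
    i = firstindex(s)
    while true
        r = findnext(==(pattern), s, i)   # scan forward for the next match
        r === nothing && break
        n += 1
        i = nextind(s, r)                 # advance past the match
    end
    return n
end
```

With dense matches, both calls fire once per match, which is where the time goes.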
If the pattern Char is representable as a single byte and the string is a String, then you can super-charge it like so (the second example of the three is equivalent to the code from the OP):
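The original snippets aren't reproduced here, but the byte-level idea is: an ASCII Char occupies exactly one code unit in a String, so you can compare raw bytes and skip UTF-8 decoding entirely (a sketch under that assumption, not the exact code from the post):

```julia
# Byte-level count: valid only when `c` encodes to a single UTF-8
# byte (i.e. ASCII). Comparing code units avoids per-character
# decoding and vectorizes well.
function count_bytes(c::Char, s::String)
    isascii(c) || throw(ArgumentError("pattern must be a single-byte (ASCII) character"))
    b = UInt8(c)
    n = 0
    for u in codeunits(s)   # iterate raw bytes of the String
        n += u == b
    end
    return n
end
```

This is safe because in a valid UTF-8 String no continuation byte can equal an ASCII byte, so counting matching code units counts matching characters.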
This kind of phenomenon pops up again and again. If you have specialized needs, you can write your own function that outperforms "built-in" functions.
I think that's quite cool, though maybe, in this case, there could be some improvements in count.
It's not quite so simple - the above benchmarks show this is slower when there are few matches.
I think the generic implementation is fine, but could be improved by (a) adding specialised implementations of findnext and (b) replacing findnext with a new function that returns the index of first character after the match, since any implementation must already know this information and it avoids the expensive nextind call.
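As a sketch of idea (b), such a function might return both the match index and the index just past the match, so the caller never calls nextind separately (a hypothetical API, not in Base; this sketch still pays for nextind internally, but a real implementation would know the index for free from the decode it already did):

```julia
# Hypothetical helper: like findnext, but also returns the index of
# the first character *after* the match.
function findnext_past(pattern::Char, s::AbstractString, i::Int)
    j = findnext(==(pattern), s, i)
    j === nothing && return nothing
    return (j, nextind(s, j))   # a real version would avoid this extra decode
end

function count_past(pattern::Char, s::AbstractString)
    n = 0
    i = firstindex(s)
    while (r = findnext_past(pattern, s, i)) !== nothing
        n += 1
        i = r[2]   # resume right after the match
    end
    return n
end
```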