Why `count` is slower than my code?

I was expecting count to be highly optimized, but it seems like it is 10x slower than the following code, why?

using BenchmarkTools
function contar()
    c = 0
    for i in "?#?##?#?#???####??##"
        c += i == '#'
    end
    return c
end;
@btime contar() # 10x faster than `count`
@btime count('#', "?#?##?#?#???####??##")

You can take a look at how count(::Char, ::String) is implemented by doing

@edit count('#', "?#?##?#?#???####??##")

In this case, the method is written much more generically, with a single method for characters, strings and regexes as possible patterns. It also takes care of potentially overlapping byte patterns, which your method doesn’t do - it just assumes that’s not what the user wanted.

I don’t know why it was written that way, but note that the existing count is faster when you also have non-ASCII UTF-8 characters in a long string:

julia> using Random

julia> s = randstring("#?πŸ™‚", 1000000)
"#πŸ™‚##???????πŸ™‚πŸ™‚πŸ™‚πŸ™‚#??#?πŸ™‚?#?#?########πŸ™‚#?πŸ™‚?πŸ™‚#?πŸ™‚???##??πŸ™‚πŸ™‚πŸ™‚??πŸ™‚?πŸ™‚?#?πŸ™‚???#πŸ™‚#πŸ™‚πŸ™‚?πŸ™‚πŸ™‚###πŸ™‚??#πŸ™‚?πŸ™‚?##πŸ™‚πŸ™‚#?πŸ™‚?#πŸ™‚πŸ™‚?πŸ™‚πŸ™‚πŸ™‚#?#πŸ™‚πŸ™‚πŸ™‚#?#??##???#???#πŸ™‚?πŸ™‚πŸ™‚?##πŸ™‚#?#πŸ™‚##πŸ™‚?#?#πŸ™‚?πŸ™‚πŸ™‚#??πŸ™‚##??????##???#πŸ™‚#?#?#πŸ™‚πŸ™‚πŸ™‚##?##?πŸ™‚?#πŸ™‚??πŸ™‚#πŸ™‚??πŸ™‚##πŸ™‚#?πŸ™‚##πŸ™‚#?#??πŸ™‚###πŸ™‚πŸ™‚#πŸ™‚#???πŸ™‚#?πŸ™‚#??πŸ™‚######πŸ™‚?#????##????πŸ™‚?πŸ™‚##??" β‹― 2000632 bytes β‹― "πŸ™‚#πŸ™‚#πŸ™‚?#??πŸ™‚#?#πŸ™‚?#???πŸ™‚πŸ™‚#?#πŸ™‚πŸ™‚?πŸ™‚??#?##πŸ™‚?πŸ™‚πŸ™‚#πŸ™‚πŸ™‚##πŸ™‚????##πŸ™‚#?#?πŸ™‚#?πŸ™‚#?πŸ™‚#πŸ™‚πŸ™‚#?#?#πŸ™‚πŸ™‚?πŸ™‚#πŸ™‚πŸ™‚#??#πŸ™‚???πŸ™‚###?πŸ™‚πŸ™‚#πŸ™‚?πŸ™‚#?πŸ™‚πŸ™‚?πŸ™‚??πŸ™‚πŸ™‚πŸ™‚#####πŸ™‚#???πŸ™‚πŸ™‚?πŸ™‚πŸ™‚#?#?πŸ™‚πŸ™‚πŸ™‚?πŸ™‚?πŸ™‚πŸ™‚#?πŸ™‚?πŸ™‚?πŸ™‚#?###??###?πŸ™‚?πŸ™‚##??#πŸ™‚πŸ™‚πŸ™‚##πŸ™‚#πŸ™‚?##?πŸ™‚πŸ™‚πŸ™‚?πŸ™‚πŸ™‚πŸ™‚?πŸ™‚????πŸ™‚?πŸ™‚πŸ™‚#πŸ™‚#?πŸ™‚##?###πŸ™‚πŸ™‚πŸ™‚##?πŸ™‚##????πŸ™‚####?πŸ™‚?πŸ™‚?#πŸ™‚??πŸ™‚#?#πŸ™‚?#?πŸ™‚πŸ™‚?πŸ™‚###?πŸ™‚??πŸ™‚"

julia> @btime count('#', $s)
  4.593 ms (0 allocations: 0 bytes)
333686

julia> @btime contar2('#', $s)
  5.650 ms (0 allocations: 0 bytes)
333686

Perhaps this method could be specialized just for characters?

2 Likes

Also consider this example, which has a very long run of β€œnot the char I’m looking for” between '#':

julia> s = '#' * '?'^1000 * '#'; length(s)
1002

julia> @btime count('#', $s)
  44.543 ns (0 allocations: 0 bytes)
2

julia> @btime contar2('#', $s)
  1.048 ΞΌs (0 allocations: 0 bytes)
2

So this should at least suggest that there’s a tradeoff happening here. At the very least, for arbitrary strings, count should be a pretty good implementation, with room for optimization depending on the string in question (which is common for these kinds of operations).

3 Likes

I think the short answer is that support for the pattern being a Char was only added recently-ish (v1.7) and hasn’t been optimised.

Here’s the definition of count, which as already noted is a very generic function: julia/regex.jl at 09e7a05e81c92b19bd970a92e881dda759ad5fc4 Β· JuliaLang/julia Β· GitHub

Using @profile, most of the time is spent doing findnext and nextind. These functions are each called once for each match. nextind is slow because it requires decoding the underlying bytes and findnext seems to be optimised for sparse matches, which is why the example above with only a couple of matches was much faster.

If the pattern Char is representable as a single byte and the string is a String then you can super-charge it like so (the second example of the three is equivalent to the code from the OP):

julia> x = randstring("#abcdef", 10_000);

julia> @btime count('#', $x)
  44.500 ΞΌs (0 allocations: 0 bytes)
1416

julia> @btime count(==('#'), $x)
  8.567 ΞΌs (0 allocations: 0 bytes)
1416

julia> @btime count(==(UInt8('#')), codeunits($x))
  865.517 ns (0 allocations: 0 bytes)
1416

julia> y = string("#", randstring("abcdef", 10_000), "#");

julia> @btime count('#', $y)
  2.533 ΞΌs (0 allocations: 0 bytes)
2

julia> @btime count(==('#'), $y)
  8.567 ΞΌs (0 allocations: 0 bytes)
2

julia> @btime count(==(UInt8('#')), codeunits($y))
  861.667 ns (0 allocations: 0 bytes)
2
2 Likes

This kind of phenomenon pops up again and again. If you have specialized needs, you can write your own function that outperforms β€˜built-in’ functions.

I think that’s quite cool, though maybe, in this case, there could be some improvements in count.

1 Like

Seems like it would be worth a two-line PR to add optimized methods:

count(c::AbstractChar, s::AbstractString) = count(==(c), s)
count(c::AbstractChar, s::String) = isascii(c) ? count(==(UInt8(c)), codeunits(s)) : count(==(c), s)

Would be an easy PR if anyone wants to take a crack at it.

5 Likes

the other part that would help is that the findnext path needs some more propegate_inbounds calls.

It’s not quite so simple - the above benchmarks show this is slower when there are few matches.

I think the generic implementation is fine, but could be improved by (a) adding specialised implementations of findnext and (b) replacing findnext with a new function that returns the index of first character after the match, since any implementation must already know this information and it avoids the expensive nextind call.

3 Likes