I was expecting count to be highly optimized, but it seems to be 10x slower than the following code. Why?
using BenchmarkTools
function contar()
c = 0
for i in "?#?##?#?#???####??##"
c += i == '#'
end
return c
end;
@btime contar() # 10x faster than `count`
@btime count('#', "?#?##?#?#???####??##")
You can take a look at how count(::Char, ::String) is implemented by doing
@edit count('#', "?#?##?#?#???####??##")
In this case, the method is written much more generically, with a single method for characters, strings and regexes as possible patterns. It also takes care of potentially overlapping byte patterns, which your method doesn't do - it just assumes that's not what the user wanted.
I don't know why it was written that way, but note that the existing count is faster when you also have non-ASCII UTF-8 characters in a long string:
julia> using Random
julia> s = randstring("#?π", 1000000)
"#π##???????ππππ#??#?π?#?#?########π#?π?π#?π???##??πππ??π?π?#?π???#π#ππ?ππ###π??#π?π?##ππ#?π?#ππ?πππ#?#πππ#?#??##???#???#π?ππ?##π#?#π##π?#?#π?ππ#??π##??????##???#π#?#?#πππ##?##?π?#π??π#π??π##π#?π##π#?#??π###ππ#π#???π#?π#??π######π?#????##????π?π##??" ⋯ 2000632 bytes ⋯ "π#π#π?#??π#?#π?#???ππ#?#ππ?π??#?##π?ππ#ππ##π????##π#?#?π#?π#?π#ππ#?#?#ππ?π#ππ#??#π???π###?ππ#π?π#?ππ?π??πππ#####π#???ππ?ππ#?#?πππ?π?ππ#?π?π?π#?###??###?π?π##??#πππ##π#π?##?πππ?πππ?π????π?ππ#π#?π##?###πππ##?π##????π####?π?π?#π??π#?#π?#?ππ?π###?π??π"
julia> @btime count('#', $s)
4.593 ms (0 allocations: 0 bytes)
333686
julia> @btime contar2('#', $s)
5.650 ms (0 allocations: 0 bytes)
333686
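(`contar2` isn't defined in this post; presumably it is the loop from the OP generalized to take the pattern and string as arguments, something along these lines:)

```julia
# Hypothetical reconstruction of `contar2`: the OP's loop,
# generalized over the pattern character and the string.
function contar2(pattern::Char, s::AbstractString)
    c = 0
    for i in s
        c += i == pattern   # Bool adds as 0 or 1
    end
    return c
end
```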
Perhaps this method could be specialized just for characters?
So this should at least suggest that there's a tradeoff happening here. At the very least, for arbitrary strings, count should be a pretty good implementation, with room for optimization depending on the string in question (which is common for these kinds of operations).
Using @profile, most of the time is spent doing findnext and nextind. These functions are each called once for each match. nextind is slow because it requires decoding the underlying bytes and findnext seems to be optimised for sparse matches, which is why the example above with only a couple of matches was much faster.
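Schematically, the generic implementation loops like this (a simplified sketch of the mechanism, not the actual Base code):

```julia
# Simplified sketch: `findnext` locates each match, and `nextind`
# (which must decode the underlying UTF-8 bytes) steps past it.
function count_sketch(pattern::Char, s::AbstractString)
    n = 0
    i = firstindex(s)
    while true
        r = findnext(==(pattern), s, i)   # scan forward for the next match
        r === nothing && break
        n += 1
        i = nextind(s, r)                 # advance past the match
    end
    return n
end
```

With dense matches, both calls fire once per match, which is where the time goes.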
If the pattern Char is representable as a single byte and the string is a String, then you can super-charge it like so (the second example of the three is equivalent to the code from the OP):
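The original snippets aren't reproduced here, but the byte-level idea is: an ASCII Char occupies exactly one code unit in a String, so you can compare raw bytes and skip UTF-8 decoding entirely (a sketch under that assumption, not the exact code from the post):

```julia
# Byte-level count: valid only when `c` encodes to a single UTF-8
# byte (i.e. ASCII). Comparing code units avoids per-character
# decoding and vectorizes well.
function count_bytes(c::Char, s::String)
    isascii(c) || throw(ArgumentError("pattern must be a single-byte (ASCII) character"))
    b = UInt8(c)
    n = 0
    for u in codeunits(s)   # iterate raw bytes of the String
        n += u == b
    end
    return n
end
```

This is safe because in a valid UTF-8 String no continuation byte can equal an ASCII byte, so counting matching code units counts matching characters.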
This kind of phenomenon pops up again and again. If you have specialized needs, you can write your own function that outperforms "built-in" functions.
I think that's quite cool, though maybe, in this case, there could be some improvements in count.
It's not quite so simple - the above benchmarks show this is slower when there are few matches.
I think the generic implementation is fine, but could be improved by (a) adding specialised implementations of findnext and (b) replacing findnext with a new function that returns the index of first character after the match, since any implementation must already know this information and it avoids the expensive nextind call.
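As a sketch of idea (b), such a function might return both the match index and the index just past the match, so the caller never calls nextind separately (a hypothetical API, not in Base; this sketch still pays for nextind internally, but a real implementation would know the index for free from the decode it already did):

```julia
# Hypothetical helper: like findnext, but also returns the index of
# the first character *after* the match.
function findnext_past(pattern::Char, s::AbstractString, i::Int)
    j = findnext(==(pattern), s, i)
    j === nothing && return nothing
    return (j, nextind(s, j))   # a real version would avoid this extra decode
end

function count_past(pattern::Char, s::AbstractString)
    n = 0
    i = firstindex(s)
    while (r = findnext_past(pattern, s, i)) !== nothing
        n += 1
        i = r[2]   # resume right after the match
    end
    return n
end
```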