Hi
I need to look for a set of sub-strings in a long text, and there are two obvious options at hand: regex and `findfirst`.
`findfirst` is nice and does exactly what I want, but it is a bit slow (10 times slower than using a regex on Linux, and only a little slower on Windows 10; see a previous post, "Regex performance on Windows 10", for more details).
I’ve tried to “increase the speed” of `findfirst` by developing my own version, with little success, and I have no hope of getting performance comparable to regex because it makes use of `memchr`, which on Linux is pretty well optimized (see "Can this loop be optimized further?").
The problem with the regex is that the sub-string could be arbitrary text and contain metacharacters, so I need to escape it. MWE:
r = Regex("he.lo")
match(r, "hello world")
# RegexMatch("hello"): bad, I don't want to match anything there
r = Regex(replace("he.lo", "." => "\\."))
match(r, "hello world")
# nothing: that is what I want
But this workaround is not general enough: what if the string contains other metacharacters (`[`, `?`, `$`, `^`, and so on)?
Is there a robust way to escape the Regex string?
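One candidate I can sketch, under assumptions: Julia's `Regex` is backed by PCRE, whose syntax treats everything between `\Q` and `\E` as literal text. The helper below (`literal_regex` is a hypothetical name, not a Base function) wraps the sub-string that way and splits the quote around any `\E` already present in the input. I am not sure it covers every corner case, so treat it as a starting point rather than a definitive answer.

```julia
# Hypothetical helper (not part of Base): build a Regex that matches `s` literally.
# Assumption: the PCRE engine behind Regex honours \Q...\E literal quoting.
function literal_regex(s::AbstractString)
    # A "\E" inside the input would end the quoted span early, so close the
    # quote, emit an escaped backslash plus a plain E, then reopen the quote.
    quoted = replace(s, "\\E" => "\\E\\\\E\\Q")
    return Regex("\\Q" * quoted * "\\E")
end

r = literal_regex("he.lo")
match(r, "hello world")    # nothing: the dot is no longer a wildcard
match(r, "say he.lo now")  # matches the literal text "he.lo"
```

The appeal of this approach is that it does not require keeping a list of metacharacters up to date; the only sequence that needs special handling is `\E` itself.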
Thanks in advance!