How to create a regex from a string with metacharacters?

Hi

I need to search a long text for a set of substrings, and there are two obvious options at hand: regex and findfirst.

findfirst is nice and does exactly what I want, but it is a bit slow: about 10 times slower than using a regex on Linux, and only slightly slower on Windows 10 (see a previous post Regex performance on Windows 10 for more details).
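For a pattern with no metacharacters the two approaches are interchangeable: `findfirst` with a `String` needle and `findfirst` with a `Regex` both return the index range of the first occurrence. A minimal sketch (the sample text is made up for illustration, and no timings are implied):

```julia
text = "hello world"

# Plain substring search: returns the index range of the first match, or `nothing`
r1 = findfirst("world", text)

# Regex search with a pattern that has no metacharacters: same range
r2 = findfirst(r"world", text)

# `match` returns a RegexMatch (or `nothing`); `offset` is the start index
m = match(r"world", text)

println(r1, " ", r2, " ", m.offset)  # prints "7:11 7:11 7"
```

The swap is only safe while the needle contains no metacharacters, which is exactly the problem described next.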

I’ve tried to “increase the speed” of findfirst by writing my own version, with little success. I have no hope of reaching performance comparable to regex, because it makes use of memchr, which is pretty well optimized on Linux (see Can this loop be optimized further?).

The problem with the regex is that the substring can be arbitrary text and may contain metacharacters, so I need to escape it. MWE:

r = Regex("he.lo")
match(r, "hello world")
# RegexMatch("hello"): bad, "." matched the "l", but I don't want any match here

r = Regex(replace("he.lo", "." => "\\."))
match(r, "hello world")
# nothing: that is what I want

But this workaround is not general enough: what if the string contains other metacharacters ([, ?, $, ^ and so on)?
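To make the problem concrete, here is a small sketch where the `.`-only workaround breaks. `naive` is a hypothetical name for the `replace`-based approach in the MWE; after the replacement, the `+` in `"a+b"` is still a quantifier:

```julia
# The single-character workaround: only "." gets escaped
naive(s::AbstractString) = Regex(replace(s, "." => "\\."))

# The intent is to match only the literal text "a+b" ...
r = naive("a+b")

# ... but "+" still means "one or more", so the pattern is really a, a*, b
println(occursin(r, "aab"))   # true: unwanted match, "aab" contains no "+"
println(occursin(r, "a+b"))   # false: the literal text is NOT found
```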

Is there a robust way to escape the Regex string?
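One robust option, assuming the pattern is compiled by PCRE (which is what Julia's `Regex` uses), is PCRE's `\Q...\E` quoting: everything between `\Q` and `\E` is matched literally, so only an embedded `\E` needs special handling. `quote_regex` is a hypothetical helper name, and this is a sketch rather than a library function:

```julia
# Hypothetical helper: wrap `s` in PCRE's \Q...\E so every character is literal.
# A literal "\E" inside `s` would end the quoted run early, so it is split out:
# close the quote, emit an escaped backslash followed by "E", then reopen the quote.
function quote_regex(s::AbstractString)
    body = replace(s, "\\E" => "\\E\\\\E\\Q")
    return Regex("\\Q" * body * "\\E")
end

println(match(quote_regex("he.lo"), "hello world") === nothing)  # true: "." is literal
println(occursin(quote_regex("a+b"), "x a+b y"))                # true
println(occursin(quote_regex("a+b"), "aab"))                    # false
println(occursin(quote_regex("C:\\Evil"), "C:\\Evil"))          # true: embedded \E survives
```

If I read the Julia docs correctly, Julia 1.3 and later also define `*` on `Regex` values, with string operands quoted via `\Q`/`\E`, so `r"" * "he.lo"` should give an equivalent regex without a custom helper.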

Thanks in advance!

How about a regex? :slight_smile:

julia> Regex(replace("h[el.o", r"([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])" => s"\\\1"))
r"h\[el\.o"
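Note that `{` and `}` need escaping too, since PCRE reads `a{2}` as a quantifier (matching `aa`). A sketch that includes both braces in the character class, wraps the substitution in a helper (`META` and `escape_meta` are hypothetical names), and applies it to the original use case of searching for several substrings at once through one alternation:

```julia
# Meta-regex: capture any single PCRE metacharacter, including "{" and "}"
const META = r"([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"

# Prefix every metacharacter with a backslash; s"\\\1" is "backslash + group 1"
escape_meta(s::AbstractString) = replace(s, META => s"\\\1")

# Search for any of several substrings with one regex: escape each, join with "|"
needles = ["he.lo", "a{2}", "(x)"]
r = Regex(join(escape_meta.(needles), "|"))

println(escape_meta("h[el.o"))           # h\[el\.o
println(findfirst(r, "aa (x) a{2}"))     # 4:6, the literal "(x)"
```

With the braces left out of the class, `a{2}` would stay a quantifier and the same search would return `1:2` (the `aa` at the start) instead of `4:6`.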

This is… a meta-regex :wink: it blows my mind :exploding_head:
Thank you @pfitzseb!
