How to create a regex from a string with metacharacters?

Dictino · January 28, 2020, 6:10pm

Hi

I need to look for a set of sub-strings in a long text and there are two obvious options at hand: regex and findfirst.

findfirst is nice and does exactly what I want, but it is a bit slow (10 times slower than using a regex on Linux, and only slightly slower on Windows 10; see a previous post, Regex performance on Windows 10, for more details).

I’ve tried to “increase the speed” of findfirst by developing my own version, with little success, and I have no hope of getting performance comparable to regex because it makes use of memchr, which is pretty well optimized on Linux (see Can this loop be optimized further?).

The problem with the regex is that the sub-string could be arbitrary text and contain metacharacters, so I need to escape it. MWE:

r=Regex("he.lo")
match(r,"hello world")
#RegexMatch("hello") bad, I don't want to match anything there

r=Regex(replace("he.lo", "."=>"\\."))
match(r,"hello world")
#nothing, that is what I want

But this workaround is not general enough… what if the string contains other metacharacters ([, ?, $, ^… and so on)?

Is there a robust way to escape the Regex string?

Thanks in advance!

pfitzseb · January 28, 2020, 6:59pm

How about a regex?

julia> Regex(replace("h[el.o", r"([\\\^\$\.\|\?\*\+\(\)\[\]])" => s"\\\1"))
r"h\[el\.o"
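
The one-liner above can be wrapped in a small reusable function. The following is only a sketch: `escape_regex`, `literal_regex`, and `any_of_regex` are hypothetical names (not part of Base), and the character class is an assumption that extends the one above with `{` and `}`. The last helper joins several escaped sub-strings with `|`, which fits the original goal of searching for a set of sub-strings.

```julia
# Hypothetical helper (not in Base): put a backslash in front of every
# regex metacharacter so the string is matched literally.
# Assumption: the class below extends the one-liner above with `{` and `}`.
escape_regex(s::AbstractString) =
    replace(s, r"([\\^$.|?*+()\[\]{}])" => s"\\\1")

# Build a Regex that matches `s` literally.
literal_regex(s::AbstractString) = Regex(escape_regex(s))

# Build a single Regex that matches any of the given sub-strings literally.
any_of_regex(subs) = Regex(join(map(escape_regex, subs), "|"))

r = literal_regex("he.lo")
match(r, "hello world")          # no match: the dot is now literal
match(r, "say he.lo")            # matches the literal text he.lo

rs = any_of_regex(["a.b", "c+d"])
match(rs, "xx c+d yy")           # matches the literal text c+d
```

PCRE also supports quoting a span with `\Q…\E`, so `Regex("\\Q" * s * "\\E")` may be a simpler alternative, provided the string itself never contains `\E`.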

Dictino · January 28, 2020, 7:22pm

This is… a meta-regex. It blows my mind!
Thank you @pfitzseb!
