Match a string literal via regex

In Python, '"([^\\"]+|\\.)*?"', or in C#/F#, "\G\"([^\\\"]+|\\\.)*?\"", is a regular expression that matches a whole string literal such as

"xxxxxxxxxx\"xxxxxxxxxxxxxx\""

However, this does not seem to work in Julia.

regex_s = raw"\"([^\\\"]+|\\\.)*?\""
regex = Regex(raw"\G" * regex_s)
str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"")
match(regex, str)
# RegexMatch("\"xxxxxxxxxx\\\"", 1="xxxxxxxxxx\\")

Any workaround?

Not sure if this is what you want?

julia> str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"")
"\"xxxxxxxxxx\\\"xxxxxxxxxxxxxx\\\"\""

julia> collect(m.match for m in eachmatch(r"[^\"]+", str))
2-element Array{SubString{String},1}:
 "xxxxxxxxxx\\"    
 "xxxxxxxxxxxxxx\\"

I want to match a String/SubString to check whether it starts with a string literal ("...\"..."), and if so, extract it from the head of the String/SubString (which gives me a new SubString).

m = match(str_regex, str)
if m === nothing
   @fail # fail and jump out of here
else
   token = m.match
   push!(tokens, token)
   str = SubString(str, ncodeunits(token) + 1)  # continue just past the token (byte index)
end

Also, a concrete use case is a parser generator with automatic lexers.
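To make the "take a token off the head" step concrete, here is a minimal sketch. The helper name `take_prefix` is made up for illustration, and it assumes the regex passed in is anchored with `^`, so any match starts at index 1. It uses a trivial token pattern rather than the string-literal one.

```julia
# Hypothetical helper (not part of any library): try to strip one token
# matching `rx` from the head of `s`. Assumes `rx` is anchored with `^`.
function take_prefix(rx::Regex, s::Union{String,SubString{String}})
    m = match(rx, s)
    m === nothing && return nothing              # no token at the head
    token = m.match
    # SubString takes a byte index, so use ncodeunits rather than length;
    # the two differ as soon as the token contains non-ASCII characters.
    rest = SubString(s, ncodeunits(token) + 1)
    return token, rest
end

token, rest = take_prefix(r"^[a-z]+", "abc def")
println(repr(token), " | ", repr(rest))
```

Returning `nothing` on failure lets the caller decide how to bail out, and `rest` can be fed straight back into `take_prefix` for the next token.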

I don’t really understand your question in the OP (I don’t know Python/C# regexes well, and you have some typos), but if I understand the title correctly, you can quote a string by enclosing it with \Q and \E, e.g. Regex(raw"\Q" * str * raw"\E").

Sorry, I’m not trying to match a specific string.
When writing a string literal in our code, we first write a ", then a sequence of characters, and finally another " to end it.

str = "<a sequence>"

Furthermore, when we want to represent a string that contains "s, we have to escape them like this: "xx\"xx".

Note that our source code is nothing special; it is still plain text.
So, how do programming language compilers parse string literals?
One way is to use a regular expression, which is a mature tool and capable of expressing and matching escapes and quotation marks.
My problem is that this did not work in Julia.

Are you looking for this?

julia> str_regex = r"(...\"...)(.*)"
r"(...\"...)(.*)"

julia> test = "...\"... abc"
"...\"... abc"

julia> m = match(str_regex, test)
RegexMatch("...\"... abc", 1="...\"...", 2=" abc")

julia> m.captures[2]
" abc"

Thank you, but that’s not it.
Given a text file whose content looks like

"this is a str\"ing"

After reading it into Julia, how can we match it?

If I understand you correctly, you want to match a “string literal”, not match a “literal string”. (You could edit your post’s title.)

Languages are subtly different on what escaping rules apply; see e.g. https://github.com/JuliaLang/julia/issues/22926 for a nice discussion about a (fixed) corner case in Julia. Do you have any specific language you want to emulate?

This one might be helpful to what you’re trying to achieve: String literals and regular expressions. After all, your question is about regular expressions much more than it is about Julia.

your question is about regular expressions much more than it is about Julia.

I somewhat disagree. I know how to write this regex, but I don’t know why it fails in Julia.

You are right. I think what’s tripping you up is most likely what I linked above: the behaviour of quotes that follow backslashes inside raw strings. Specifically, you have \\\" inside your character class, and Julia interprets that differently than other languages do: in a raw string it collapses to \", so the class excludes only the quote and still accepts a backslash.
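A quick way to see this is to print what the raw string actually contains. This is only a sketch: `broken` is the pattern from the original post, while `fixed` is my own one-line variant that puts the quote first in the class so the backslashes are no longer directly in front of it.

```julia
# In a raw string, backslashes are only special directly before a quote:
# 2n+1 backslashes followed by " give n backslashes and a literal ".
broken = raw"\"([^\\\"]+|\\\.)*?\""   # pattern from the original post
fixed  = raw"\"([^\"\\]+|\\.)*?\""    # assumed variant: quote first, then \\

println(broken)  # "([^\"]+|\\\.)*?"   the class excludes only the quote
println(fixed)   # "([^"\\]+|\\.)*?"   the class excludes quote and backslash

str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"")
println(match(Regex(raw"\G" * broken), str).match)  # stops at the first \"
println(match(Regex(raw"\G" * fixed), str).match)   # the whole literal
```

With `broken`, the class happily consumes the backslash, so the next quote is taken as the closing one.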

Here’s the easiest way I found to get it to work. I tried to make it extra readable by using the x modifier, which allows whitespace and comments as follows:

julia> str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"");

julia> regex = r"""
       \G     # match start
       \"     # opening quote
       (?:    # don't capture (better performance)
           [^\"\\]+  # not a quote or a backslash
           |         # or
           \\.       # an escaped character
       )*?    # ungreedy multiples of the above
       \"     # closing quote
       """x;

julia> match(regex, str)
RegexMatch("\"xxxxxxxxxx\\\"xxxxxxxxxxxxxx\\\"\"")
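For the lexer use case, the same pattern can be combined with the start-offset argument of `match`, since `\G` only matches at that offset. The following is a sketch under that assumption; the helper name `string_literal_at` and the sample input are made up for illustration.

```julia
# Same pattern as above, written on one line.
string_literal = r"\G\"(?:[^\"\\]+|\\.)*?\""

# Hypothetical helper: return the literal that starts at byte index `pos`
# together with the index just past it, or `nothing` if none starts there.
function string_literal_at(src::String, pos::Int)
    m = match(string_literal, src, pos)   # \G pins the match to `pos`
    m === nothing && return nothing
    return m.match, pos + ncodeunits(m.match)
end

src = "say \"hi \\\"there\\\"\" now"      # text: say "hi \"there\"" now
println(string_literal_at(src, 1))        # no literal at the very start
lit, next = string_literal_at(src, 5)
println(lit)
println(repr(src[next:end]))
```

Compared with slicing off a new SubString after every token, this keeps a single index into the original text, which may be convenient for reporting positions in error messages.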

Hope that helps!

Awesome! Also thanks for solving this in such an elegant way!