With '"([^\\"]+|\\.)*?"'
in Python or "\G\"([^\\\"]+|\\\.)*?\""
in C#/F#, one can build a regular expression that matches a whole string literal such as
"xxxxxxxxxx\"xxxxxxxxxxxxxx\""
However, this does not seem to work in Julia.
regex_s = raw"\"([^\\\"]+|\\\.)*?\""
regex = Regex(raw"\G" * regex_s)
str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"")
match(regex, str)
# RegexMatch("\"xxxxxxxxxx\\\"", 1="xxxxxxxxxx\\")
Any workaround?
Not sure if this is what you want?
julia> str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"")
"\"xxxxxxxxxx\\\"xxxxxxxxxxxxxx\\\"\""
julia> collect(m.match for m in eachmatch(r"[^\"]+", str))
2-element Array{SubString{String},1}:
"xxxxxxxxxx\\"
"xxxxxxxxxxxxxx\\"
I want to check whether a String/SubString starts with a string literal ("...\"..."
), and if so, extract it from the head of the String/SubString (which then gives me a new SubString).
m = match(str_regex, str)
if m === nothing
    @fail # fail and jump out of here
else
    token = m.match
    push!(tokens, token)
    # continue with everything after the token (indices are in code units)
    str = SubString(str, ncodeunits(token) + 1)
end
I don’t really understand the question in your original post (I don’t know Python/C# regexes well, and it contains some typos), but if I understand the title correctly, you can quote a string by enclosing it in \Q
and \E
, e.g. Regex(raw"\Q" * str * raw"\E")
.
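As a small illustration of the \Q...\E quoting, here is a sketch only: the sample strings are made up, and it assumes the quoted text does not itself contain \E.

```julia
needle = "a.b(c)"                            # contains regex metacharacters
quoted = Regex(raw"\Q" * needle * raw"\E")   # everything between \Q and \E is literal

occursin(quoted, "xx a.b(c) yy")   # true: the text is matched literally
occursin(quoted, "xx aXb(c) yy")   # false: the dot is not a wildcard here
```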
Sorry, I’m not trying to match a specific string.
When writing a string literal in our code, we first write a "
, then a sequence of characters, and finally another "
to end it.
str = "<a sequence>"
Furthermore, when we want to represent a string that contains "
characters, we have to escape them like this: "xx\"xx"
.
Note that source code is not special; it is still plain text.
So, how do programming language compilers parse string literals?
One way is to use a regular expression, which is a mature tool and is capable of matching escapes and quotation marks.
My problem is that this approach didn’t work in Julia.
Are you looking for this?
julia> str_regex = r"(...\"...)(.*)"
r"(...\"...)(.*)"
julia> test = "...\"... abc"
"...\"... abc"
julia> m = match(str_regex, test)
RegexMatch("...\"... abc", 1="...\"...", 2=" abc")
julia> m.captures[2]
" abc"
Thank you, but that’s not it.
Given a text file whose content looks like
"this is a str\"ing"
After reading it into Julia, how can we match it?
If I understand you correctly, you want to match a “string literal”, not match a “literal string”. (You could edit your post’s title.)
Languages differ subtly in which escaping rules apply; see e.g. https://github.com/JuliaLang/julia/issues/22926 for a nice discussion about a (fixed) corner case in Julia. Do you have any specific language you want to emulate?
This one might be helpful for what you’re trying to achieve: String literals and regular expressions. After all, your question is about regular expressions much more than it is about Julia.
your question is about regular expressions much more than it is about Julia.
I somewhat disagree. I know how to write this regex, but I don’t know why it fails in Julia.
You are right. I think what’s tripping you up is most likely what I linked above: the behaviour of quotes following backslashes inside raw strings. Specifically, you have \\"
inside your character class, and that’s interpreted differently in Julia than elsewhere.
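To make the raw-string rule concrete, here is a small sketch (the variable names `a` and `b` are just for illustration; `regex_s` is the pattern string from the original post):

```julia
# In raw"..." (and r"...") literals, backslashes are kept as written,
# except when they come directly before a double quote.
a = raw"\\."      # no quote follows: both backslashes are kept
b = raw"\\\""     # three backslashes + quote: one backslash and a quote

println(a)        # prints: \\.
println(b)        # prints: \"

# The pattern string from the original post, as Julia actually sees it:
regex_s = raw"\"([^\\\"]+|\\\.)*?\""
println(regex_s)  # prints: "([^\"]+|\\\.)*?"
```

In the resulting pattern, the character class [^\"] excludes only the quote, so a backslash is consumed by it and the quote that follows ends the match early.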
Here’s the easiest way I found to get it to work. I tried to make it extra readable by using the x
modifier, which allows whitespace and comments as follows:
julia> str = repr("xxxxxxxxxx\"xxxxxxxxxxxxxx\"");
julia> regex = r"""
       \G             # match start
       \"             # opening quote
       (?:            # don't capture (better performance)
           [^\"\\]+   # not a quote or a backslash
         |            # or
           \\.        # an escaped character
       )*?            # ungreedy multiples of the above
       \"             # closing quote
       """x;
julia> match(regex, str)
RegexMatch("\"xxxxxxxxxx\\\"xxxxxxxxxxxxxx\\\"\"")
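For the use case described earlier in the thread (check whether the input starts with a string literal and, if so, split it off), a minimal sketch could look like this. The helper name `take_string_literal` and the sample input are hypothetical; the pattern is the same one as above, written on a single line.

```julia
# Same pattern as above, without the x modifier.
const STRING_LITERAL = r"\G\"(?:[^\"\\]+|\\.)*?\""

# Hypothetical helper: returns (token, rest) if `s` starts with a
# string literal, or `nothing` otherwise.
function take_string_literal(s::AbstractString)
    m = match(STRING_LITERAL, s)
    m === nothing && return nothing
    token = m.match
    # Index by code units so that non-ASCII text is handled correctly.
    rest = SubString(s, ncodeunits(token) + 1)
    return token, rest
end

result = take_string_literal("\"this is a str\\\"ing\" + x")
# result[1] is the literal including its quotes, result[2] is " + x"
```

Because of the \G anchor, an input that does not begin with a quote yields `nothing` rather than a match further along in the text.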
Hope that helps!
Awesome! Also thanks for solving this in such an elegant way!