There is an opportunity for the Julia community to make better use of literal string notations, such as regular expressions, to improve developer experience and code maintainability. Other kinds of literal strings include well-known string template formats, such as HAML. In specific fields of science, other notations may also be quite helpful, so rather than bugging the Julia community for a new syntax extension… they can be implemented and launched straight away. For example, people often wish for their own matrix format; this could be done as a literal string notation.
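As a rough sketch of what such a notation looks like today, here is a toy matrix literal built with a string macro. The `mat` name and its syntax (rows separated by `;`, entries by spaces) are made up for illustration; they are not an existing package or a proposed format.

```julia
# Hypothetical `mat"..."` notation: rows separated by ';', entries by whitespace.
macro mat_str(s)
    # Parse each row of the literal into a vector of Float64 values.
    rows = [parse.(Float64, split(row)) for row in split(s, ';')]
    # Each parsed row becomes a column under hcat, so transpose the result.
    # The matrix is built once, at macro-expansion time, and spliced in as a value.
    return permutedims(reduce(hcat, rows))
end

m = mat"1 2; 3 4"   # a 2×2 Matrix{Float64}
```

Because the macro receives the literal as a plain string, the notation is free to define whatever micro-language it likes inside the quotes.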
A requirement of these custom string literals is that they should not undergo standard processing, escaping, or interpolation. To support these micro-languages, Julia’s founders created the notion of a “raw string”, which string macros, such as regular expressions, inherit. Most of the time, the raw string processing rules do appear to meet their advertised mission and pass a string literal along to the notation without escaping. This works so well, in fact, that users get comfortable, until they trigger the one escaping rule that remains: a double quote must still be escaped with a backslash, and so must any backslashes that come directly before a quote. This rule is certainly not obvious to a newcomer, and it may even escape seasoned Julia developers. The commonly suggested work-around is to use the triple double-quoted form, but this is extremely heavyweight. In short, the existing mechanism is a burden to those who wish to build and effectively use string literal notations.
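To make the surprise concrete, here is a small sketch of how the current raw string rule behaves in a few cases. The variable names are only for illustration.

```julia
# Backslashes in the middle of a raw string pass through untouched...
s1 = raw"a\\b"      # 4 characters: a, \, \, b

# ...but the same two backslashes right before the closing quote collapse to one.
s2 = raw"a\\"       # 2 characters: a, \

# A double quote still needs a backslash, which is dropped from the result.
s3 = raw"\""        # 1 character: "

# Regex literals inherit the rule, so a pattern for a quoted word needs escapes.
quoted = r"\"[^\"]*\""
m = match(quoted, "say \"hi\" now")

# The triple double-quoted form avoids escaping quotes in the middle of the text.
s4 = raw"""a "b" c"""
```

The same pair of backslashes means two different things depending on whether a quote follows, which is exactly the kind of rule that is easy to forget.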
What would be an alternative? A paired Unicode character combination that is not currently used seems like it could be a good way to create a “notation” syntax for Julia. There are several advantages.
- The parsing algorithm is simple to understand. The Julia parser would know the notation has ended when the number of closing tokens matches the number of opening tokens. This is a learnable rule.
- By using a Unicode character pair, we are able to directly represent existing ASCII text protocols (single quote, double quote, backslash, forward slash, percent sign, etc.) without the need for escaping.
- Paired notations can nest if the notation wishes. This could be advantageous if one syntax format uses another, or for a recursive use of the same notation. Right now, to have recursive use of notations, you must drop the notion of a string literal and use a macro, which (like the triple double-quoted form) increases verbosity.
- Paired notation could span multiple lines without needing a “tripled” version of the indicators. This would help it be quite succinct.
- The notation, by itself without any character in front of it, could be used for “raw strings”. This would complement double-quoted literals.
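The counting rule from the first point is small enough to sketch. The following only illustrates the idea in ordinary Julia; it is not how the Julia parser is or would be implemented. `scan_paired` and its keyword names are hypothetical, and ⟨ ⟩ are used as stand-in delimiters.

```julia
# Hypothetical helper: given source text and the index of an opening token,
# return the raw body of the literal and the index of its matching closing token.
function scan_paired(src::AbstractString, start::Integer=firstindex(src);
                     opener::Char='⟨', closer::Char='⟩')
    src[start] == opener || throw(ArgumentError("expected $opener at index $start"))
    depth = 0
    i = start
    while i <= lastindex(src)
        c = src[i]
        if c == opener
            depth += 1                  # one more closing token is now owed
        elseif c == closer
            depth -= 1
            if depth == 0               # closers now match openers: literal ends
                return src[nextind(src, start):prevind(src, i)], i
            end
        end
        i = nextind(src, i)             # step by character, not by byte
    end
    throw(ArgumentError("unterminated literal: $depth unclosed $opener"))
end

# Quotes and backslashes need no escaping, and a nested pair stays in the body.
body, stop = scan_paired("⟨a \"b\" \\ ⟨c⟩ d⟩ rest")
```

Nothing inside the body is interpreted, so the only thing a reader has to track is whether the delimiters balance.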
There are some design considerations.
- Should it have escaping? Nope. The notation system itself could define its own escaping if it wishes to. For literals without a prefix, there are always double-quoted strings, which can handle any code point with backslash escapes.
- The tokens should be visually distinct from single and double quotes, left and right brackets, and other operators to avoid visual confusion.
- The token characters picked should be single-width so that they do not cause funky indentation in a monospaced font such as Courier.
- Ideally, they are marked as punctuation in Unicode, yet do not conflict with usage patterns in common languages.
- What’s not a design consideration is how to enter it; that’s a user interface issue. What’s important is that it is visually attractive.
So, what is a straw-man proposal? I’m not sure. Here’s one that has not been mentioned yet. However, I think finding a paired character combination can happen after we have general consensus on the idea.
'「': Unicode U+FF62 (category Ps: Punctuation, open)
'」': Unicode U+FF63 (category Pe: Punctuation, close)
Well, I don’t like it that much. However, it is visually distinct. It is the halfwidth cousin of the Japanese quotation marks (「…」, U+300C and U+300D), which are not single width. Another option:
'⟨': Unicode U+27E8 (category Ps: Punctuation, open)
'⟩': Unicode U+27E9 (category Pe: Punctuation, close)
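The punctuation and single-width criteria can be checked for any candidate pair with functions Julia already exports. This is just a quick sanity check, with the code points written as escapes to avoid look-alike characters.

```julia
# The four candidate tokens: U+FF62, U+FF63, U+27E8, U+27E9.
candidates = ['\uff62', '\uff63', '\u27e8', '\u27e9']

for c in candidates
    # ispunct tests for the Unicode punctuation categories;
    # textwidth reports how many terminal columns the character occupies.
    println(repr(c), "  punctuation: ", ispunct(c), "  width: ", textwidth(c))
end

# For comparison, the fullwidth Japanese corner bracket U+300C.
wide = textwidth('\u300c')
```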
So… r⟨"[^"]*"![[⟩?
Anyway. It seems kinda pointless to go through characters unless there’s at least some interest in the path.