## Executive Summary
This is a request to add a string literal syntax using paired Unicode delimiters, perhaps ⟪ and ⟫, for use in non-standard string literal macros. It is proposed as a *complement* to, not a replacement for, single or triple double-quoted raw strings.
## Description of Requested Syntax
- Paired delimiters `⟪` (U+27EA, Ps: Punctuation, open) and `⟫` (U+27EB, Pe: Punctuation, close) are employed.
- Following a string macro name, such as `htl` for `@htl_str`, the open delimiter, `⟪`, begins a string using this syntax.
- The parser knows the extent of the string when the *corresponding* closing delimiter, `⟫`, is encountered.
- Nested pairs of these delimiters are treated as content. This could be done by tracking depth: the open delimiter increases the depth and the close delimiter decreases it. When the depth reaches zero, scanning is done.
- The entire extent of the scanned buffer, less the very first opening and the very last closing delimiters, becomes the string value that is passed along to the string macro.
- Julia does no further scanning or processing of the string. In particular, from Julia's perspective, there is no mechanism to escape content, interpolate content, or enter arbitrary Unicode code points.
- The interactive Julia environment could add `\<<` and `\>>` as a way to enter these paired delimiters.
Critically, this non-standard string literal syntax provides no mechanism to escape either of the delimiters, except that nested pairings are permitted within content. In particular, unbalanced use of the given delimiters is simply not valid syntax. Julia provides no mechanism to enter unbalanced delimiters within this syntax.
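
The scanning rule described above can be sketched in ordinary Julia. This is only an illustration of the depth-counting idea, not a patch to Julia's parser; the names `scan_notation`, `OPEN_DELIM`, and `CLOSE_DELIM` are hypothetical and chosen for this example.

```julia
# Hypothetical names; the delimiters are the pair suggested in this proposal.
const OPEN_DELIM  = '⟪'   # U+27EA
const CLOSE_DELIM = '⟫'   # U+27EB

# Scan `src` starting at the index of an opening delimiter.
# Returns the raw content between the outermost pair and the index
# just past the matching closing delimiter.
function scan_notation(src::AbstractString, start::Integer = firstindex(src))
    src[start] == OPEN_DELIM ||
        throw(ArgumentError("expected an opening delimiter at index $start"))
    depth = 0
    i = start
    while i <= lastindex(src)
        c = src[i]
        if c == OPEN_DELIM
            depth += 1
        elseif c == CLOSE_DELIM
            depth -= 1
            if depth == 0
                # Strip only the very first and very last delimiters.
                content = src[nextind(src, start):prevind(src, i)]
                return content, nextind(src, i)
            end
        end
        i = nextind(src, i)   # step by character, not by byte
    end
    # Depth never returned to zero: unbalanced input is rejected.
    throw(ArgumentError("unbalanced delimiters: missing closing delimiter"))
end

content, next = scan_notation("⟪<b>⟪nested⟫</b>⟫ tail")
println(content)   # <b>⟪nested⟫</b>
```

Note that the nested pair is returned untouched as part of the content, and that no character inside the span is interpreted as an escape.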
## Motivation
Let's define the term _notation_ to mean what is currently in the documentation as "non-standard string literal". The word notation is used by SGML and other standards for this concept.
For those doing data munging to interoperate with other systems, there is an opportunity for the Julia language to better utilize notations, enhancing developer experience and improving code readability. While developing HypertextLiteral (providing Julia-style string interpolation to HTML construction), I ran into three challenges with existing "non-standard string literals" (notations).
1) They are not succinct. Since a great many subordinate syntaxes include the double quote character, use of the triple double-quoted form is the norm. The double quote character is already loud, tripling it on both ends... becomes a distraction. Note that this deficiency applies also to the use of `@macros()`.
2) They can be surprising. For cases where someone tries to use the single double-quoted form, novice users can be caught off guard by the `@raw_str` escaping semantics and how they interact with the backslash. As noted on the Discourse forums, this escaping mechanism is not a "homomorphism over string concatenation", e.g. `raw(a) * raw(b) != raw(a*b)`.
3) They can't be used recursively. If one would like to embed one notation inside another, a round of character escaping is required. This is unlike, for example, `@macros()` which nest perfectly well.
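
The surprise in point 2 can be reproduced with today's raw strings. The following is a small illustration, relying on the documented rule that a run of backslashes is only treated specially when it directly precedes a quote character:

```julia
# Two backslashes directly before the closing quote encode ONE backslash...
a = raw"\\"
# ...but the same two backslashes followed by `n` are both kept literally.
b = raw"\\n"

@assert length(a) == 1
@assert length(b) == 3

# Concatenating the pieces does not give the same string as the whole.
@assert a * raw"n" != b
```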
A promising option emerged in the [discussion forums](https://discourse.julialang.org/t/addressing-raw-string-syntax-and-semantics-for-julia-2-0/51343): the use of paired Unicode delimiters together with a matching parsing algorithm in place of traditional character escaping. You could think of this approach as bringing to string construction what we already know about function calling and data structures -- that they are seldom flat structures.
Specifically, we could employ `⟪` (U+27EA, Ps: Punctuation, open) and `⟫` (U+27EB, Pe: Punctuation, close) as paired delimiters. This particular glyph combines a doubling (reminiscent of double quotes) with the shape of parentheses (implying nestability). It's not perfect, but it is visually distinct in most fonts and in mono-space fonts appears to take the space of one regular character.
When Julia encounters a name token, say `htl`, followed by `⟪`, it would enter a "notation" parsing state. Here it would keep track of the nesting depth, increasing depth when additional `⟪` are encountered, and decreasing depth when `⟫` is encountered. When the depth reaches zero, the entire span of the string (less the outermost delimiters) is sent unprocessed to `@htl_str`, and Julia parsing resumes. The REPL could add `\<<` and `\>>` shorthand to permit these two characters to be easily entered.
This addresses the three deficiencies noted above. This paired delimiter is much more succinct and visually attractive as compared to tripled double-quotes. The rule is unsurprising since there is no escaping, only the counting of depth, as one would find with parenthesized expressions. The rule naturally supports nesting: any construction using this method could be directly embedded as a subordinate notation. Moreover, if Unicode is used, these delimiters are unlikely to collide with those used in traditional systems, and if they do, so long as those systems use only the paired form, there is no difficulty.
## What about content having a non-paired delimiter?
This is a two-part answer. Primarily, how to avoid the chosen delimiter pair becomes the notation's concern, not Julia's. For example, HTML has ampersand escaping, so the opening delimiter could be written as `&#x27EA;`. URLs use percent-encoding. Traditional double-quoted syntax (e.g. `"\u27EA"` for the opening delimiter) could be used by a Python notation. For example, to encode a non-paired opening delimiter, a use of this feature might look like...
`htm⟪<html><body>We start these string literals with <code>&#x27EA;</code></body></html>⟫`
Asking a notation to provide its own delimiter escaping is not without precedent. In web pages, embedded JavaScript begins with `<script>` and ends when the HTML parser encounters `</script>` -- with no escape mechanism. JavaScript developers who need to represent this sequence within their logic use regular double quoted strings, with the delimiter encoded as `"<\/script>"`.
As a fallback, for notations such as `@raw_str` which lack such features, if the user must include a non-paired delimiter, they could use the existing raw string syntax, which would not go away. Alternatively, they could be creative and build their string in chunks, using this syntax for most of the content and concatenating with regular double quoted strings for the non-paired delimiter. This proposed syntax aims to be *complementary* to existing approaches and represents a different set of sensibilities.
## Increased Usability
With this feature, a regular expression to detect quoted strings might be written as `r⟪(["'])(?:\\?+.)*?\1⟫` with no need to triple double-quote or worry about slashes. Moreover, other notations could embed regular expression notation without having to worry about a round of additional escaping.
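
For comparison, here is how the same pattern can be written in current Julia using the triple double-quoted form; the sample input string is made up for illustration.

```julia
# Triple quotes are needed because the pattern itself contains a double quote.
quoted = r"""(["'])(?:\\?+.)*?\1"""

m = match(quoted, "say \"hi\" now")
@assert m !== nothing
@assert m.match == "\"hi\""
```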
I believe these rules would permit developers integrating with foreign data producers and consumers to create their own succinct, unsurprising and nested function-like transformations that mix native languages within a Julian data processing context. Here is an example.
```
render(books) = htl⟪
<table><caption><h3>Selected Books</h3></caption>
<thead><tr><th>Book<th>Authors<tbody>$(htl⟪
<tr><td>$(book.name)<td>$(join(book.authors, " & "))
⟫ for book in books)</tbody></table>⟫
```
In HypertextLiteral, the functionality above is currently written as...
```
render(books) = @htl("""
<table><caption><h3>Selected Books</h3></caption>
<thead><tr><th>Book<th>Authors<tbody>$(@htl("""
<tr><td>$(book.name)<td>$(join(book.authors, " & "))
""") for book in books)</tbody></table>""")
```
While one might argue that the latter form is perfectly fine, this example works because HTL uses Julia's syntax and excellent parser. Notations defined outside of Julia's ecosystem won't have this luxury.
In conclusion, a succinct, unsurprising, and nestable way to incorporate foreign notations as Julia expressions will open up opportunities for innovative uses of Julia's excellent macro system and dynamic programming environment. What are the costs? A relatively simple parser rule and integration with existing string macros and... the assignment of a Unicode pair.