Addressing raw string syntax and semantics for Julia 2.0?

As of December 17th, this discussion is arguing that a paired delimiter (such as double-angle brackets ⟪...⟫) could offer better semantics for raw strings, which are used by regular expressions and other string notations, by removing the need for complex escaping rules. The raw string would end when the closing delimiter matching its opener is encountered, which allows nested strings. Some strings could not be represented this way, but that would be acceptable for the design, since regular double-quoted strings can represent them.
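To make the paired-delimiter idea concrete, here is a minimal sketch of how such a literal could be scanned. The function name `scan_paired` and its keyword arguments are hypothetical, and nothing here reflects an actual Julia parser change; it only illustrates that nesting depth, not escaping, decides where the literal ends.

```julia
# Hypothetical scanner, not part of Julia: `src` is the text that follows
# an opening ⟪. The body ends at the ⟫ that brings the nesting depth to zero.
function scan_paired(src::AbstractString; opener::Char='⟪', closer::Char='⟫')
    depth = 1
    for (i, c) in pairs(src)
        if c == opener
            depth += 1
        elseif c == closer
            depth -= 1
            # prevind keeps the slice on a valid character boundary
            depth == 0 && return src[firstindex(src):prevind(src, i)]
        end
    end
    error("unterminated paired-delimiter literal")
end

# Backslashes and double quotes need no escaping; nested pairs are kept.
@assert scan_paired("\\s+\"⟫") == "\\s+\""
@assert scan_paired("a ⟪b⟫ c⟫ rest") == "a ⟪b⟫ c"
```

Under this scheme a body with unbalanced delimiters, such as a lone ⟫, has no representation, which is the limitation mentioned above.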

On Dec 6th this was titled “Retrieving single-quoted syntax for general purpose use in Julia 2.0”. The original post follows. Recovering single-quote syntax has been resoundingly dismissed. However, the challenge with raw string semantics remains.


One of my disappointments with Julia is that single quotes are reserved for characters, as opposed to being a more generally useful alternative to double quoting. In SQL, for example, single quotes can be used – where doubling up on a single quote is the only form of escape.

"She said: \"His name is O'Brien\"" == 'She said: "His name is O''Brien"'

I realize in early development, having a succinct mechanism to express characters was convenient, but it’s not particularly common for application or even scientific programming. Instead of a very accessible single-quote syntax, we have @raw_str semantics, which are hard to grok.

There is a brave way forward. We could introduce a string macro for character literals, let’s call it @c_str, at any time:

julia> macro c_str(ch)
           s = unescape_string(ch)
           @assert length(s) == 1
           return s[1]
       end
@c_str (macro with 1 method)

julia> c"\n"
'\n': ASCII/Unicode U+000A (category Cc: Other, control)

Over time, existing code could be migrated to use this. As someone doing some parsing myself, this isn’t such a huge burden – it’s literally one extra character to type, and this macro could be carried out and type-checked at compile time. Then, one day, we could deprecate use of single-quotes for characters. Then in Julia 2.0, we could use single quotes as a real alternative to places where we don’t want slash escaping, for example, in regular expressions.

6 Likes

I agree single-quotes are such a useful bit of syntax that using it for characters seems like a big waste when c"a" could do instead. But it’s less clear to me that using it as an alternative to " just to avoid escaping quotes in strings is worth it, since you can use triple quotes for that already.

3 Likes

In triple quotes you still have to escape the backslash (\), hence it’s not going to solve the core problem with regex. Once you’ve reclaimed single quotes (where the only special character is ', which is doubled), you could write regular expressions as r'\s+' without having to master a clever set of quote/backslash escaping rules (see @raw_str). Single-quoted strings would necessarily be limited: they would not permit arbitrary code points to be represented in ASCII form, nor would they permit string interpolation (unless you escaped $ by doubling it). But being the lesser cousin also makes them more approachable for a great many use cases.
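A small illustration of the point about triple quotes, using only current Julia syntax. The variable names are just for the example.

```julia
# Triple quotes remove the need to escape ", but \ is still an escape character.
quoted = """She said: "His name is O'Brien" and left"""
@assert quoted == "She said: \"His name is O'Brien\" and left"

# In a regular or triple-quoted string the backslash has to be doubled
# to obtain the two-character pattern \s+ ...
ws_src = """\\s+"""
@assert ws_src == raw"\s+"

# ... whereas the r"..." string macro receives the backslash untouched.
@assert split("a  b\tc", r"\s+") == ["a", "b", "c"]
@assert split("a  b\tc", Regex(ws_src)) == ["a", "b", "c"]
```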

1 Like

You can always just write a string macro for whatever semantics you prefer.

Arguably needing a lot of unescaped "s in string literals is not very common either in scientific programming.

Personally, I doubt that the benefit (if any) is worth a breaking change. Cf

5 Likes

It seems to me that the usage for single quotes that you are suggesting is extremely niche, I guess I have had use for nested quoting a handful of times in my life, while using them for characters is something I do frequently.

Perhaps a poll or something could clarify what is the most common use, but for me, this would be to waste good syntax on something very rare.

6 Likes

I think this could be convenient not just for nested quoting, but generally as an alternative to “raw string”. I, for one, use raw strings (including string macros) much more than character literals. I think the current raw string escaping rules are limited by the existence of triple-quoted strings (which would clash with having "" as the way to escape "), and ' would not have this issue.
But I’m not sure if it’s worth the breaking change either.

1 Like

If single quotes could be used anywhere that double quotes can, then it would also be necessary to get rid of postfix ' for adjoints, which is possible but would make this even more disruptive. In general, doubling a quote character as a way to escape it is not IMO any better than backslash escaping it, and leads to situations where you have to count how many quotes occur in a row in order to interpret something, so I don’t see how that’s a win.

I also find the choice between double and single quote strings in languages that support both pretty irritating: people seem to use them randomly and arbitrarily, and different people pick a different one as their default, or worse still, they alternate. Having one main way to express strings seems like a benefit rather than a drawback from that perspective. Sure, there’s the occasional escaping needed, or a special string macro, but there’s no disagreement about what the standard way to write strings is.

65 Likes

As much as I love cutting down on keystrokes, one big source of confusion I could see in this is in Julia code that wraps SQL, as is common in many data science workflows. Consider this, for example:

using DBInterface

stmt = DBInterface.prepare(
    con,
    "SELECT * FROM Table WHERE Name LIKE 'julia'"
)

Compared to this, if Julia implemented single quotes with a backslash escape:

using DBInterface

stmt = DBInterface.prepare(
    con,
    'SELECT * FROM Table WHERE Name LIKE \'julia\''
)

Or perhaps even more confusing, using consecutive single quotes to escape:

using DBInterface

stmt = DBInterface.prepare(
    con,
    'SELECT * FROM Table WHERE Name LIKE ''julia'''
)

In a language like R, the fix for the less natural second and third forms would simply be to wrap SQL statements in double quotes, but that leaves you with one of the following:

  • Using single quotes in some places and double quotes in others to represent the same type, which can be confusing
  • Counting on others who read your code to know your style rules, e.g. only SQL gets double quotes, or some similar rule
  • Using double quotes everywhere, which negates the point of the whole change

Thank you for the lovely response @StefanKarpinski. I didn’t know about adjoints.

@Derek_Vetsch – This isn’t a suggested replacement for double quoted strings, it’s a suggested replacement for @raw_str which has some rather non-obvious rules. Then again, we probably couldn’t drop @raw_str at this point, so it’d just be one more permutation to know.

2 Likes

What are the non-obvious rules of raw strings?

6 Likes

help?> @raw_str

Create a raw string without interpolation and unescaping. The exception is that quotation marks still must be escaped. Backslashes escape both quotation marks and other backslashes, but only when a sequence of backslashes precedes a quote character. Thus, 2n backslashes followed by a quote encodes n backslashes and the end of the literal while 2n+1 backslashes followed by a quote encodes n backslashes followed by a quote character.

julia> println(raw"\ $x")
# \ $x
julia> println(raw"\"")
# "
julia> println(raw"\\\"")
# \"
julia> println(raw"\\x \\\"")
# \\x \"
julia> println(raw"\\\"x")
# \"x

As I understand, these rules apply equally to @r_str and any other string macro.

For a custom string macro to be exempt from the escaping rules above, it would need to have a way to hook into the parsing process. So, unless I’m misunderstanding something, I don’t see how I could write a string macro that doesn’t inherit these escaping semantics.
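The 2n / 2n+1 rule from the docstring quoted above can be sketched as a small decoder. `decode_raw_body` is a hypothetical helper written for this illustration, not a Base function; it takes the source text found between the quotes of a raw literal and returns the string that literal denotes, and the assertions compare it against what the parser produces for the same text.

```julia
# Hypothetical helper (not in Base): decode the source text between the
# quotes of a raw"..." literal according to the @raw_str docstring.
function decode_raw_body(src::AbstractString)
    out = IOBuffer()
    nbs = 0                       # length of the current backslash run
    for c in src
        if c == '\\'
            nbs += 1
        elseif c == '"'
            # 2n+1 backslashes + quote => n backslashes + a literal quote.
            # An even run would have closed the literal before this point.
            isodd(nbs) || error("unescaped quote inside literal body")
            print(out, "\\"^div(nbs, 2), '"')
            nbs = 0
        else
            # Backslashes not followed by a quote are kept verbatim.
            print(out, "\\"^nbs, c)
            nbs = 0
        end
    end
    # A run that touches the closing quote: 2n backslashes => n backslashes.
    iseven(nbs) || error("odd backslash run would escape the closing quote")
    print(out, "\\"^div(nbs, 2))
    return String(take!(out))
end

# Left: the body as typed between the quotes. Right: the parser's result.
@assert decode_raw_body("\\ \$x") == raw"\ $x"
@assert decode_raw_body("\\\\\\\"x") == raw"\\\"x"
@assert decode_raw_body("\\\\x \\\\\\\"") == raw"\\x \\\""
@assert decode_raw_body("a\\\\") == raw"a\\"
```

Note how the same pair of backslashes is kept as two characters before `x` but collapses to one at the end of the body; that position dependence is what the rest of the thread argues about.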

2 Likes

Without some form of escaping, there would be strings that could not possibly be input via string macros. The rules are basically the minimal escaping rules that allow every possible string to be input. If you escaped quotes with another quote character then what would the rules be? One quote closes the string, so you’d need two quotes to represent an actual quote, three quotes for a quote at the end of the string, etc. It’s basically the same rule except that the backslash rule only needs to be applied when preceding a quote. Escaping is complex but unavoidable.

5 Likes

End users now have a choice of how to represent string data. They can use regular or raw strings, neither of which does well with the double quote character (for those you have triple double-quoted variants!). Single-quoted strings would give users a visually lightweight style with a simple escaping rule that completely avoids both the double quote and the backslash. Even better, this lesser cousin would be unable to encode arbitrary code points: sometimes worse is better. I believe it is a win to recover the single-quote character for this purpose in Julia 2.0.

Obviously, I can also accept that this is considered too disruptive for Julia 2.0. Thank you for listening.

Maybe I’m misunderstanding, but I don’t think that’s right. If a string can be enclosed in a single quote ', you can use two single quotes ('') consistently to represent a single quote regardless of the position in the string (provided no other form of quoting is supported, and there are no triple-quoted (''') strings). Of course, if it happens to be in the end, you end up with three quotes in a row, but it’s still the same rule, unlike @raw_str, which is forced to use different, non-obvious rules at the end of the string.
I think the same would be possible for double quotes, if it weren’t for triple-quoted strings, and the existence of \".
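As a sketch of the quote-doubling rule described here, assuming a hypothetical single-quoted literal whose only escape is '' for ': the helpers `sq_escape` and `sq_unescape` are made up for illustration and do not correspond to any existing or planned Julia syntax.

```julia
# Hypothetical helpers for an SQL-style literal: the only escape is '' for '.
sq_escape(s::AbstractString) = replace(s, "'" => "''")

function sq_unescape(body::AbstractString)
    out = IOBuffer()
    chars = collect(body)
    i = 1
    while i <= length(chars)
        if chars[i] == '\''
            # A lone quote would have terminated the literal, so inside
            # the body every quote must be followed by a second one.
            (i < length(chars) && chars[i + 1] == '\'') || error("unpaired quote")
            i += 1
        end
        print(out, chars[i])
        i += 1
    end
    return String(take!(out))
end

@assert sq_escape("O'Brien") == "O''Brien"
@assert sq_unescape("O''Brien") == "O'Brien"

# The rule is the same at the end of the string: the body ends in '',
# which becomes ''' once the closing delimiter is appended.
@assert sq_escape("ends with '") == "ends with ''"

# Escaping piecewise and concatenating gives the same result as escaping the whole.
@assert sq_escape("a'") * sq_escape("'b") == sq_escape("a'" * "'b")
```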

That’s precisely what he said. It’s the same scheme as for @raw_str, except that you have to apply it everywhere consistently, whereas for @raw_str it slightly changes at the end of a String.

1 Like

My take is that you are sacrificing characters, making them “more difficult” to declare, in favor of more ways to declare strings. Triple quoting already gives you unescaped quotes as long as you don’t need 3 or more in sequence. Someone who works with characters more than strings might take offense.

We could always go for something that will annoy everyone equally:

Character: c'"'
Character: c"'"
String: s"a'bc"
String: s'a"b"c'

Granted that might be hard to read, but at least EVERYONE is inconvenienced equally…

I feel like this change would be liked by people who create lots of static strings with double quotes, hated by people who use characters extensively, and met with indifference by the rest of us. That said, @StefanKarpinski did bring up a good point: reading code where the author switches back and forth in how they declare strings is a negative.

3 Likes

I never understood why languages invest in various complicated forms of literal string syntax.

If the string is short, any sane mechanism will do. Just keep it simple, like Julia’s current solutions.

If the string is long (or you have many such strings, etc.), it should be treated as data and externalized, e.g. as an artifact.

4 Likes

If the string is long (or you have many such strings, etc.), it should be treated as data and externalized, e.g. as an artifact.

I remember the good ol’ days, when you would just hard-code your PHP script with user “admin” and password “admin”! I liked the convenience of single- or double-quoting, but the convenience features also led to PHP’s undoing under the weight of serious webapps. I can understand the OP’s concern with regex, but I’m not sure how much single-quoting will help. Regexes always seem difficult to read or write, and full of escapes no matter what.

2 Likes

I heartily agree, reading arbitrarily used single or double quotes hurts my mental parsing. The whole philosophy behind the current standard is one of the great things about Julia.

4 Likes

The big and unusual problem with the escaping logic of Julia’s raw strings (and Microsoft’s argv parser syntax, from where it was copied) is that, in mathematical terms, it is not a homomorphism over string concatenation, unlike almost any other escaping mechanism that I have ever encountered.

In other words: the problem with escape_raw_string() is that it is not true that for all strings a and b the following equation holds: escape_raw_string(a) * escape_raw_string(b) == escape_raw_string(a * b).
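A concrete counterexample to that equation, assuming a Julia version in which `Base.escape_raw_string` accepts a single string argument. The local names are just for the example.

```julia
# Assumption: Base.escape_raw_string(::AbstractString) exists in this Julia version.
escape_raw = Base.escape_raw_string

a = "\\"     # a single backslash
b = "x"

piecewise = escape_raw(a) * escape_raw(b)  # backslash is at the end of `a`, so it is doubled
whole     = escape_raw(a * b)              # backslash precedes `x`, so it is left alone

@assert piecewise == "\\\\x"   # two backslashes, then x
@assert whole == "\\x"         # one backslash, then x
@assert piecewise != whole
```

The backslash is escaped differently depending on whether it ends the input, so escaping two pieces separately and joining them does not give the escaping of the joined string.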

6 Likes