Addressing raw string syntax and semantics for Julia 2.0?

What I miss in Julia are alternatives to the double-quote string-literal syntax. They would come in very handy when the strings one writes themselves contain a lot of double quotes and backslashes. This is particularly common in web application programming, where one regularly outputs bits of other languages, such as HTML, SQL, JavaScript, or LaTeX. A wider variety of alternative string delimiters would be nice. A non-interpolating string-literal syntax simpler than raw strings, in which $ can be used unescaped, would also be great.

Perl has the extremely convenient nesting q{...} and qq{...} alternatives for its '...' and "..." non-interpolated and interpolated string literals, and I use these all the time when outputting languages that use a lot of single and double quotes. Julia’s raw strings don’t really work nearly as well for that, and I find them mentally quite taxing to use. (I may be biased, having written a lot of unit tests for string escaping functions recently.)

Given Julia’s enthusiasm for Unicode characters, how about allowing “...” as a syntactic alternative to "..."? Or something involving other paired delimiters, like in Perl?

2 Likes

That’s nice. Within “...“ would you argue for simply no escaping, or doubling? Regardless, it’s even better than the single-quoted option since it’s more tedious to write, meaning people won’t prefer it to the plain double quote (addressing some of the concerns in this thread). We could then let this be used with any sort of string macro, i.e., r“...“ for regular expressions with lots of slashes and double quotes :wink:

Lots of us have this bias. It’s a real problem, especially for those dealing with foreign file formats of various sorts where you have to worry about escaping – raw strings fail in the most subtle ways.

1 Like

Note that you’re already getting it wrong — it presumably should be “...”, and the need to match the directions of curly quotes would make them especially tricky for many people to type in code editors. They are also difficult to distinguish in some fonts. (In word processors, people are used to “smart quotes” that automatically convert "..." to “...”. Right now, you can tab-complete \quotedblleft and \quotedblright in Julia, which I suppose we could shorten to \ldq and \rdq or something.)

3 Likes

I’ve actually had “” and ‘’ on the {} keys with AltGr (right Alt key) for many years. I type these without any smart quote function regularly in normal text. macOS has them there by default, on Linux I need a small .Xmodmap config file.

1 Like

Yes, I have a Mac laptop and that’s how I typed them. But I’m not sure how widely known these shortcuts are since people are used to smart quotes, and on Windows the default shortcuts are much more cumbersome.

1 Like

Right. I didn’t recognize them as a smart-quote pair – my vision is quite poor and the default font in my web browser has a small glyph with only a faint distinction. That said, on the console, the difference is pronounced.

  1. They don’t conflict with common delimiters in other languages – slash, single or double quotes.

  2. They have nesting built in. This would be great for various DSLs that use string macros, letting them nest cleanly, without escaping. For example in an htl“<html-content/>” interpolated string macro one could have $(css“...”) micro-format.

  3. They don’t seem to be legal characters in existing Julia identifiers.

  4. They work within “backticks” for documentation.

  5. I’m sure the parser could handle any sort of mismatching… just like missing an end marker.

  6. They are hard to confuse with straight double quotes.

  7. Writing them is also non-trivial. You might think of this as a problem, but, as noted, many on this thread didn’t want them to be used easily… so I mark that as a feature.

I love it. This would be an excellent “raw string” representation. The main downside is that some fonts may not distinguish between chirality… but that’s a minor nuisance. If you mess up it’s extremely unlikely the program will even run, not that it will run with the wrong semantics.
One more downside… not being ASCII, the UTF-8 representation will cause some parsing complications. I don’t see that as a showstopper though?

2 Likes

Perhaps the French-style «...» are less easily confused? (These are opt-\ & opt-shift-\ on the Mac keyboard.)

5 Likes

:slight_smile:

1 Like

Some other languages (e.g., German) use them the other way round: »...«

Oh right. And according to this authoritative source, just about every combination occurs somewhere: Map of quotation marks in European languages

Unicode has opinions:

2 Likes

If you are willing to type Unicode in any case, you could just designate whatever character you like as a placeholder for " and leave \ alone, eg

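# Maps the curly quotes “ and ” to a straight double quote and leaves every
# other character, including backslashes, as the parser delivered it.
macro q_str(str)
    map(str) do c
        if c ≡ '“' || c ≡ '”'
            '"'
        else
            c
        end
    end
end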

julia> q"foo“bar\baz"
"foo\"bar\\baz"

1 Like

This doesn’t help to mitigate the complexity of the backslash rules. For example, a run of backslashes at the end of the string is still treated as escaped (each pair collapses to one), but a run of backslashes elsewhere is not:

julia> q"a\\"
"a\\"

julia> q"\\a"
"\\\\a"

Surprised?

You cannot escape the tyranny of the raw-string syntax via a macro: all string macros are first parsed as raw strings, and by the time your macro is executed, the raw string parser has already done its work, i.e. has treated sequences of backslashes before quotation marks and at the end of the string differently from any backslashes elsewhere. And it has decided, based on these rules, where your string ends. You cannot use macros to define a different algorithm for deciding where a string literal ends because the parser does that and has already done its work by the time your macro runs. That’s why I think an alternative string-literal syntax in the parser, with a terminating character that is less commonly used (especially in non-balanced fashion) in other languages, would be useful.
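The parsing order described above can be checked directly. This is only a minimal sketch: `seen_str` is a hypothetical helper, not part of Julia, that returns its argument unchanged so that one can inspect what the parser actually hands to a string macro.

```julia
# Hypothetical helper: returns the literal exactly as the parser passed it in.
macro seen_str(s)
    return s
end

# Backslashes that do not precede a quote arrive untouched.
middle = seen"\\a"      # backslash, backslash, 'a'

# The same two backslashes directly before the closing quote collapse to one.
trailing = seen"a\\"    # 'a', backslash

# An odd number of backslashes before a quote produces a literal quote
# instead of ending the literal.
quoted = seen"a\"b"     # 'a', '"', 'b'

println(length(middle), " ", length(trailing), " ", length(quoted))  # 3 2 3
```

The macro body never runs any escaping logic of its own, so the differences between the three results come entirely from the parser.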

3 Likes

You can replace an arbitrary character with backslash, too, along the same lines.
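As a rough sketch of that idea, the macro below extends the earlier one with a stand-in for the backslash. The macro name `p_str` and the choice of '∖' (U+2216) as placeholder are arbitrary assumptions made for illustration, not an established convention.

```julia
# Sketch: '“' and '”' stand in for '"', and '∖' (U+2216) stands in for '\'.
macro p_str(str)
    map(str) do c
        if c == '“' || c == '”'
            '"'
        elseif c == '∖'
            '\\'
        else
            c
        end
    end
end

# The literal contains no real backslash or straight quote, so the
# raw-string rules never come into play while it is parsed.
s = p"“C:∖temp∖”"
println(s)
```

The price is that the source text no longer looks like the string it produces, and the placeholder characters themselves can no longer appear in the result.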

I understand that you would find it useful, but I think that the bar for new syntax should be higher in Julia: generally, it should have a more generic application than a rather niche use case that can be solved in other ways or just handled directly with escaping.

2 Likes

This isn’t that realistic. Oftentimes you’ve got a small escaping problem in an application. So you use raw strings… Now you have two problems (three if you use a regex, right?). Not that you can’t overcome these… it’s just mentally taxing, and more often than you’d imagine – you stub your toe, waste time, and annoy your more intelligent coworkers. Although, with raw strings, you don’t stub your toe all the time… it’s like a basement staircase with a broken step that moves on you, as the rules are so random that the outputs are truly surprising. I take it that you’re suggesting that every application developer working with string escaping either get used to stubbing their toe, or build, document, and maintain their own broken-stair traversal system? Or that I download and integrate a 3rd party library? Are those libraries going to integrate with all the string macros out there, or is this more glue that has to be written by our intrepid developer?

Raw strings, the basis for string macros, are an ill-conceived feature. They should be removed in Julia 2.0 and replaced with something more sensible. If you look at the feature design thread where the arcane rules were agreed upon, you can see the Julia founders were really struggling to find the least worst option. Their decision making happened to not take into account four additional perspectives: a) it’s not important that you be able to encode every possible string using the system, since you always have double-quoted strings for that purpose, and obviously the very clever substitution tricks you bring up in this thread, b) with paired delimiters, nesting … the most common need for self-representation, is handled automatically without mental stress, c) composability is important, and this means being a homomorphism over string concatenation, d) double quoting is already complicated; what is needed is a less complicated alternative to complement it.

We could phase in a reasonable replacement for raw strings using smart quotes – the rules would be rather simple: there is no escaping, and the string ends where the matched quotes balance. Then, one could use this format directly (no raw prefix needed). Moreover, string literals could use this as well. The command line could be updated to have reasonable shortcuts (this is already a ticket). Then, in Julia 2.0, raw strings could be deprecated. Then, eventually, removed… saving thousands and thousands of hours of application developer frustration.
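To make the proposed rule concrete, here is a minimal sketch of how such a literal could be delimited. It is not how Julia's parser works today; the function name `scan_curly_literal` and its return convention are assumptions made for this illustration.

```julia
# Sketch of the proposed rule: no escaping, and the literal ends at the
# closing ” that balances the opening “. Returns the content between the
# outer delimiters and the index of the first character after the literal.
function scan_curly_literal(src::AbstractString, start::Integer=firstindex(src))
    src[start] == '“' || throw(ArgumentError("expected an opening curly quote"))
    depth = 0
    i = start
    while i <= lastindex(src)
        c = src[i]
        if c == '“'
            depth += 1
        elseif c == '”'
            depth -= 1
            if depth == 0
                content = src[nextind(src, start):prevind(src, i)]
                return content, nextind(src, i)
            end
        end
        i = nextind(src, i)   # step by character, not by byte
    end
    throw(ArgumentError("unterminated curly-quote literal"))
end

content, next = scan_curly_literal("“a “b” \\ \"c\"” rest")
println(content)
```

Backslashes and straight quotes pass through unchanged, and a nested “b” does not end the literal. The trade-off is that a string containing an unbalanced curly quote cannot be written this way, which is perspective (a) above.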

(As for my inability to distinguish between start/end glyphs… I can buy glasses or use a font with a more distinct representation of these characters; regular paired quotes with built-in ways to type them are better than getting clever with alternative glyphs.)

1 Like

Personally I would just use escaping as is and move on.

Surface syntax is often the most discussed, but at the same time least interesting feature of Julia — it is not where the magic is.

A lot of things don’t have a convenient literal syntax, just a constructor, and this is fine. Strings are not really special here: Julia does not have dedicated syntax for something as basic as Dict, Array{T,N} for N >= 3, and a lot of other built-ins. Occasionally people propose that these get their own dedicated syntax, but I am not sure they would be worth it.

Again, if I find that I need to input tons of literal objects, I would just take a step back and figure out a workaround (which would be application-dependent). The context of this problem is still unclear to me; without it, it is hard to propose a specific solution.

5 Likes

You are simply not listening. The problem was articulated in the opening post, @mgkuhn’s posts, and in the one you responded to. The raw string semantics are a place where you can stub your toe, and it hurts.

This has little to do with surface syntax. You’re welcome to your opinions, but please stop assuming that your framing of the challenge is the only valid one.

2 Likes

Let me rephrase: I don’t doubt that escaping is somewhat inconvenient, but I don’t understand why it is a major problem when used occasionally.

The context I am missing here is a specific (real life, practical) example where one would need to escape a lot. I am curious if alternative solutions could alleviate it, before considering breaking changes to Julia. I think that this is a reasonable request for any proposed change to the language.

5 Likes

Why? Under what circumstances would you need to compose raw string escapings? Normally, one composes the underlying unescaped strings, and if you need to programmatically compose escaped strings you can use the regular (non-raw) escaping ala escape_string.
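A small sketch of that workflow, using only `escape_string` and `unescape_string` from Base; the sample values are made up for illustration.

```julia
# Compose the plain, unescaped values first ...
a = "say \"hi\" "
b = "C:\\temp\\"
combined = a * b

# ... and escape once, at the boundary where an escaped form is needed.
escaped = escape_string(combined)
println(escaped)

# The escaped form round-trips back to the composed value.
restored = unescape_string(escaped)
```

Here the raw-string rules never enter the picture, because no escaped fragments are being concatenated.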

If you’re not worried about encoding every possible string, then just don’t use raw strings for anything ending with a backslash or a quote. The complicated rules go away. If you want to catch this case, define your own myraw"..." macro that is identical to raw"..." but throws an error if the resulting string ends in a backslash or quote.
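One possible shape for such a macro is sketched below. The helper name `check_myraw` is an assumption introduced here so the check can also be called as an ordinary function.

```julia
# Hypothetical helper: rejects literals whose parsed value ends in a
# backslash or a double quote, i.e. the cases where raw-string rules bite.
function check_myraw(s::AbstractString)
    if !isempty(s) && last(s) in ('\\', '"')
        throw(ArgumentError("myraw literal must not end in a backslash or double quote"))
    end
    return s
end

# Behaves like raw"..." otherwise; the check runs at macro-expansion time.
macro myraw_str(s)
    return check_myraw(s)
end

path = myraw"C:\temp\file.txt"
println(path)   # C:\temp\file.txt
```

Note that the check sees the string after the parser has applied the raw-string rules, so a literal such as `myraw"a\\"` is rejected because its parsed value ends in a single backslash.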

It sounds like the real issue here is that you have lots of literal strings containing ", and so you want a literal string format that doesn’t require this to be escaped. This isn’t unreasonable, but I don’t think we should use single-quote strings for the reasons noted above; “...” is not crazy for this use-case (and would be non-breaking).

7 Likes

Isn’t it «...» (“französische”), see Punctuation Marks Part 1 (thoughtco.com)

Just 2 days ago I was trying to normalize a test result between two versions of Julia. I wanted to match “@htl_str "” and replace it with “htl"”, since the serialization of string macros changed in Julia. Even though I was aware of raw string semantics within my regular expression (this was after this particular thread was started), it bit my ass anyway and cost at least an hour of my time. The frustrating thing about raw_str semantics (occurring in regular expressions) is that they work the way you expect most of the time, then like a broken stair that moves on you… ouch. In the end, rather than trying to figure out the regex that would do this, I just avoided the situation entirely (and decided to not test this output). This isn’t rare. Earlier this year, I was fixing malformed FHIR data from a commercial source and ran into this issue twice, both times puzzling over the results in amazement before having to ask someone smarter than me to help. I hit raw string issues in the most unexpected ways, and it is mentally taxing.
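For reference, one way a substitution of this kind might be written is sketched below. The sample input is hypothetical, since the exact serialized text is not shown in this thread.

```julia
# Hypothetical sample of the older serialized form.
old = "@htl_str \"<p>hi</p>\""

# Inside r"...", \" reaches the regex engine as a plain double quote.
new = replace(old, r"@htl_str \"" => "htl\"")
println(new)   # htl"<p>hi</p>"

# Matching a backslash followed by a quote takes five backslashes in the
# literal: raw-string parsing halves them and keeps the quote, and the regex
# engine then reads the remaining two backslashes as one literal backslash.
bq = r"\\\\\""
println(bq.pattern)
found = occursin(bq, "a\\\"b")
```

The second pattern shows where the counting becomes error-prone: the number of backslashes needed depends on whether a quote follows them.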

Anyway, it’s fine if Julia doesn’t want to fix this wart. However, claiming that it’s just syntax or just some preference of mine isn’t accurate – it’s a place where people stub their toe.

1 Like