Impossible Raw String Encoding?


#1

Dear Julia experts, is it impossible to encode a quoted quote in a raw string? For example, ("\"abcd\"") contains two of them:

println("WANTED: (\"\\\"abcd\\\"\")")
println(raw"NO: (\"\"abcd\\\")")
WANTED: ("\"abcd\"")
NO: (""abcd\\")

PS: It’s not important. It’s more a curiosity.


#2

Is this what you want?

julia> x = IOBuffer(raw"\\\"abcd\\\"")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=8, maxsize=Inf, ptr=1, mark=-1)

julia> dump(x)
Base.GenericIOBuffer{Array{UInt8,1}}
  data: Array{UInt8}((8,)) UInt8[0x5c, 0x22, 0x61, 0x62, 0x63, 0x64, 0x5c, 0x22]
  readable: Bool true

#3

Hi Scott, no, I just want to print it. I should have omitted the IOBuffer altogether; it was confusing. I will edit the question.


#4

help?> @raw_str
  @raw_str -> String

  Create a raw string without interpolation and unescaping. The exception is that quotation marks still must be escaped. Backslashes escape both quotation marks and other
  backslashes, but only when a sequence of backslashes precedes a quote character. Thus, 2n backslashes followed by a quote encodes n backslashes and the end of the
  literal while 2n+1 backslashes followed by a quote encodes n backslashes followed by a quote character.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> println(raw"\ $x")
  \ $x
  
  julia> println(raw"\"")
  "
  
  julia> println(raw"\\\"")
  \"
  
  julia> println(raw"\\x \\\"")
  \\x \"
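The 2n / 2n+1 rule in that docstring can be checked directly. This is only a small sketch, assuming Julia 0.7 or later; the ordinary string literals on the right-hand side are my own hand-written equivalents, not part of the docstring.

```julia
# 2n+1 backslashes before a quote: n backslashes, then a literal quote.
@assert raw"\"" == "\""            # n = 0: just a quote
@assert raw"\\\"" == "\\\""        # n = 1: one backslash, then a quote

# 2n backslashes before the closing quote: n backslashes, then the literal ends.
@assert raw"\\" == "\\"            # n = 1: a single backslash
@assert raw"a\\\\" == "a\\\\"      # n = 2: 'a' plus two backslashes

# Backslashes that do not precede a quote are kept as written.
@assert raw"\\x" == "\\\\x"        # two backslashes, then 'x'
@assert length(raw"\\x \\\"") == 6 # the text \\x \" is six characters
```

On 0.6 the same literals parse differently, which is why the results in this thread disagree.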

#5

That’s not what I am getting here:

julia> println(raw"\\\"")
\\"

You seem to be getting \". Are you using 0.6.2?


#6

Is this what you want?

Julia-0.6.2> println("\\\"abcd\\\"")
\"abcd\"

#7

Yes. That matches the example in my original question, but it is an ordinary string literal, not a raw string.


#8

This behavior was introduced in 0.7: https://github.com/JuliaLang/julia/pull/24621

In Julia 0.6, some strings are impossible to write as raw"..." strings; you have to use an ordinary string literal with backslash escapes.
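For the string from the first post, the raw form under the newer rules would look roughly like this. It is a sketch that assumes Julia 0.7 or later; `wanted` and `as_raw` are just names I picked for illustration.

```julia
# Target text from the first post: ("\"abcd\"")
wanted = "(\"\\\"abcd\\\"\")"      # ordinary literal, as in the first post
as_raw = raw"(\"\\\"abcd\\\"\")"   # raw literal under the 0.7+ escaping rules

@assert as_raw == wanted
println(as_raw)
```

Here the raw literal ends up with the same text as the ordinary one, because every backslash in this particular string sits directly in front of a quote.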


#9

Thank you, understood. (Looking forward to 1.0!)


#10

How could you write @raw_str("\t") without a workaround?

By the way, an answer to this question would imply a way to properly write something like this:

replace("abc", r"(b)" => @s_str("\\1\t"))  # replace ` group => group * "\t" `

Edit:
Not necessarily, because replace could probably work with the following representation too:

julia> s"\1\t"
s"\\1\\t"

But I am not sure that would be the best solution to the problem.


#11

raw"\t" already works. You only run into trouble for backslashes at the end of the string, since it is ambiguous whether they are intended as a backslash or as an escape of the final ". In any case, you can always just write an ordinary string literal with backslash escapes: e.g. raw"\t" is equivalent to "\\t".

You never need the raw string syntax; it is just convenient sugar for strings like raw"C:\foo\bar\baz" where you would otherwise need lots of extra backslashes.
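To make the equivalence and the trailing-backslash case concrete, here is a minimal sketch, assuming the Julia 0.7+ escaping rules. The names `path` and `dir` are only for illustration.

```julia
# raw"\t" is two characters (backslash, 't'), the same as "\\t".
@assert raw"\t" == "\\t"
@assert raw"\t" != "\t"            # "\t" is a single tab character

# The convenient case: no doubling needed inside the path.
path = raw"C:\foo\bar\baz"
@assert path == "C:\\foo\\bar\\baz"

# The awkward case: a trailing backslash has to be doubled,
# otherwise it would escape the closing quote.
dir = raw"C:\foo\\"
@assert dir == "C:\\foo\\"
@assert last(dir) == '\\'
```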


#12

You wrote: “In Julia 0.6, some strings are impossible to write as raw"..." strings.”

This is true in Julia 0.7 too…

That may be debatable. The problem described on SO still seems to be solvable only with a hack.

But you are probably right, because it is more a problem of SubstitutionString than of raw strings. Still, a way to escape characters in raw string syntax would help too. (I am not proposing that now.)

See:

# I am trying to create an MWE here.
# We want to add \t to found group in replace 
#   So "a\t" has to change to "a\t\t" 
#     (or "b\t" to "b\t\t" ) 

julia> grp = r"([a-z]+\t)"

julia> replace("a\t", grp => "\1\t")  # this is NOK
"\x01\t"

julia> replace("a\t", grp => "\\1\t")  # this is NOK
"\\1\t"

# The next two examples show where we NEED to escape \t

julia> replace("a\t", grp => s"\1\\t") # this is NOK
"a\t\\t"

julia> replace("a\t", grp => s"\1\t") # this is ERROR

So if we want to use the “\1” regex syntax, we cannot use a String; we need Base.SubstitutionString. But there is a problem, because we need some kind of mix of raw string syntax and ordinary string syntax …

This hack works, but it does not look like a “fresh approach”:

julia> replace("a\t", grp => @s_str("\\1\t"))  # using hack is OK
"a\t\t"

We very probably need to fix the functions that work with SubstitutionString (replace doesn’t understand ‘\t’ in a substitution string).

And this, for example, is not very nice either:

julia> s"\1\t"
s"\\1\\t"

#13

Not for raw strings: https://github.com/JuliaLang/julia/pull/24621

That may be debatable. The problem described on SO still seems to be solvable only with a hack.

I don’t see this as a hack. As you point out on SO, you can always do Base.SubstitutionString("....some literal string....") with as many backslashes as necessary, or @s_str("...") as you note above. For example:

replace("a\t", r"([a-z]+\t)" => Base.SubstitutionString("\\1\t"))

Sometimes you need to call the raw constructor in order to have fine control over what is escaped. This is also important for constructing the string programmatically (not by a literal).
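As a rough sketch of both uses (a literal and a programmatically built replacement), assuming Julia 0.7 or later with the pair syntax for replace. The variable names `sub`, `sub2`, and `suffix` are hypothetical and only there for illustration.

```julia
grp = r"([a-z]+\t)"

# Ordinary literal: "\\1" is backslash + '1' (the group reference),
# and "\t" is a real tab character.
sub = Base.SubstitutionString("\\1\t")
@assert replace("a\t", grp => sub) == "a\t\t"

# Programmatic construction: append whatever suffix is needed at run time.
suffix = "\t"
sub2 = Base.SubstitutionString("\\1" * suffix)
@assert replace("a\tb\t", grp => sub2) == "a\t\tb\t\t"
```

The point is that the constructor takes an already-unescaped string, so you decide which backslashes are group references and which characters are literal.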

That being said, in cases where it can be done without ambiguity, it would be great to extend the range of escapes recognized by s"..." and similar literals.


#14

readme says: “Raw string literal escaping rules have been changed to make it possible to write all strings.”

Maybe it is OK and I just don’t understand the wording. I read that sentence as saying that this is possible:

"\t" == raw"<what to write here?>"

Which I am afraid it is not. But I believe this could be true:

repr("<any string>") == raw"<we are able to write this>"

So raw strings may be fine, and we just need to fix substitution strings.

Or abandon this bitter syntactic sugar? @ScottPJones, how do you do replace in your library? Do you use Base.replace?
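If I read the two claims above correctly, they can be checked like this. It is a minimal sketch, assuming Julia 0.7 or later.

```julia
# There is no escape for a tab inside raw"...": raw"\t" is backslash + 't'.
@assert raw"\t" != "\t"

# But the repr of "\t" (quote, backslash, 't', quote) can be written raw.
@assert repr("\t") == raw"\"\t\""
@assert length(repr("\t")) == 4
```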


#15

Currently I just use Base.SubstitutionString, although that may change (I have had to write my own types to replace Regex, RegexMatch, and RegexMatchIterator in order to support string types other than String).