Escape sequences in `SubstitutionString`s


I tried to create a substitution string with Base.SubstitutionString() that contains an escape sequence (\s), but, as expected, it does not work, since the relevant section of the documentation says only that:


Stores the given string `substr` as a `SubstitutionString`, for use in regular expression substitutions.

So, the question is whether anyone knows how to use an escape sequence (e.g. one for a space character) in a substitution string that replaces another escape sequence (e.g. a newline).

Here is an example to use:

julia> str ="Whether an array is ordered can be defined either on construction via the ordered argument, or at any time via the ordered! function. The levels function returns 
all the levels of CategoricalArray, and the levels! function can be used to set the levels and their order. "

julia> str = replace(str, r"\n" => Base.SubstitutionString("\s"))
ERROR: syntax: invalid escape sequence
 [1] top-level scope
   @ none:1



I don’t think \s is a valid escape sequence in a string literal at all, which is what causes the error in your example. If you want a space character, use the string " " or, even better, the character ' '.

Note that your example can be simplified to replace(str, '\n' => ' '), as no regex is needed.
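A minimal sketch of that simplification, using a short two-line sample string (a hypothetical stand-in for the longer text in the question):

```julia
# Hypothetical sample text containing one newline.
str = "first line\nsecond line"

# Character-to-character replacement; no regex and no SubstitutionString needed.
flat = replace(str, '\n' => ' ')

# The regex form also works when the replacement is an ordinary string.
flat_regex = replace(str, r"\n" => " ")

println(flat)
```

Both calls should give the same single-line result here; the `Char` pair is simply the lighter option when the pattern is one fixed character.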


Thanks for the response and the option of not even using Base.SubstitutionString() in the first place.

Also, \s is a valid escape sequence in a regular expression in general, including in this very same use of replace():

julia> replace(str, r"\s" => '|')

I guess, then, that the escape sequences allowed in a regular expression differ from the ones allowed on the right-hand side.


Yeah, the one on the left is a Regex, which can contain all the usual regular expression escape sequences. \s on the right wouldn’t make sense because it means whitespace in general, not a specific character (e.g. a tab also matches \s). So if it appeared in the SubstitutionString, it would be ambiguous which character you actually want there (space or tab).
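To illustrate why \s is a class rather than a single character, here is a small sketch with a made-up sample string containing a space, a tab and a newline:

```julia
# Hypothetical sample with three different whitespace characters.
mixed = "a b\tc\nd"

# On the pattern side, \s matches each of them.
piped = replace(mixed, r"\s" => '|')

# A tab on its own is matched by \s as well.
tab_matches = occursin(r"\s", "\t")

println(piped)
```

Since one pattern covers several distinct characters, the replacement side has to name the concrete character that is wanted, such as ' ' or '\t'.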

The only escape sequences that make sense in the substitution string, and that are interpreted there, are numbered ones like \1, \2, etc. (and the corresponding named \g<name> references), which refer to the parts of the match captured by the regex on the left.
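As a rough sketch of those capture references, assuming a made-up date string and made-up group names (y, m, d):

```julia
# Hypothetical input; any text with three dash-separated numbers would do.
date = "2021-03-15"

# Numbered references: \1, \2, \3 refer to the parenthesised groups in order.
swapped = replace(date, r"(\d+)-(\d+)-(\d+)" => s"\3/\2/\1")

# Named references: \g<name> refers to a (?<name>...) group.
named = replace(date, r"(?<y>\d+)-(?<m>\d+)-(?<d>\d+)" => s"\g<d>.\g<m>.\g<y>")

# Without the s"..." macro, the backslashes must be doubled in the ordinary string literal.
explicit = replace(date, r"(\d+)-(\d+)-(\d+)" => Base.SubstitutionString("\\3/\\2/\\1"))

println(swapped)
println(named)
```

The s"..." form is usually more convenient than calling Base.SubstitutionString() directly, because the backslashes can be written once instead of being escaped for the string parser first.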


That clears all the fog! Thanks!