Using `replace()` with unicode dot

I can’t reproduce your example:

julia> replace("ẋ′", "ẋ"=>"dx", "′"=>"_p")
"dx_p"

However, I have a good guess for what happened on your computer.

I’m guessing that you are having problems due to differences in Unicode normalization. The difficulty is that there are two “canonically equivalent” ways to express the "ẋ" that consist of different sequences of characters. You can use a single character U+1E8B 'ẋ', or you can use an ordinary ASCII 'x' followed by U+0307 “combining dot above”:

julia> import Unicode

julia> s1 = Unicode.normalize("ẋ", :NFC) # NFC normalization gives the 1-char version
"ẋ"

julia> s2 = Unicode.normalize("ẋ", :NFD) # NFD normalization gives the 2-char version
"ẋ"

julia> s1 == s2
false

julia> collect(s1)
1-element Vector{Char}:
 'ẋ': Unicode U+1E8B (category Ll: Letter, lowercase)

julia> collect(s2)
2-element Vector{Char}:
 'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
 '̇': Unicode U+0307 (category Mn: Mark, nonspacing)

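Probably you are using the NFC version in one place and the NFD version in another, so the pattern never matches the text even though the two look identical on screen. Either be consistent in how you enter "ẋ", or explicitly call `Unicode.normalize` on both the string and the patterns before the `replace` call.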
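Here is a minimal sketch of the second option. The helper name `replace_nfc` is made up for illustration (it is not part of Julia or the `Unicode` stdlib), and I'm assuming Julia 1.7 or later so that `replace` accepts several pairs at once. The strings are written with `\u` escapes so that the example does not depend on how your editor or browser encodes "ẋ":

```julia
import Unicode

# Hypothetical helper, not a Base function: normalize the input string and
# every search pattern to NFC, then delegate to the ordinary `replace`.
function replace_nfc(s::AbstractString, pairs::Pair{<:AbstractString,<:AbstractString}...)
    nfc(x) = Unicode.normalize(x, :NFC)
    return replace(nfc(s), (nfc(first(p)) => last(p) for p in pairs)...)
end

decomposed = "x\u0307\u2032"   # 'x' + U+0307 combining dot above, then U+2032 prime
composed   = "\u1e8b"          # precomposed U+1E8B

# A plain `replace` finds nothing, because the code points differ:
unchanged = replace(decomposed, composed => "dx")

# After normalizing both sides, the pattern matches:
result = replace_nfc(decomposed, composed => "dx", "\u2032" => "_p")
```

With these inputs, `unchanged` is still equal to `decomposed`, while `result` is `"dx_p"`. Normalizing to NFD instead would work just as well; the only thing that matters is that the string and the patterns use the same form.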

I’m guessing that the reason your example worked for me is that, as @mbauman commented in another thread, some browsers automatically normalize Unicode when you paste into their text-entry box to post on Discourse.

PS. Note that this is not in any way specific to Julia. The same issue of multiple representations for the “same” string appears in any language supporting Unicode text.
