Using `replace()` with unicode dot

Brad_Carman · December 28, 2023, 2:37pm

Is there a way to replace the “\dot” Unicode character?

julia> replace("ẋ′", "ẋ"=>"dx", "′"=>"_p")
"ẋ_p"

stevengj · December 28, 2023, 2:50pm

I can’t reproduce your example:

julia> replace("ẋ′", "ẋ"=>"dx", "′"=>"_p")
"dx_p"

However, I have a good guess for what happened on your computer.

I’m guessing that you are having problems due to differences in Unicode normalization. The difficulty is that there are two “canonically equivalent” ways to express "ẋ", and they consist of different sequences of characters. You can use a single character, U+1E8B 'ẋ', or you can use an ordinary ASCII 'x' followed by U+0307 “combining dot above”:

julia> import Unicode

julia> s1 = Unicode.normalize("ẋ", :NFC) # NFC normalization gives the 1-char version
"ẋ"

julia> s2 = Unicode.normalize("ẋ", :NFD) # NFD normalization gives the 2-char version
"ẋ"

julia> s1 == s2
false

julia> collect(s1)
1-element Vector{Char}:
 'ẋ': Unicode U+1E8B (category Ll: Letter, lowercase)

julia> collect(s2)
2-element Vector{Char}:
 'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
 '̇': Unicode U+0307 (category Mn: Mark, nonspacing)
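To connect this back to the original `replace` call, here is a small sketch that builds both forms with explicit `\u` escapes, so that no editor or browser can silently normalize them. It assumes Julia 1.8 or later (for `Unicode.isequal_normalized`) and assumes the prime in the question is U+2032:

```julia
import Unicode

precomposed = "\u1e8b"   # single code point U+1E8B
decomposed  = "x\u0307"  # 'x' followed by U+0307 COMBINING DOT ABOVE

# Both render as ẋ, but they are different code point sequences.
println(precomposed == decomposed)                     # false
println(length(precomposed), " ", length(decomposed))  # 1 2

# Comparison up to canonical equivalence (Julia 1.8+).
println(Unicode.isequal_normalized(precomposed, decomposed))  # true

# replace matches code points literally, so the precomposed pattern misses
# the decomposed subject and only the prime (U+2032) gets replaced.
result = replace(decomposed * "\u2032", precomposed => "dx", "\u2032" => "_p")
println(repr(result))
```

This reproduces the symptom from the question: the dotted x survives while the prime is replaced.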

Probably you are using the NFC version in one place and an NFD version in another. Either be consistent in how you enter "ẋ" or explicitly call Unicode.normalize before doing the replace call.
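A minimal sketch of the second option, assuming NFC as the target form. `normalized_replace` is a hypothetical name for illustration, not a function in Base or the Unicode stdlib; passing several pairs to one `replace` call needs Julia 1.7 or later:

```julia
import Unicode

# Hypothetical helper (not part of Base or the Unicode stdlib): bring the subject
# string and every pattern/replacement pair into the same normal form, then replace.
function normalized_replace(s::AbstractString,
                            pats::Pair{<:AbstractString,<:AbstractString}...;
                            form::Symbol = :NFC)
    nf(x) = Unicode.normalize(x, form)
    # Multiple pairs in one replace call need Julia 1.7 or later.
    return replace(nf(s), (nf(first(p)) => nf(last(p)) for p in pats)...)
end

typed = "x\u0307\u2032"   # ẋ entered as 'x' + U+0307, then a prime (U+2032)
println(normalized_replace(typed, "\u1e8b" => "dx", "\u2032" => "_p"))  # dx_p
```

Normalizing the patterns as well as the subject matters, because the pattern in your source file may have been typed in a different form than the text it is applied to. NFC is an arbitrary choice here; `:NFD` works equally well as long as both sides use the same form.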

I’m guessing that the reason your example worked for me is that, as @mbauman commented in another thread, some browsers automatically normalize Unicode when you paste into their text-entry box to post on Discourse.
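If you suspect that copying or pasting has changed the form of a string, one quick check is to compare the string with its own normalizations. `normal_form` below is a hypothetical helper, not a stdlib function; it relies only on `Unicode.normalize`:

```julia
import Unicode

# Hypothetical helper: report which canonical normal form a string is already in.
function normal_form(s::AbstractString)
    is_nfc = Unicode.normalize(s, :NFC) == s
    is_nfd = Unicode.normalize(s, :NFD) == s
    if is_nfc && is_nfd
        return :both      # e.g. plain ASCII: nothing to compose or decompose
    elseif is_nfc
        return :NFC
    elseif is_nfd
        return :NFD
    else
        return :neither   # e.g. a mix of precomposed and decomposed sequences
    end
end

for s in ("\u1e8b", "x\u0307", "abc")
    println(repr(s), " => ", normal_form(s))
end
```

Running this on the string you typed in the REPL and on the one copied from the browser should show whether they ended up in different forms.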

PS. Note that this is not in any way specific to Julia. The same issue of multiple representations for the “same” string appears in any language supporting Unicode text.

Brad_Carman · December 28, 2023, 2:53pm

Thanks! I knew I would learn something new with this question.
