Symbol to String

Mirsad_Cosovic · December 28, 2022, 9:31am

I’m trying to convert a special Symbol to a String, but I’m running into trouble. Is the following behavior expected, and why?

julia> String(:Ω) == "Ω"
false

jar1 · December 28, 2022, 9:48am

The Greek capital omega (U+03A9) and the Ohm sign (U+2126) are different Unicode characters, so they’re not equal under == even though they look the same.
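A small sketch to make the difference visible: it uses \u escapes so the two look-alike characters cannot be confused in the source, and simply compares their code points.

```julia
# Two visually identical characters with different code points.
ohm   = '\u2126'  # OHM SIGN (Letterlike Symbols block)
omega = '\u03a9'  # GREEK CAPITAL LETTER OMEGA

println(codepoint(ohm))                # 0x00002126
println(codepoint(omega))              # 0x000003a9
println(ohm == omega)                  # false: Chars compare by code point
println(string(ohm) == string(omega))  # false, for the same reason
```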

Mirsad_Cosovic · December 28, 2022, 9:54am

I understand that, but why are they different Unicode characters in the first place?

cormullion · December 28, 2022, 10:29am

The Unicode Consortium say:

For compatibility purposes, a few Greek letters are separately encoded as symbols in other character blocks. Examples include U+00B5 μ in the Latin-1 Supplement character block and U+2126 Ω in the Letterlike Symbols character block. The ohm sign is canonically equivalent to the capital omega, and normalization would remove any distinction. Its use is therefore discouraged in favor of capital omega. The same equivalence does not exist between micro sign and mu, and use of either character as micro sign is common; for Greek text, only the mu should be used.

which I think is saying “Don’t use U+2126 Ω (\ohm in Julia); use U+03A9 Ω (\Omega in Julia) instead”.
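The quoted distinction between canonical equivalence (Ohm sign vs. omega) and the weaker compatibility equivalence (micro sign vs. mu) can be checked with the Unicode standard library. This is a sketch only; it assumes Unicode.normalize with the :NFC and :NFKC forms, which ships with Julia.

```julia
using Unicode  # standard library providing Unicode.normalize

ohm, omega = "\u2126", "\u03a9"
micro, mu  = "\u00b5", "\u03bc"

# Canonical equivalence: NFC already maps the Ohm sign to capital omega.
println(Unicode.normalize(ohm, :NFC) == omega)    # true

# The micro sign is only compatibility-equivalent to mu,
# so NFC leaves it alone while NFKC folds it to mu.
println(Unicode.normalize(micro, :NFC) == micro)  # true
println(Unicode.normalize(micro, :NFKC) == mu)    # true
```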

uniment · December 28, 2022, 10:51am

Do they also say anything about why we can’t have subscript \_b, \_c, or \_d, but instead we can have \:turtle: and \:person_in_steamy_room: ?

Per · December 28, 2022, 11:36am

The reason :\ohm<tab> gives the capital Omega rather than the Ohm sign is that symbols are treated like variable names in Julia, so it makes sense to normalize them.

stevengj · December 28, 2022, 12:27pm

The thing to realize is that the Unicode strings stored internally for Julia symbols (e.g. variable names or quoted symbols) are automatically normalized to canonical form.

So, even if you type :Ω using the Ohm sign U+2126 (e.g. via :\ohm<tab>), it will get normalized to capital Omega U+03A9:

julia> collect(String(:Ω)) # Ω is Ohm U+2126 (\ohm<tab>)
1-element Vector{Char}:
 'Ω': Unicode U+03A9 (category Lu: Letter, uppercase)
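The same normalization can be observed without retyping anything at the REPL by handing source text to the parser. A minimal sketch, assuming Meta.parse applies the same identifier normalization as code typed at the prompt:

```julia
# Parse source text that spells an identifier with the Ohm sign U+2126.
s = Meta.parse("\u2126")
println(s isa Symbol)            # true: a bare identifier parses to a Symbol
println(String(s) == "\u03a9")   # true: normalized to capital omega U+03A9
println(s === Symbol("\u03a9"))  # true
```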

Another example would be accented Latin characters like ë, which often have two canonically equivalent representations (either a single special character or an unaccented character followed by a “combining” accent character), but you don’t want that to correspond to different variable names depending on how you type it (e.g. different input systems). Canonicalization (technically, NFC normalization) removes that distinction.
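The ë case can be sketched the same way: the precomposed character and the e-plus-combining-diaeresis sequence are different strings, but NFC makes them identical, and the parser treats both spellings as one identifier.

```julia
using Unicode

precomposed = "\u00eb"   # ë as a single code point
combining   = "e\u0308"  # e followed by COMBINING DIAERESIS

println(length(precomposed), " ", length(combining))        # 1 2
println(precomposed == combining)                           # false
println(Unicode.normalize(combining, :NFC) == precomposed)  # true

# Both spellings parse to the same Symbol, i.e. the same variable name.
println(Meta.parse(combining) === Meta.parse(precomposed))  # true
```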

This is explained in the Julia manual:

Some Unicode characters are considered to be equivalent in identifiers. Different ways of entering Unicode combining characters (e.g., accents) are treated as equivalent (specifically, Julia identifiers are NFC-normalized). Julia also includes a few non-standard equivalences for characters that are visually similar and are easily entered by some input methods. The Unicode characters ɛ (U+025B: Latin small letter open e) and µ (U+00B5: micro sign) are treated as equivalent to the corresponding Greek letters. The middle dot · (U+00B7) and the Greek interpunct · (U+0387) are both treated as the mathematical dot operator ⋅ (U+22C5). The minus sign − (U+2212) is treated as equivalent to the hyphen-minus sign - (U+002D).
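The extra, Julia-specific equivalences from the manual are exposed as Unicode.julia_chartransform, which (assuming Julia 1.8 or later) can be passed to Unicode.normalize through its chartransform keyword. The helper julia_ident_normalize below is a hypothetical name used only for illustration; it approximates how identifiers are normalized and is not an official API.

```julia
using Unicode

# Print the parser's non-standard mapping for each character from the manual.
for c in ('\u025b', '\u00b5', '\u00b7', '\u0387', '\u2212')
    t = Unicode.julia_chartransform(c)
    println(repr(c), " => ", repr(t))
end

# Hypothetical helper: NFC-style composition plus Julia's custom equivalences.
julia_ident_normalize(s::AbstractString) =
    Unicode.normalize(s; compose=true, stable=true,
                      chartransform=Unicode.julia_chartransform)

println(julia_ident_normalize("\u00b5") == "\u03bc")  # true: micro sign -> mu
```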

cormullion · December 28, 2022, 1:34pm

@stevengj Any progress on your proposal?

sostock · December 28, 2022, 1:58pm

However, this does not happen if the Symbol constructor is used (instead of typing :Ω):

julia> collect(String(Symbol(Char(0x2126))))
1-element Vector{Char}:
 'Ω': Unicode U+2126 (category Lu: Letter, uppercase)

stevengj · December 28, 2022, 2:02pm

That’s right — if you use the Symbol constructor then you can make a Symbol from any Julia string even if it is not a valid identifier, such as Symbol(" ") (as long as the string doesn’t contain \0). Because of this it takes the strings literally as-is, with no normalization.
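A short sketch of that behavior, plus one possible workaround for the original comparison: normalize the string yourself before building the Symbol. The normsym helper is hypothetical, and it applies plain NFC only, so it does not reproduce Julia’s extra equivalences such as micro sign to mu.

```julia
using Unicode

# The Symbol constructor stores the string verbatim: no normalization.
a = Symbol("\u2126")  # Ohm sign
b = Symbol("\u03a9")  # capital omega
println(a === b)      # false

# It also accepts strings that are not valid identifiers...
sp = Symbol(" ")
println(String(sp) == " ")  # true
# ...but rejects embedded NUL characters.
try
    Symbol("a\0b")
catch err
    println(err isa ArgumentError)  # true
end

# Hypothetical helper: build a Symbol spelled the way the parser would (NFC),
# so it matches a symbol literal typed in source code.
normsym(s::AbstractString) = Symbol(Unicode.normalize(s, :NFC))
println(normsym("\u2126") === b)  # true
```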
