It seems like you are using the word “encoding” in a non-standard way; it sounds like you mean the Unicode codepoint value. For example, U+0041 is the codepoint for the ASCII character 'A', not an “encoding”.
In contrast, UTF-8 is an “encoding” of Unicode characters into byte sequences. For example, U+1F385 '🎅' is encoded as a sequence of 4 bytes in UTF-8:
julia> codeunits("🎅")
4-element Base.CodeUnits{UInt8, String}:
0xf0
0x9f
0x8e
0x85
whereas U+03B1 'α' is encoded as two bytes in UTF-8:
julia> codeunits("α")
2-element Base.CodeUnits{UInt8, String}:
0xce
0xb1
I showed how to get the UTF-8 encoding bytes with codeunits above, but it sounds like you really want the codepoint value, which you can get easily, e.g.:
julia> '🎅' # display information about the character in the REPL
'🎅': Unicode U+1F385 (category So: Symbol, other)
julia> UInt32('🎅') # codepoint as an integer value
0x0001f385
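To make the codepoint-versus-encoding distinction concrete, here is a minimal sketch that puts the two side by side for the same character. It uses only Base functions (codepoint, Char, codeunits); the variable names are just for illustration.

```julia
c = '🎅'

# The codepoint is a single integer identifying the character.
cp = codepoint(c)                     # UInt32 value, same as UInt32(c)

# Converting the integer back gives the same character.
roundtrip = Char(cp)

# The UTF-8 encoding is a sequence of bytes representing that codepoint.
bytes = collect(codeunits(string(c)))

println("codepoint: U+", uppercase(string(cp, base = 16)))
println("round trip ok: ", roundtrip == c)
println("UTF-8 bytes: ", bytes)
```

One codepoint, several bytes: the integer names the character, and the byte sequence is just one way of storing it.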
Note that a “visible glyph” might be more than one character, e.g. α̂ (a single “grapheme”) is two characters:
julia> collect("α̂")
2-element Vector{Char}:
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
'̂': Unicode U+0302 (category Mn: Mark, nonspacing)
and you can get information about how to type it easily by pasting it at the help?> prompt:
help?> α̂
"α̂" can be typed by \alpha<tab>\hat<tab>
You can also type codepoint values as \uXXXX escape sequences into a string and then copy-paste it:
julia> "\u03B1\u0302"
"α̂"
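As a rough check of the character-versus-grapheme distinction mentioned earlier, the same escaped string can be inspected with the Unicode standard library. This is only a sketch; the variable names are illustrative.

```julia
using Unicode  # standard library; provides graphemes

s = "\u03B1\u0302"                  # α followed by a combining circumflex

nchars     = length(s)              # number of characters (codepoints)
ngraphemes = length(graphemes(s))   # number of user-perceived glyphs
nbytes     = ncodeunits(s)          # number of UTF-8 bytes

println("characters: ", nchars)
println("graphemes:  ", ngraphemes)
println("bytes:      ", nbytes)
```

The three counts differ (2 characters, 1 grapheme, 4 bytes), which is why it matters which one you mean by “length”.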
You can also add custom tab completions to the REPL, e.g.
using REPL: REPLCompletions
REPLCompletions.latex_symbols["\\alphahat"] = "α̂" # or "\u03B1\u0302"
will let you tab-complete \alphahat to α̂. (And, of course, all modern operating systems provide a variety of input methods for Unicode characters.)
See this post for how to directly use codepoint values as variable names like uvar"\u03B1\u0302" in Julia (which in practice will probably be about as popular as trigraphs).
Note that this is not quite true, especially for strings (or “glyphs” or graphemes) that consist of multiple characters. Comparing strings for Unicode equivalence generally requires some form of normalization first. (And Julia provides facilities for this. For source-code identifiers, Julia does NFC normalization + some custom normalizations.)
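To illustrate why normalization matters for comparisons, here is a small sketch using Unicode.normalize from the standard library. The example character 'é' is my own choice (it is not from the discussion above); I picked it because it has both a precomposed and a decomposed form.

```julia
using Unicode  # standard library; provides Unicode.normalize

precomposed = "\u00e9"     # 'é' as a single codepoint
decomposed  = "e\u0301"    # 'e' followed by a combining acute accent

# The two strings render the same but contain different codepoints,
# so a plain == comparison reports them as different.
same_raw = precomposed == decomposed

# After NFC normalization both become the precomposed form.
same_nfc = Unicode.normalize(precomposed, :NFC) ==
           Unicode.normalize(decomposed, :NFC)

println("equal without normalization: ", same_raw)
println("equal after NFC:             ", same_nfc)
```

Which normalization form is appropriate (NFC, NFD, NFKC, ...) depends on the application; NFC is used here only because it is the form mentioned above for identifiers.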
It’s 2024: if your font won’t display characters that you want to use, get a better font. (And if your editor doesn’t support Unicode, stop using ed and get a better editor.)