Tab completion of \uXXXX in the REPL?

stevengj · January 6, 2024, 6:35pm

It seems like you are using the word “encoding” in a non-standard way — it sounds like you mean the Unicode codepoint value. e.g. U+0041 is the codepoint for the ASCII character 'A', not an “encoding”.

In contrast, UTF-8 is an “encoding” of Unicode characters into byte sequences. For example, U+1F385 '🎅' is encoded as a sequence of 4 bytes in UTF-8:

julia> codeunits("🎅")
4-element Base.CodeUnits{UInt8, String}:
 0xf0
 0x9f
 0x8e
 0x85

whereas U+03B1 'α' is encoded as two bytes in UTF-8:

julia> codeunits("α")
2-element Base.CodeUnits{UInt8, String}:
 0xce
 0xb1
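To see where those bytes come from, here is a minimal sketch of the UTF-8 bit-packing rules. `utf8_bytes` is a hypothetical helper written only for illustration. It assumes a valid codepoint (at most U+10FFFF, not a surrogate) and does no validation:

```julia
# Hypothetical helper: encode one codepoint as UTF-8 bytes by hand.
# Assumes cp is a valid Unicode scalar value; no range/surrogate checks.
function utf8_bytes(cp::UInt32)
    if cp < 0x80            # 1 byte:  0xxxxxxx
        return UInt8[cp]
    elseif cp < 0x800       # 2 bytes: 110xxxxx 10xxxxxx
        return UInt8[0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]
    elseif cp < 0x10000     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return UInt8[0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f),
                     0x80 | (cp & 0x3f)]
    else                    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return UInt8[0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
                     0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)]
    end
end

utf8_bytes(UInt32('α'))  # [0xce, 0xb1], the same bytes as codeunits("α")
```

The high bits of the first byte say how many bytes follow, and every continuation byte starts with `10`, which is why U+03B1 needs two bytes and U+1F385 needs four.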

I used codeunits above to show the UTF-8 bytes, but it sounds like what you really want is the codepoint value, which you can easily get, e.g.:

julia> '🎅' # display information about the character in the REPL
'🎅': Unicode U+1F385 (category So: Symbol, other)

julia> UInt32('🎅') # codepoint as an integer value
0x0001f385
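Going the other way, from a character to the \uXXXX text you would type in a string literal, is a short formatting step. This is only a sketch; `uescape` and `uescape_string` are hypothetical names, and it assumes you want `\u` with 4 hex digits for codepoints up to U+FFFF and `\U` with 8 hex digits above that (both forms are accepted by Julia string literals):

```julia
# Hypothetical helper: spell a Char as the escape sequence a Julia
# string literal accepts (\uXXXX up to U+FFFF, \UXXXXXXXX above).
function uescape(c::Char)
    cp = UInt32(c)  # codepoint as an integer, as shown above
    if cp <= 0xffff
        return "\\u" * uppercase(string(cp, base = 16, pad = 4))
    else
        return "\\U" * uppercase(string(cp, base = 16, pad = 8))
    end
end

# Escape every Char of a string (so a multi-Char grapheme gives several escapes).
uescape_string(s::AbstractString) = join(uescape(c) for c in s)

uescape('🎅')                          # "\\U0001F385"
uescape_string("\u03B1\u0302")         # "\\u03B1\\u0302"
```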

Note that a “visible glyph” might be more than one character, e.g. α̂ (a single “grapheme”) is two characters:

julia> collect("α̂")
2-element Vector{Char}:
 'α': Unicode U+03B1 (category Ll: Letter, lowercase)
 '̂': Unicode U+0302 (category Mn: Mark, nonspacing)

and you can easily find out how to type it by pasting it at the help?> prompt:

help?> α̂
"α̂" can be typed by \alpha<tab>\hat<tab>
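The character-versus-grapheme distinction can also be checked programmatically with `graphemes` from the Unicode standard library. A small sketch (the variable names are just for illustration):

```julia
using Unicode  # standard library; exports graphemes

s = "\u03B1\u0302"                     # α followed by U+0302 COMBINING CIRCUMFLEX ACCENT

nchars    = length(s)                  # number of Chars (codepoints)
nclusters = length(collect(graphemes(s)))  # number of user-perceived characters

# Print each grapheme together with the codepoints it is made of.
for g in graphemes(s)
    cps = join(("U+" * uppercase(string(UInt32(c), base = 16, pad = 4)) for c in g), " ")
    println(repr(g), " = ", cps)
end
```

Here `nchars` is 2 while `nclusters` is 1, matching the `collect("α̂")` output above.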

You can also type codepoint values as \uXXXX escape sequences into a string literal and then copy-paste the result:

julia> "\u03B1\u0302"
"α̂"
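If the escape sequences arrive as plain text instead (e.g. copied from a Unicode table, or read from a file, so the backslashes are literal characters), `unescape_string` from Base performs the same conversion at runtime. The `from_codepoints` helper below is a hypothetical sketch that assumes input written in "U+XXXX" notation separated by whitespace:

```julia
typed = raw"\u03B1\u0302"        # literal backslashes, 12 characters of text
s = unescape_string(typed)       # now the 2-Char string "α̂"

# Hypothetical helper: "U+03B1 U+0302" -> "α̂"
function from_codepoints(spec::AbstractString)
    hexes = split(replace(spec, "U+" => ""))   # ["03B1", "0302"]
    return join(Char(parse(UInt32, h; base = 16)) for h in hexes)
end

t = from_codepoints("U+03B1 U+0302")
```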

You can also add custom tab completions to the REPL, e.g.

using REPL: REPLCompletions
REPLCompletions.latex_symbols["\\alphahat"] = "α̂" # or "\u03B1\u0302"

will let you tab-complete \alphahat to α̂. (And, of course, all modern operating systems provide a variety of input methods for Unicode characters.)
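Since `latex_symbols` is just a dictionary, several shortcuts can be registered in one go, for example from `~/.julia/config/startup.jl`. This is a sketch; the shortcut names below are hypothetical examples, and overwriting an existing entry would change a built-in completion, hence the warning:

```julia
using REPL: REPLCompletions

# Hypothetical shortcut names; choose ones that don't clash with built-ins.
my_symbols = Dict(
    "\\alphahat" => "\u03B1\u0302",   # α̂
    "\\betahat"  => "\u03B2\u0302",   # β̂
    "\\santa"    => "\U1F385",        # 🎅
)

for (name, sym) in my_symbols
    haskey(REPLCompletions.latex_symbols, name) &&
        @warn "overriding an existing completion" name
    REPLCompletions.latex_symbols[name] = sym
end
```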

See this post for how to directly use codepoint values as variable names like uvar"\u03B1\u0302" in Julia (which in practice will probably be about as popular as trigraphs).
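For the curious, a string macro along those lines could look roughly like the following. This is my own hypothetical sketch, not necessarily what the linked post does. It relies on custom string literals receiving their text unprocessed, so `unescape_string` does the \u decoding, and it applies NFC only, not the extra identifier mappings the parser performs:

```julia
using Unicode  # for Unicode.normalize

# Hypothetical sketch: uvar"\u03B1\u0302" refers to the variable named α̂.
macro uvar_str(s)
    name = Unicode.normalize(unescape_string(s), :NFC)
    return esc(Symbol(name))   # esc so it refers to the caller's variable
end

α = 0.5
y = uvar"\u03B1" * 2           # same as α * 2
```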

Note that this is not quite true, especially for strings (or “glyphs” or graphemes) that consist of multiple characters. Unicode equivalence generally involves some form of normalization to do comparisons. (And Julia provides facilities for this. For source-code identifiers, Julia does NFC normalization + some custom normalizations.)
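A quick illustration with "é", which can be written either as one precomposed codepoint or as "e" plus a combining accent. The strings differ codepoint by codepoint but agree after NFC normalization, and, given the identifier normalization mentioned above, the parser is expected to map the decomposed spelling to the same symbol:

```julia
using Unicode  # for Unicode.normalize

composed   = "\u00E9"    # é as a single precomposed codepoint
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

same_raw = composed == decomposed                        # compares codepoints
same_nfc = Unicode.normalize(composed, :NFC) ==
           Unicode.normalize(decomposed, :NFC)           # compares NFC forms

# Identifiers are NFC-normalized by the parser.
parsed_is_nfc = Meta.parse(decomposed) === Symbol(composed)
```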

It’s 2024 — if your font won’t display characters that you want to use, get a better font. (And if your editor doesn’t support Unicode, stop using ed and get a better editor.)
