Tab completion of \uXXXX in the REPL?

cormullion · January 7, 2024, 9:31am

Whenever I need to find a Unicode glyph, I use Glyphy.jl, and then I can copy/paste from its output.

Of course, the glyphs can only be displayed if you have the fonts/fallback fonts around that include them. So you probably won’t see any glyphs for the Zanabazarin Dörböljin Useg script, for example:

julia-1.10> glyphy("zanabazar")

11a00   𑨀         zanabazar square letter a
11a01   𑨁         zanabazar square vowel sign i
11a02   𑨂         zanabazar square vowel sign ue
11a03   𑨃         zanabazar square vowel sign u
...

StefanKarpinski · January 7, 2024, 2:28pm

I would also like a concrete example where this makes sense. One of the best aspects of LaTeX completions is that they are now supported in a fairly consistent way across many editing environments, from the Julia REPL to Jupyter notebooks to VS Code, Emacs, vim, SublimeText and so on. If the Julia REPL added a massive set of new completions (all code points), then it would be out of balance with all the other environments. Of course, it’s hard to picture when you’d use it, so perhaps it doesn’t matter, but then again if that’s the case then why add the feature in the first place?

devel-chm · January 7, 2024, 7:58pm

@foobar_lv2 This topic was split off from your recent discourse thread but it started as a suggestion addressing some of the issues in your posts in that discussion.

My proposal is adding better, more uniform support for Unicode entry by extending
the REPL tab completion from the current ad hoc LaTeX-ish expansions by allowing
expansion of any Unicode codepoint in the REPL.

I see this as more a polishing of the existing REPL expansion capability to allow entry
of any Unicode character and not just the set of characters that are blessed by historic
or common usage, such as \alpha or \mu. This would not only allow any Unicode
character to be entered, but could also enable search/recognition of existing aliases.
If there were no existing alias, it could offer an immediate ability to add an entry to
complete that character.
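
As a rough illustration of what such an expansion would have to compute (a minimal sketch, not how the REPL implements completion): the helper names `expand_codepoint` and `existing_aliases` below are hypothetical, and `REPL.REPLCompletions.latex_symbols` is an internal table rather than a documented API, so it may change between Julia versions.

```julia
using REPL

# Hypothetical helper: map the text `\u03b1` (or `\U1f385`) to the character it names.
# Returns `nothing` if the text is not a hex escape or is not a valid codepoint.
function expand_codepoint(text::AbstractString)
    m = match(r"^\\[uU]([0-9a-fA-F]{1,8})$", text)
    m === nothing && return nothing
    cp = parse(UInt32, m.captures[1]; base=16)
    return isvalid(Char, cp) ? Char(cp) : nothing
end

# Hypothetical helper: find existing tab-completion aliases for a character by
# scanning the REPL's internal LaTeX table (an internal detail, not a public API).
function existing_aliases(c::AbstractChar)
    table = REPL.REPLCompletions.latex_symbols
    return sort!([name for (name, sym) in table if sym == string(c)])
end

c = expand_codepoint("\\u03b1")   # the Char with codepoint U+03B1
aliases = existing_aliases(c)     # names that already expand to that character
```

The reverse lookup is a linear scan, which seems fine for an interactive one-off; a real completion provider would presumably want a precomputed reverse table.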

While there was a generally dismissive attitude towards this capability, the
discussions clarified what is available in the existing Julia support that could be
leveraged to smooth some of the rough edges:

String macros could be implemented (here are examples for Unicode-only support
and for LaTeX-only support). N.B. My proposal would essentially combine these two
into the existing REPL tab completion capability.

Syntax: Escape hatch for unicode haters

You can do something now with a string macro, no parser changes needed:

macro uvar_str(s::AbstractString)
   sym = Meta.parse(unescape_string(s))
   sym isa Symbol || throw(ArgumentError("expecting a single symbol"))
   return esc(sym)
end

Which lets you do e.g.:

julia> uvar"\ub5" = 3
3

julia> µ
3

julia> uvar"\u2208"('b', "foobar") # 'b' ∈ "foobar"
true

and

Syntax: Escape hatch for unicode haters

Based on the previous macro, this version might have more compassion towards the source-code reader:

using REPL
macro lvar_str(s::AbstractString)
    if !haskey(REPL.REPLCompletions.latex_symbols, "\\"*s)
        throw(ArgumentError("expecting relevant latex"))
    end
    new_s = REPL.REPLCompletions.latex_symbols["\\"*s]
    sym = Meta.parse(unescape_string(new_s))
    sym isa Symbol || throw(ArgumentError("expecting a single symbol"))
    return esc(sym)
end

julia> lvar"mu"
3

julia> lvar"pi"
π = 3.1415926535897...

Entering the glyph as a character will display the unicode codepoint value
stevengj:
I showed how to get the UTF-8 encoding bytes with codeunits above, but it sounds like you really want the codepoint value, which you can get easily by e.g.:
```
julia> '🎅'  # display information about the character in the REPL
'🎅' : Unicode U+1F385 (category So: Symbol, other)

julia> UInt32('🎅') # codepoint as an integer value
0x0001f385
```
Note that a “visible glyph” might be more than one character, e.g. “α̂” (a single “grapheme”) is two characters:
```
julia> collect("α̂")
2-element Vector{Char}:
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
'̂': Unicode U+0302 (category Mn: Mark, nonspacing)
```
and you can get information about how to type it easily by pasting it at the help?> prompt:
```
help?> α̂
"α̂" can be typed by \alpha<tab>\hat<tab>
```
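
The character-versus-grapheme distinction above can also be checked programmatically. A small sketch using the Unicode standard library, with the string built from escapes so it does not depend on font rendering:

```julia
using Unicode

s = "\u03B1\u0302"                  # alpha followed by a combining circumflex

n_chars     = length(s)             # number of Chars (codepoints) in the string
n_graphemes = length(graphemes(s))  # number of user-perceived characters
codepoints  = UInt32.(collect(s))   # codepoint value of each Char

println((n_chars, n_graphemes, codepoints))
```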
You can use string escape sequences to return a string with the desired unicode in it
stevengj:
You can also type codepoint values as \uXXXX escape sequences into a string and then copy-paste it:
```
julia> "\u03B1\u0302"
"α̂"
```
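
Going the other way, from a pasted glyph back to escape sequences, can be sketched as follows. `to_escapes` is a hypothetical helper name, not an existing function, and it assumes the input contains only valid Unicode characters.

```julia
# Hypothetical helper: spell every character of `s` as a \uXXXX escape,
# or as \UXXXXXXXX for codepoints above U+FFFF.
function to_escapes(s::AbstractString)
    io = IOBuffer()
    for c in s
        cp = UInt32(c)   # throws for malformed characters
        if cp <= 0xffff
            print(io, "\\u", string(cp, base=16, pad=4))
        else
            print(io, "\\U", string(cp, base=16, pad=8))
        end
    end
    return String(take!(io))
end

escaped   = to_escapes("\u03B1\u0302")   # text that can be pasted into a string literal
roundtrip = unescape_string(escaped)     # back to the original two-character string
```

The escaped form is plain ASCII, so it survives editing contexts that cannot display or enter the glyph itself.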

For additional information on Unicode support currently in Julia, see the documentation for
the Unicode standard library.

I hope this collected information is helpful for anyone having some of these Unicode pain points.
My thanks to everyone who participated in this discussion.

Also, I meant to include the suggestion by @cormullion to use Glyphy, thanks!

foobar_lv2 · January 12, 2024, 10:39am

For what it’s worth, this would not address (or even ameliorate) my issue.

My issue is that many editing contexts I use in everyday life are not julia aware, and therefore cannot support julia-specific tab-completions.

I think the latex-style ad-hoc tab-completions from the REPL and most julia-aware editors are pretty nice! But this does not solve the issue that e.g. web-browser text input fields (like here on discourse), various git mergetools, and editing contexts on less-than-ideally configured machines make input of “foreign” chars so hard.

Technically, it should be the job of my OS to make keyboard input of foreign characters convenient, and it should be the job of these tools to use syscalls / libraries that allow my OS to help me make this convenient. Practically speaking, mankind failed at OS design in this regard (possible: Yes, accessibility tools for variously disabled people are a thing. Convenient: Nope!).

Julia acknowledges this failure by the very existence of REPL tab completions, but somehow fails to acknowledge that you sometimes want to edit code from outside REPL / IDE.

…and then there is the even harder collection of settings, like less / @less (text input field: search via /) and shell contexts (e.g. xterm + bash) for things like git commit -m or grep. sed is also still a thing, but I have not yet run into julia-specific pain points with that. These are even harder because the spelling of Unicode characters is not normalized: for example, there are different codepoints that look like the result of \mu<TAB> (the micro sign U+00B5 and the Greek letter U+03BC), which the parser turns into the same Symbol (but the Symbol constructor doesn’t!).
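
The mismatch is easy to reproduce. A minimal sketch, assuming Julia 1.8 or later for `Unicode.julia_chartransform`; the last step is one possible way to emulate the parser's normalization by hand, not something proposed earlier in this thread:

```julia
using Unicode

micro = "\u00b5"   # MICRO SIGN
mu    = "\u03bc"   # GREEK SMALL LETTER MU

parsed_same = Meta.parse(micro) === Meta.parse(mu)   # the parser normalizes identifiers
symbol_same = Symbol(micro) === Symbol(mu)           # the Symbol constructor does not

# Emulate the parser's identifier normalization before constructing the Symbol.
normalized = Unicode.normalize(micro; compose=true, stable=true,
                               chartransform=Unicode.julia_chartransform)
fixed_same = Symbol(normalized) === Symbol(mu)
```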
