UnicodeREPL.jl - Type any Unicode character in the REPL

GHTaarn · March 10, 2024, 9:27pm

I would like to announce the new package UnicodeREPL.jl which is a REPL enhancement for tab completing any Unicode codepoint to its Unicode symbol. This is useful if you want Unicode symbols that are not in the standard Julia tab completion and not on your keyboard.

You can read more about it at GitHub - GHTaarn/UnicodeREPL.jl: Tab completion of any Unicode codepoint

cormullion · March 11, 2024, 11:06am

Interesting package!

It might be worth adding that you can use \U to get glyphs above 0xFFFF:

unicode repl> println("\U1D6C1")
𝛁

It might work well alongside Glyphy.jl too.

GHTaarn · March 11, 2024, 10:48pm

Thank you

Writing Unicode characters in this way inside a string is actually standard Julia syntax, so it works with or without UnicodeREPL.jl.
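As a minimal sketch of that point, the snippet below runs in plain Julia with no packages loaded. The variable names are illustrative only.

```julia
# Plain Julia, no UnicodeREPL.jl required.
nabla_str  = "\U1D6C1"       # escape inside a String literal
nabla_char = '\U1D6C1'       # escape inside a Char literal
nabla_cp   = Char(0x1D6C1)   # built directly from the code point

println(nabla_str)
println(nabla_char == nabla_cp)          # same character either way
println(only(nabla_str) == nabla_char)   # the string holds exactly that one character
```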

The advantage of UnicodeREPL.jl is that you can tab complete so that you can more clearly see what you are writing before pressing return. UnicodeREPL.jl also works for Unicode characters that are not inside strings. I did write that one can use codepoints of any length, but I will consider including this example in future documentation to make this more clear.

Wow, Glyphy.jl looks like a very useful companion to UnicodeREPL.jl. It has actually gotten me thinking about some kind of synergy between the two packages. It is certainly worth considering installing Glyphy if one uses UnicodeREPL.

daler6 · July 2, 2024, 3:32am

On a related note, I have found some issues with how Julia deals with Unicode characters with code points above FFFF.

For example,
'\U1D6C1' returns '𝛁': Unicode U+1D6C1 (category Sm: Symbol, math)
"\U1D6C1" returns "𝛁"
'\u1D6C1' returns ParseError: character literal contains multiple characters
"\u1D6C1" returns "ᵬ1" (note that the code point for ᵬ is 1D6C)

Palli · July 2, 2024, 4:08am

I believe this is correct, and it’s documented (the rule comes from other languages):
https://docs.julialang.org/en/v1/base/strings/#Base.unescape_string

Unicode BMP code points (\u with 1-4 trailing hex digits)

All Unicode code points (\U with 1-8 trailing hex digits; max value = 0010ffff)

Hex bytes (\x with 1-2 trailing hex digits)

Octal bytes (\ with 1-3 trailing octal digits)
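The four escape forms listed above can be tried directly with `unescape_string`. This is only a small sketch: the variable names are made up for illustration, and each input uses `\\` so that the function, rather than the Julia parser, decodes the escape.

```julia
# "\\" is a single literal backslash, so unescape_string sees the raw escape text.
bmp    = unescape_string("\\u3b1")     # \u with up to 4 hex digits (BMP)
astral = unescape_string("\\U1D6C1")   # \U with up to 8 hex digits (any code point)
hex    = unescape_string("\\x41")      # \x with up to 2 hex digits (one byte)
octal  = unescape_string("\\101")      # up to 3 octal digits (one byte)

println((bmp, astral, hex, octal))
```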

\u covers only the BMP, probably because that was once all there was, so \U is now needed to avoid ambiguity in some cases.

You actually hit such a case, which is why you got an error for '\u1D6C1' but not for "\u1D6C1": the latter is a valid two-character string, like '\u1D6C' followed by '1'. That is one reason to use the character-literal syntax if you are really after a Char, not a String (it's also a tiny bit faster).
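The difference can be checked with a short sketch. `parses_ok` is a hypothetical helper written for this example (not part of Julia or UnicodeREPL.jl); it only reports whether `Meta.parse` accepts a piece of source text.

```julia
# In a String, \u consumes at most four hex digits; the fifth is an ordinary character.
s = "\u1D6C1"
println(length(s))                        # number of characters in the string
println(collect(s) == ['\u1D6C', '1'])    # 'ᵬ' followed by '1'

# \U accepts the full five-digit code point, so this is a single Char.
c = '\U1D6C1'
println(codepoint(c) == 0x1D6C1)

# Hypothetical helper: true if the source text parses without a syntax error.
function parses_ok(src::AbstractString)
    try
        ex = Meta.parse(src)
        return !(ex isa Expr && ex.head in (:error, :incomplete))
    catch
        return false
    end
end

println(parses_ok("'\\u1D6C1'"))   # Char literal would hold two characters
println(parses_ok("'\\U1D6C1'"))   # Char literal holds one character
```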

It was a bit hard to look this up in the docs, and where should it be documented? I ended up searching for "escape" and then found String in a list. Note that in Strings · The Julia Language only \u is mentioned, not \U, nor what either of them means.

daler6 · July 2, 2024, 10:17pm

Thanks Palli. I am not sure I would have figured it out from the documentation. I think we may need more or clearer examples.

StefanKarpinski · July 4, 2024, 9:57pm

I actually misunderstood the C spec when I implemented this. In C \u requires exactly four hex digits after and \U requires six (I think?) or maybe it allows five or six? Anyway, ours allow up to four and up to six, which is kind of redundant. But yes, this is like C but a little more permissive.
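A small sketch of the "more permissive" behaviour, assuming a recent Julia 1.x: both escapes accept fewer digits than their maximum, so short and zero-padded forms compare equal. The variable names are illustrative.

```julia
# \u with two digits versus the zero-padded four-digit form.
short_u  = "\u41"
padded_u = "\u0041"

# \U with five digits versus a zero-padded six-digit form.
short_U  = "\U1D6C1"
padded_U = "\U01D6C1"

println(short_u == padded_u)
println(short_U == padded_U)
println("\U41" == short_u)   # \U also works for small code points
```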
