UnicodeREPL.jl - Type any Unicode character in the REPL

I would like to announce the new package UnicodeREPL.jl, a REPL enhancement that tab-completes any Unicode codepoint to the corresponding character. This is useful when you want Unicode symbols that are neither in Julia's standard tab completion nor on your keyboard.
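The core mapping behind such a completion, from a hexadecimal codepoint to the character it denotes, can be sketched in plain Julia. This is a minimal sketch and not UnicodeREPL.jl's implementation. The helper name `hex_to_char` is hypothetical, and it assumes the codepoint is given as bare hex digits without a `U+` or `\U` prefix.

```julia
# Hypothetical helper (not part of UnicodeREPL.jl): convert hex digits such
# as "1D6C1" into the character with that codepoint.
function hex_to_char(hex::AbstractString)
    cp = parse(UInt32, hex; base = 16)       # "1D6C1" -> 0x0001d6c1
    # reject surrogates (D800-DFFF) and values above 10FFFF
    isvalid(Char, cp) ||
        throw(ArgumentError("not a valid Unicode scalar value: U+$(uppercase(hex))"))
    return Char(cp)
end

println(hex_to_char("1D6C1"))   # 𝛁
println(hex_to_char("3B1"))     # α
```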

You can read more about it on GitHub: GHTaarn/UnicodeREPL.jl (Tab completion of any Unicode codepoint).

Interesting package!

It might be worth adding that you can use \U to get glyphs above 0xFFFF:

unicode repl> println("\U1D6C1")
𝛁

It might work well alongside Glyphy.jl too. :slight_smile:

Thank you :smiley:

Writing Unicode characters this way inside a string is actually standard Julia syntax, so it works with or without UnicodeREPL.jl.
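A quick way to confirm this is plain Julia is to try the escapes in a session without UnicodeREPL.jl loaded. The sketch below uses only Base functions (`codepoint`, `length`, `ncodeunits`); the variable names are illustrative.

```julia
# These escapes are ordinary Julia String/Char syntax and need no package.
s = "\U1D6C1"      # String containing U+1D6C1, which lies above 0xFFFF
c = '\U1D6C1'      # the same codepoint as a Char

println(s)                          # 𝛁
println(codepoint(c) == 0x1D6C1)    # true
println(length(s))                  # 1 character...
println(ncodeunits(s))              # ...stored as 4 UTF-8 bytes
```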

The advantage of UnicodeREPL.jl is that you can tab complete, so you can see more clearly what you are writing before pressing return. UnicodeREPL.jl also works for Unicode characters that are not inside strings. I did write that one can use codepoints of any length, but I will consider including this example in future documentation to make that clearer.

Wow, Glyphy.jl looks like a very useful companion to UnicodeREPL.jl! It has actually got me thinking about creating some kind of synergy between the two packages. It is certainly worth considering installing Glyphy if you use UnicodeREPL.

On a related note, I have found some issues with how Julia deals with Unicode characters with code points above FFFF.

For example,
'\U1D6C1' returns '𝛁': Unicode U+1D6C1 (category Sm: Symbol, math)
"\U1D6C1" returns "𝛁"
'\u1D6C1' returns ParseError: character literal contains multiple characters
"\u1D6C1" returns "ᵬ1" (note that the code point for ᵬ is 1D6C)
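The four cases can be reproduced without typing them into the REPL by handing the same source text to `Meta.parse`. The helper `parses_ok` is hypothetical and written defensively: as an assumption, it treats both a thrown error and a returned `:error`/`:incomplete` expression as a parse failure, since that reporting has varied between Julia versions.

```julia
# Hypothetical helper: does this source text parse without error?
function parses_ok(src::AbstractString)
    try
        ex = Meta.parse(src)
        # some versions report problems as an :error/:incomplete Expr
        return !(ex isa Expr && ex.head in (:error, :incomplete))
    catch
        return false
    end
end

println(parses_ok(raw"'\U1D6C1'"))    # true: one Char, U+1D6C1
println(parses_ok(raw"'\u1D6C1'"))    # false: \u stops after 1D6C, leaving '1'
println(length("\U1D6C1"))            # 1
println(length("\u1D6C1"))            # 2
println("\u1D6C1" == "\u1D6C" * "1")  # true
```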

I believe this is correct, and it’s documented (the rule comes from other languages):
https://docs.julialang.org/en/v1/base/strings/#Base.unescape_string

  • Unicode BMP code points (\u with 1-4 trailing hex digits)
  • All Unicode code points (\U with 1-8 trailing hex digits; max value = 0010ffff)
  • Hex bytes (\x with 1-2 trailing hex digits)
  • Octal bytes (\ with 1-3 trailing octal digits)
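To see these rules applied at run time, `unescape_string` can be called on raw strings containing the escapes. This is a short sketch; the expected results in the comments follow from the digit limits listed above.

```julia
# Each input is raw text containing a backslash escape.
println(unescape_string(raw"\u3b1"))        # α  (\u: up to 4 hex digits)
println(unescape_string(raw"\U0001D6C1"))   # 𝛁  (\U: up to 8 hex digits)
println(unescape_string(raw"\x41"))         # A  (\x: one byte)
println(unescape_string(raw"\101"))         # A  (octal 101 = 65)

# \u reads at most four digits, so a fifth digit is left as an ordinary character
println(unescape_string(raw"\u1D6C1") == "\u1D6C" * "1")   # true

# values above 0x10FFFF are rejected
try
    unescape_string(raw"\U110000")
catch err
    println(err isa ArgumentError)           # true
end
```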

\u covers only the BMP, likely because once that was all there was, so \U is now needed to avoid ambiguity in some cases.

You actually hit such a case, which is why you got an error for '\u1D6C1'. You did not get one for "\u1D6C1" because that is a valid two-character string: '\u1D6C' followed by '1'. That is one reason to use the single-quoted Char syntax if you are really after a Char, not a String (it's also a tiny bit faster).
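A small sketch of the Char-versus-String difference, using only Base: concatenating the two `Char`s gives exactly the string that "\u1D6C1" produces, and `only` is one possible way to insist that a string holds exactly one character.

```julia
# Two Chars joined with * give a String, the same one "\u1D6C1" produces.
s = '\u1D6C' * '1'
println(s == "\u1D6C1")             # true
println(length(s))                  # 2

# `only` returns the single element of a collection and throws otherwise,
# which makes an accidental extra character visible.
println(only("\U1D6C1") == '\U1D6C1')   # true
try
    only("\u1D6C1")
catch err
    println(err isa ArgumentError)      # true: the string has two characters
end
```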

It was a bit hard to look this up in the docs, and where should it be documented? I ended up searching for "escape" and then found String in a list. Note that on the Strings · The Julia Language page only \u is mentioned, not \U, and neither is explained.

Thanks Palli. I am not sure I would have figured it out from the documentation. I think we may need more or clearer examples.

I actually misunderstood the C spec when I implemented this. In C, \u requires exactly four hex digits and \U requires six (I think?), or maybe it allows five or six? Anyway, ours allow up to four and up to six, which is kind of redundant. But yes, this is like C, just a little more permissive.
