Syntax: Escape hatch for unicode haters

foobar_lv2 · January 5, 2024, 3:08pm

No, vscode, repl and jupyter all support it.

I do write tiny snippets in the browser / github / slack, and grep and meld (for git mergetool). These are the points where my software stack is lacking because I did not adopt the emacs operating system.

I heard from @Tamas_Papp about how he deals with unicode in git merge conflicts (emacs ftw). Do your git mergetools support \alpha<TAB> @sijo @mbauman ?

I can figure out a tab-completion. But I will forget it 5 minutes later unless it stares me into the face, as latex source files do, or unless they stared me into the face for many years, as many latex sequences did.

In sum, using unicode in julia is just not worth it for me, I can use in, xor, union instead of \in, $\xor$ (you see me struggling putting that char into my browser input box right now!) or \cup, at the price of some more parentheses.
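For reference, the ASCII spellings mentioned here are ordinary Base functions, and the Unicode operators are aliases for them. A minimal sketch (the variable names are illustrative only):

```julia
a, b = 0b1100, 0b1010
s, t = Set([1, 2]), Set([2, 3])

# Each Unicode operator is the same function object as its ASCII name.
@assert (⊻) === xor
@assert (∈) === in
@assert (∪) === union

# The ASCII forms are plain calls, hence the extra parentheses in longer expressions.
@assert xor(a, b) == a ⊻ b
@assert union(s, t) == s ∪ t
@assert (1 in s) == (1 ∈ s)   # `in` can also be written infix
```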

Most APIs are light on unicode, so can be used without too much annoyance.

Rare real PRs to projects that use unicode are also not an issue, if somebody is asked to review it then I can take the pain to format it. I will suffer when rebasing / solving merge conflicts.

Quick edits or quick @eval to try something out can be very annoying though. That would become simpler if the parser could do unicode escaping.
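One ASCII-only workaround that already exists for quick experiments, offered as a sketch rather than a recommendation: build the identifier from a string escape with `Symbol` and splice it in with `@eval`. The code points (U+03B1 for α, U+22BB for ⊻) are standard Unicode values; the names `alpha` and `veebar` are made up for illustration.

```julia
# ASCII-only source: identifiers are built from \u escapes inside string literals.
alpha = Symbol("\u03b1")                    # the symbol :α
@eval $alpha = 2                            # defines a global variable named α

veebar = getfield(Base, Symbol("\u22bb"))   # looks up the operator ⊻ in Base
@assert veebar === xor
@assert veebar(0b01, 0b11) == 0b10
@assert getfield(@__MODULE__, alpha) == 2
```

This is clumsy compared to an escape understood by the parser, which is the point being made above.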

Same with interacting with code on discourse. If it contains unicode, I will either take the time to completely refactor the code, or I will not interact at all.

mbauman · January 5, 2024, 3:09pm

Most modern languages support unicode identifiers these days:

Python (2. Lexical analysis — Python 3.3.7 documentation)
Rust (Identifiers - The Rust Reference)
Swift (Documentation)
Go (The Go Programming Language Specification - The Go Programming Language)
C# (C# identifier names - rules and conventions - C# | Microsoft Learn)
Raku (identifiers | Raku Documentation)
Java (Charsets and Unicode Identifiers in Java - DZone)
Ruby (Coding Ninjas Studio)
C++ (Identifiers - cppreference.com)
Heck, even C99 has rudimentary support… but as the oldest one here, it ironically allows int \u03B1 = 2; while leaving α = 2; implementation defined. (Identifier - cppreference.com).
Javascript also allows unicode as well as using unicode escapes in identifiers somewhat similarly to C (Valid JavaScript variable names in ES5 · Mathias Bynens)

Of the languages I thought of here, only Perl, R, and Fortran don’t seem to support unicode. And only C and Javascript support using \U or \u escapes. None support latex- or html-like entity names.

Tamas_Papp · January 5, 2024, 3:19pm

Incidentally, while the examples in this topic are about LaTeX math, note that Unicode has been a boon to speakers of languages other than English.

I am not sure which other languages you speak/write, but maybe you are aware of the mess with various encodings etc that preceded unicode and specifically utf8.

“Hating” Unicode is a very English-centric view.

sijo · January 5, 2024, 3:25pm

I use vim for that so yes.

You give a good example with the Julia Discourse: I also find it annoying that Unicode aliases are not available there. I copy-paste from a Julia REPL in these cases, not ideal but still worth it for me.

Personally I would be annoyed to encounter Julia files that use e.g. a \\xor b instead of a ⊻ b (I find it ugly and less readable). And there’s no easy fix for Unicode lovers: To see nice looking code I would have to configure all the viewers, which are much more numerous and typically more difficult to configure than editors (not sure how I would do it in my browser for example).

foobar_lv2 · January 5, 2024, 3:31pm

I’m not saying that IDE support for julia or latex is bad.

I’m saying that it is perfectly possible to edit latex sources or 90+% of julia sources with e.g. nano or whatever textbox meld uses.

But it is extremely painful to edit julia sources that go heavy on unicode without IDE-like tooling.

The editing experience without specialized tooling is part of the “plain” in “plaintext”.

Yep, it doesn’t completely solve it. But it allows for projects to locally enforce auto-formatting in either direction (normalize away all escape sequences, or normalize away all non-ascii identifiers/operators). Then the “marketplace of github” could figure out which way is better for more people. I could start quick “try something out” sessions by git clone ... followed by hypothetical juliafmt --escapeUnicodeSyntax ..
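To make the "normalize in either direction" idea concrete, here is a toy sketch. It is not an existing formatter: `juliafmt --escapeUnicodeSyntax` is hypothetical, and so are the names `NAMES`, `escape_unicode` and `unescape_unicode` below. The table is hand-written and tiny; the backslash names simply mirror the REPL tab-completion names.

```julia
# Hypothetical table; a real tool would need to cover far more symbols.
const NAMES = Dict('α' => "\\alpha", '⊻' => "\\xor", '∈' => "\\in", '∪' => "\\cup")

# Unicode -> ASCII names. Naive: ignores string literals, comments and token boundaries.
escape_unicode(src) = join(get(NAMES, c, string(c)) for c in src)

# ASCII names -> Unicode. Naive as well: "\\in" would also match inside "\\infty".
function unescape_unicode(src)
    for (c, name) in NAMES
        src = replace(src, name => string(c))
    end
    return src
end

@assert escape_unicode("a ⊻ b") == "a \\xor b"
@assert unescape_unicode("x \\in s \\cup t") == "x ∈ s ∪ t"
```

A real implementation would presumably work on the token stream rather than on raw characters, so that strings and comments are left alone.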

Interesting dichotomy, thanks for pointing that out. More viewers than editors, but the pain from a bad editor is larger than the pain from a bad viewer.

I think most syntax highlighting is powerful enough to display a \\xor b in the other way that would require touching the mouse to copy-paste from the REPL.

DNF · January 5, 2024, 9:43pm

This just tells me that many editors are too primitive to support plain text. I frankly find that unacceptable, and think that sort of editor should be abandoned.

(BTW, my name contains several non-ascii characters, and I’m sick of the lagging support in many systems.)

foobar_lv2 · January 6, 2024, 2:37pm

Sorry, I mis-spoke. Unicode in general, and specifically utf8, are awesome! Even though I very rarely need to leave latin-1 (German, English, occasional French), it used to be a mess.

And the unicode support in the julia runtime with String is really good, and I like the index-by-address that so many people are surprised by.
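For readers who have not run into it, "index-by-address" means that String indices are byte offsets into the UTF-8 data, not character counts. A small illustration (the example string is arbitrary):

```julia
s = "α ⊻ β"                      # α and β take 2 bytes each, ⊻ takes 3

@assert ncodeunits(s) == 9       # bytes
@assert length(s) == 5           # characters
@assert s[1] == 'α'
@assert nextind(s, 1) == 3       # the next character starts at byte 3
@assert !isvalid(s, 2)           # byte 2 is in the middle of α; s[2] would throw
@assert collect(eachindex(s)) == [1, 3, 4, 7, 8]
```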

The problem is UI/OS design around input of foreign characters. “foreign” means “foreign to the user’s keyboard layout”, so my native German is foreign to me, in this context – good ergonomics for special characters is more important for me than good ergonomics for typing my native language.

OS / programming languages are already super US centric in this often overlooked way, starting with simple things like slash / as path separator in unix-like systems. It’s not like Germans were stupidly obstinate unix-haters when placing their slash, keyboard layouts evolved from typewriters that predate digital computers.

Latex has a very good solution: you can use utf8 non-ascii chars in source files, but you can also use the latex escape sequences.

Editors like emacs julia-mode can add sugar like displaying \alpha as α, or emitting the utf8 character upon \alpha<TAB>. But editor support is not mandatory for editing latex sources, which is important for users who live in a more fragmented world of input-handlers, i.e. who don’t run the emacs operating system.

Afaiu no other language than latex has a good solution, besides “don’t use foreign characters in sections that you need to edit”. (html is very borderline)

That works really well if foreign characters are confined to string literals or source-code comments or niche projects that you don’t deal with.

Alas, the problem in the julia ecosystem is that use of foreign characters in relevant parts of source files is actually medium-wide spread.

And I see no significant appreciation in the community for the annoyances that this causes, and for the need of comprehensive tooling around this.

This is not really about programming languages, it is about the ergonomics of the plain-text file format that are julia source files (.jl). Ergonomics are determined not just by the spec, but also available tooling, typical editing contexts (source control needed? 3-way merges common? is greppability important?) and typical real-world usage patterns (how typical is it to have to type/edit “foreign” characters?).
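As a small example of the kind of tooling meant here, a project that wants to know where non-ASCII characters appear in its sources could use something like the sketch below. The function name `nonascii_report` is made up for illustration, and it makes no attempt to tell identifiers apart from string literals or comments.

```julia
# Return (line number, character) pairs for every non-ASCII character in `src`.
function nonascii_report(src::AbstractString)
    hits = Tuple{Int,Char}[]
    for (n, line) in enumerate(split(src, '\n'))
        for c in line
            isascii(c) || push!(hits, (n, c))
        end
    end
    return hits
end

src = "a ⊻ b\nx = 1\nα = 2"
@assert nonascii_report(src) == [(1, '⊻'), (3, 'α')]
```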

mbauman · January 6, 2024, 3:35pm

A post was split to a new topic: Better explicit support for Unicode encodings?

mbauman · January 6, 2024, 7:21pm

The analogy with TeX only really makes sense to me if there existed \commands with unicode in them and there was an alternate way to call those. Your post isn’t about ways of outputting unicode with ASCII stand-ins (which as you note Julia’s string syntax can do), but rather about ways to refer to unicode identifiers using a secondary ASCII form.
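To illustrate the distinction: inside string and character literals Julia already accepts ASCII stand-ins for Unicode, but there is no equivalent for identifiers. The helper `ascii_escape` below is a made-up name, shown only to make the round trip visible.

```julia
# Escapes inside string and character literals produce the Unicode character.
@assert "\u03b1" == "α"
@assert '\u22bb' == '⊻'
@assert codepoint('α') == 0x03b1

# Going the other way: spell a character as an ASCII \u escape.
# Only meant for code points up to U+FFFF; larger ones need the \U form.
ascii_escape(c::Char) = "\\u" * string(codepoint(c), base = 16, pad = 4)
@assert ascii_escape('⊻') == "\\u22bb"
```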

As I noted above, nearly every modern programming language supports unicode identifiers without an ASCII “escape hatch”. Sure, perhaps Julia is an outlier in how much it’s used, but you’ve encapsulated my thoughts — and where I think the solution belongs — quite well:

There are lots of ways to customize plain text editors and even OS-level keyboard entry these days.

Tamas_Papp · January 8, 2024, 1:36pm

Perhaps that is because they are aware of tooling that exists, and are using it. Maybe you could invest in exploring that too.

Most OSs these days do not require that you pick a single keyboard layout from a predefined list: you can extend, mix, switch, etc. I only know about Linux but I would be surprised if OS X didn’t have anything like this (and would be surprised if the Windows solution wasn’t convoluted and clunky).

Inside a particular IDE that is actively maintained, there are usually zillions of solutions. E.g., if you are not in a Julia source file in VS Code, there is the generic

and similar extensions. Firefox has

and I am sure we could continue this list for various apps. Even if an app does not have that, you can quickly edit up some UTF8 text in your favorite editor and copy paste.

IMO Unicode entry is best handled by editors, and tweaking Julia (the parser) to support an alternate entry method that is converted to UTF8 on the fly would be the wrong place to address this.

goerz · January 8, 2024, 2:41pm

Here’s what I’m using: The "U.S. International - Scientific" Keyboard Layout - Michael Goerz

John_Gibson · January 8, 2024, 2:45pm

Impressive!

mbauman · January 8, 2024, 3:34pm

Oh my goodness, that’s awesome. I’ve long used a custom DefaultKeybindings.dict file to achieve this (as well as adding additional emacs-like cursor/editing actions), but that only works in some applications.

It looks like your link to Ukelele is broken on your blog — it’s here: Ukelele - SIL Language Technology - SIL Language Technology

jkopper · January 8, 2024, 9:15pm

Counterpoint: Perhaps many people are sufficiently annoyed that they’re opting out of using Julia altogether.

I’m speaking here as someone with no horse in this race. I use Julia as a hobbyist, but the widespread usage of exotic unicode characters is by far the most aggravating part of the language. It truly would be enough to keep me from using Julia if I didn’t have independent interest in some Julia projects. It’s certainly enough for me to keep from bothering to use it professionally.

There are two reasons:

I worry about being able to enter code.
I worry about others being able to read my code.

As soon as you require users to have special tooling just to type their code, then you have lost a significant portion of developers. Tooling is for improving the coding process, not enabling it. If I can’t ssh into a terminal and edit code with whatever editor happens to be installed, then your language isn’t useful to me. This is a pretty common view, at least outside of the Julia community.

I understand the counter-counterpoint, that the language is targeting a specific set of users (the scientific computing community, basically) and if that community loves unicode function names, then maybe it’s best to design the language and tooling to facilitate that. But I can’t believe “Unicode haters” are truly that rare.

DNF · January 8, 2024, 9:49pm

Excuse me for asking, but where do you come across all these exotic characters? I have only ever seen them used very sparingly, and never have I been forced to use it. What APIs are forcing this on the user?

I only use unicode where I feel it improves my code, but I’m not forced to use it anywhere.

The first point is relevant if you are required to use unicode by some API, but otherwise not.

The second point, I don’t understand, actually. Why would you worry about that if you only use ascii characters?

stevengj · January 8, 2024, 9:54pm

While many people enjoy using Unicode symbols in their own code (which has happened because it’s become so easy to type with common tooling!), and you’ll see it in a lot of examples and internal implementations, it’s much less common for it to be required to access any API. I don’t recall any feature of the base language or standard libraries that requires non-ASCII symbols, even if there is a Unicode shortcut.
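A quick way to check this for some common operators: in Base the Unicode symbol is bound to the very same function as an ASCII name. The selection below is only a sample, and `ComposedFunction` needs Julia 1.6 or later.

```julia
@assert (≤) === (<=)
@assert (≥) === (>=)
@assert (≠) === (!=)
@assert (÷) === div
@assert (√) === sqrt
@assert (≈) === isapprox
@assert (∩) === intersect
@assert (⊆) === issubset

# Function composition has an ASCII spelling too.
@assert (sqrt ∘ abs)(-4) == ComposedFunction(sqrt, abs)(-4) == 2.0
```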

Is there a particular Unicode-only API you’ve been aggravated by?

(The main external package I know whose API requires a lot of Unicode symbols is the Gridap.jl finite-element library, which is a great package but in a specialized mathematical domain where the symbols make a lot of sense. So, while there are examples of such APIs in the Julia ecosystem, I wouldn’t call it “widespread”. It’s a decision that the developers of each package have to make for themselves.)

jkopper · January 8, 2024, 10:18pm

Julia Base contains \circ, \leq, \in, \xor and many others! My complaint is not that an API may force me to use unicode characters, but that I might have to interact with code that does. While many other languages do support (typically a very limited set of) unicode characters, it’s very rare that people actually use those characters. In Julia, however, they’re everywhere.

Maybe I see code on the internet and I want to copy & paste it into an editor. Good luck to me! I’ll cross my fingers and hope it works.

Maybe I see code that I want to run written as it is in this very thread, with people (like me) actively struggling to type a symbol like ≠ (which I had to google and paste into this text box) so they write \neq instead and now I have to figure out how to get that input correctly into my code editor. It’s a usability nightmare.
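For what it's worth, when a plain Julia session is available, the table behind the REPL's tab completion can be queried directly to turn a LaTeX-style name into the character. Note the assumption: `REPL.REPLCompletions.latex_symbols` is an internal, undocumented table and may change between Julia versions; `from_latex` is a made-up helper name.

```julia
using REPL  # standard library

# Internal (non-public) table used by the REPL's tab completion.
const LATEX = REPL.REPLCompletions.latex_symbols

# Returns the Unicode string for a name like "\\alpha", or `nothing` if unknown.
from_latex(name) = get(LATEX, name, nothing)

@assert from_latex("\\alpha") == "α"
@assert from_latex("\\xor") == "⊻"
@assert from_latex("\\in") == "∈"
```

In the other direction, pasting a symbol at the REPL's `help?>` prompt prints how it can be typed. None of this helps in a browser text box, which is the complaint here.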

You’ve completely misunderstood. I may need to engage with unicode characters if I am working on a Julia project with other people. Even if an API doesn’t require me to use unicode, someone else working on the same codebase may opt to do so.

Imagine now that I too sometimes use unicode. If I change my coding setup for any reason, I may have difficulty working on my existing code (1). If a collaborator doesn’t have a similarly functional setup, then they may have difficulty reading my code (2)

DNF · January 8, 2024, 11:20pm

As a non-ascii person, by name and language, I find the clinging to an outdated character set quite off-putting. The consequences of this attitude are a source of annoyance and uncertainty every time I book an international plane ticket. Anything that can work to spread the acceptance and use of unicode is great in my book.

I have little sympathy or patience with the anti-unicode view, it’s a great improvement to code quality at a small price. Tools that cannot handle this should be abandoned. Someone struggling to interact with unicode in general is a red flag to me.

PetrKryslUCSD · January 8, 2024, 11:32pm

I think it is debatable that it is (i) an improvement, and that the (ii) price is small.

DNF · January 8, 2024, 11:37pm

I’m presenting my opinion. It is most certainly a great improvement to code containing many mathematical expressions and symbols.

I have noticed hardly any cost at all.

If you don’t like it you are welcome to avoid using it. But the repeated criticism of those who find it pleasant and useful is getting increasingly annoying.
