Syntax: Escape hatch for unicode haters

TeX — while Turing complete — isn’t really a programming language. It’s a typesetting engine. The analogy with TeX would only be relevant if it applied to tex \commands themselves, which I’ve only ever seen as ASCII.

It does look like C99 supports something like this, which is wild.[1] It’s not what you want here, though: it does so with raw \UXXXX code points, not latex names.

Yes, but as you yourself pointed out, your julia experience is significantly enhanced by integrating emacs into search/grep, mergetool, and (almost) into your web browser, making all of them julia-aware.

That’s why I joked that emacs is your operating system – you basically said “text input should be handled by emacs, that can be taught to be julia aware”. Good for you!

This is a nice setup. I admire it, and I mourn that operating system design has not moved towards something like that, i.e. towards unified customizable text input / editing. Instead qt, gtk, firefox, chrome, java-swing, win32, etc all ship their own stuff. This sucks for accessibility, and it sucks for julia.

Most people don’t customize their system/workflow to that level, unless they absolutely must due to physical disabilities. They take whatever text input box they get from their application/context. And then you have a very haphazard combination of text inputs, not all of which are julia-aware and most of which behave slightly differently (xterm+bash vs non-X-tty vs firefox textinput vs chromium textinput vs vi vs emacs vs less vs meld …).

@foobar_lv2 just being curious: do you actually use an editor that doesn’t support \alpha<TAB> when writing Julia code?

Regarding the use of to find the alias \alpha, in practice people use this when they see the symbol on screen and use copy-paste to find the alias.

Now to address your main points: I think the simplest goal (unicode escaping) is already achieved in practice by the operating system? For example, in Linux you can type Ctrl-Shift-u 3b1 to get the α character (code point U+03B1). This works in the browser, in the terminal (including in less search, for example), etc.
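For reference, the number typed after Ctrl-Shift-u is simply the hexadecimal code point, the same number that the \UXXXX escapes mentioned above for C99 write out. A short Python illustration (Python is used only because its str type works in code points; nothing here is Julia-specific):

```python
import unicodedata

# The hex digits typed after Ctrl-Shift-u ("3b1") are the code point itself.
alpha = chr(int("3b1", 16))

# Python's \u and \U string escapes spell the same code point
# (C99 likewise has both \u and \U forms).
assert alpha == "\u03b1" == "\U000003B1"

print(f"U+{ord(alpha):04X}")      # U+03B1
print(unicodedata.name(alpha))    # GREEK SMALL LETTER ALPHA
print(alpha.encode("utf-8"))      # b'\xce\xb1' (two bytes in a UTF-8 file)
```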

The second goal is trickier, but as I understand it, your proposal doesn’t address it completely either: even if the language improves support for unicode aliases, this won’t help you find α when you’re using less or other tools…

But (again), other IDEs support LaTeX entry just fine. I think VS Code does, too, but I have no personal experience with it.

As for accessibility, it would be good to hear from a person who has a disability that requires accessibility support and has problems with Unicode in Julia, and see what the specific problem is and how the community can help (but please, let’s open another discussion for that).

No, vscode, repl and jupyter all support it.

I do write tiny snippets in the browser / github / slack, and grep and meld (for git mergetool). These are the points where my software stack is lacking because I did not adopt the emacs operating system.

I heard from @Tamas_Papp about how he deals with unicode in git merge conflicts (emacs ftw). Do your git mergetools support \alpha<TAB> @sijo @mbauman ?

I can figure out a tab-completion. But I will forget it 5 minutes later unless it stares me in the face, as latex source files do, or unless it has stared me in the face for many years, as many latex sequences have.

In sum, using unicode in julia is just not worth it for me, I can use in, xor, union instead of \in, $\xor$ (you see me struggling putting that char into my browser input box right now!) or \cup, at the price of some more parentheses.

Most APIs are light on unicode, so can be used without too much annoyance.

Rare real PRs to projects that use unicode are also not an issue: if I am asked to review one, I can take the pain of formatting it. I will suffer when rebasing / solving merge conflicts.

Quick edits or a quick @eval to try something out can be very annoying, though. That would become simpler if the parser could do unicode escaping.

Same with interacting with code on discourse. If it contains unicode, I will either take the time to completely refactor the code, or I will not interact at all.

Most modern languages support unicode identifiers these days:

Of the languages I thought of here, only Perl, R, and Fortran don’t seem to support unicode. And only C and Javascript support using \U or \u escapes. None support latex- or html-like entity names.

Incidentally, while the examples in this topic are about LaTeX math, note that Unicode has been a boon to speakers of languages other than English.

I am not sure which other languages you speak/write, but maybe you are aware of the mess with various encodings etc that preceded unicode and specifically utf8.

“Hating” Unicode is a very English-centric view.

I use vim for that so yes.

You give a good example with the Julia Discourse: I also find it annoying that Unicode aliases are not available there. I copy-paste from a Julia REPL in these cases, not ideal but still worth it for me.

Personally I would be annoyed to encounter Julia files that use e.g. a \\xor b instead of a ⊻ b (I find it ugly and less readable). And there’s no easy fix for Unicode lovers: To see nice looking code I would have to configure all the viewers, which are much more numerous and typically more difficult to configure than editors (not sure how I would do it in my browser for example).

I’m not saying that IDE support for julia or latex is bad.

I’m saying that it is perfectly possible to edit latex sources or 90+% of julia sources with e.g. nano or whatever textbox meld uses.

But it is extremely painful to edit julia sources that go heavy on unicode without IDE-like tooling.

The editing experience without specialized tooling is part of the “plain” in “plaintext”.

Yep, it doesn’t completely solve it. But it allows for projects to locally enforce auto-formatting in either direction (normalize away all escape sequences, or normalize away all non-ascii identifiers/operators). Then the “marketplace of github” could figure out which way is better for more people. I could start quick “try something out” sessions by git clone ... followed by hypothetical juliafmt --escapeUnicodeSyntax ..
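A rough sketch of what the two normalization directions could look like, written in Python rather than Julia. Everything here is an assumption for illustration: the \\name escape is only the hypothetical syntax discussed in this thread (Julia’s parser does not accept it), the four-entry table is a tiny subset of the names Julia’s REPL tab-completes, and juliafmt does not exist. A real tool would need the full completion table and would have to leave string literals and comments alone, which this sketch does not.

```python
import re

# Illustrative subset (assumption) of Julia's LaTeX-style tab completions.
LATEX_TO_UNICODE = {
    "alpha": "\u03b1",  # α
    "xor": "\u22bb",    # ⊻
    "in": "\u2208",     # ∈
    "cup": "\u222a",    # ∪
}
UNICODE_TO_LATEX = {char: name for name, char in LATEX_TO_UNICODE.items()}

# Hypothetical escape syntax from this thread: two backslashes, then a name.
# Caveat: there is no terminator, so letters written directly after an
# escape are read as part of the name (\\inx is not the same as ∈x).
ESCAPE = re.compile(r"\\\\([A-Za-z]+)")


def unescape(src: str) -> str:
    """Normalize away escapes: replace known escape names with Unicode characters."""
    # Unknown names are left untouched (m.group(0) is the whole match).
    return ESCAPE.sub(lambda m: LATEX_TO_UNICODE.get(m.group(1), m.group(0)), src)


def escape(src: str) -> str:
    """Normalize away non-ASCII operators: replace known characters with escapes."""
    return "".join(
        "\\\\" + UNICODE_TO_LATEX[ch] if ch in UNICODE_TO_LATEX else ch
        for ch in src
    )


print(escape("x \u2208 A \u222a B"))  # x \\in A \\cup B
```

Going the other way, unescape(r"a \\xor b") gives back "a ⊻ b". The missing terminator noted in the comment is the main design gap: a real escape syntax would need some explicit delimiter so that escaping and unescaping always round-trip.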

Interesting dichotomy, thanks for pointing that out. More viewers than editors, but the pain from bad editor is larger than the pain from bad viewer.

I think most syntax highlighting is powerful enough to display a \\xor b as a ⊻ b, whereas going the other way (typing ⊻ without editor support) would require touching the mouse to copy-paste from the REPL.

This just tells me that many editors are too primitive to support plain text. I frankly find that unacceptable, and think that sort of editor should be abandoned.

(BTW, my name contains several non-ascii characters, and I’m sick of the lagging support in many systems.)

Sorry, I misspoke. Unicode in general, and specifically utf8, are awesome! Even though I very rarely need to leave latin-1 (German, English, occasional French), it used to be a mess.

And the unicode support in the julia runtime with String is really good, and I like the index-by-address that so many people are surprised by.
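The “index-by-address” behaviour presumably refers to Julia’s String being indexed by byte (code unit) offset into its UTF-8 data, so valid indices skip over the continuation bytes of multi-byte characters: in Julia, collect(eachindex("αβc")) is [1, 3, 5], and "αβc"[2] throws a StringIndexError. The Python sketch below reproduces that list of indices from the UTF-8 encoding (Python’s own str indexes by code point instead); the function name is made up for illustration.

```python
def julia_string_indices(s: str) -> list:
    """Return the 1-based byte offsets where each character of s starts in UTF-8.

    These mirror the indices Julia's String accepts; an index that lands
    inside a multi-byte character is invalid in Julia.
    """
    indices = []
    offset = 1  # Julia indexing is 1-based
    for ch in s:
        indices.append(offset)
        offset += len(ch.encode("utf-8"))  # α and β take 2 bytes, ASCII takes 1
    return indices


s = "\u03b1\u03b2c"                    # "αβc"
print(julia_string_indices(s))         # [1, 3, 5]
print(len(s), len(s.encode("utf-8")))  # 3 5  (characters vs. code units)
```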

The problem is UI/OS design around input of foreign characters. “foreign” means “foreign to the user’s keyboard layout”, so my native German is foreign to me, in this context – good ergonomics for special characters is more important for me than good ergonomics for typing my native language.

OS / programming languages are already super US-centric in this often overlooked way, starting with simple things like the slash (/) as the path separator in unix-like systems. It’s not like Germans were stupidly obstinate unix-haters when placing their slash; keyboard layouts evolved from typewriters that predate digital computers.

Latex has a very good solution: you can use utf8 non-ascii chars in source files, but you can also use the latex escape sequences.

Editors like emacs julia-mode can add sugar like displaying \alpha as α, or emitting the utf8 character upon \alpha<TAB>. But editor support is not mandatory for editing latex sources, which is important for users who live in a more fragmented world of input handlers, i.e. who don’t run the emacs operating system.

AFAIU no language other than latex has a good solution, besides “don’t use foreign characters in sections that you need to edit”. (html is very borderline)

That works really well if foreign characters are confined to string literals or source-code comments or niche projects that you don’t deal with.

Alas, the problem in the julia ecosystem is that use of foreign characters in relevant parts of source files is actually moderately widespread.

And I see no significant appreciation in the community for the annoyances that this causes, and for the need of comprehensive tooling around this.

This is not really about programming languages, it is about the ergonomics of the plain-text file format that are julia source files (.jl). Ergonomics are determined not just by the spec, but also available tooling, typical editing contexts (source control needed? 3-way merges common? is greppability important?) and typical real-world usage patterns (how typical is it to have to type/edit “foreign” characters?).
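On the greppability question, one low-tech aid is a script that lists every non-ASCII character in a source file together with its code point and Unicode name, so a reader without julia-aware tooling can at least see what they would have to type or search for. A minimal Python sketch; the function and script names are made up, and it deliberately does not distinguish identifiers from string literals or comments:

```python
import sys
import unicodedata


def non_ascii_report(source: str) -> list:
    """List (line, column, code point, Unicode name) for each non-ASCII character."""
    report = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):  # columns count characters, not bytes
            if ord(ch) > 127:
                name = unicodedata.name(ch, "<unnamed>")
                report.append((lineno, col, f"U+{ord(ch):04X}", name))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python non_ascii_report.py path/to/file.jl
    with open(sys.argv[1], encoding="utf-8") as f:
        for lineno, col, codepoint, name in non_ascii_report(f.read()):
            print(f"{sys.argv[1]}:{lineno}:{col}: {codepoint} {name}")
```

The output uses the grep-like file:line:column format and prints only ASCII, so it can itself be piped through grep, less, or a diff tool without any Unicode-aware setup.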

A post was split to a new topic: Better explicit support for Unicode encodings?

The analogy with TeX only really makes sense to me if there existed \commands with unicode in them and there was an alternate way to call those. Your post isn’t about ways of outputting unicode with ASCII stand-ins (which as you note Julia’s string syntax can do), but rather about ways to refer to unicode identifiers using a secondary ASCII form.

As I noted above, nearly every modern programming language supports unicode identifiers without an ASCII “escape hatch”. Sure, perhaps Julia is an outlier in how much it’s used, but you’ve encapsulated my thoughts — and where I think the solution belongs — quite well:

There are lots of ways to customize plain text editors and even OS-level keyboard entry these days.

Perhaps that is because they are aware of tooling that exists, and are using it. Maybe you could invest in exploring that too.

Most OSs these days do not require that you pick a single keyboard layout from a predefined list: you can extend, mix, switch, etc. I only know about Linux but I would be surprised if OS X didn’t have anything like this (and would be surprised if the Windows solution wasn’t convoluted and clunky :wink:).

Inside a particular IDE that is actively maintained, there are usually zillions of solutions. E.g., if you are not in a Julia source file in VS Code, there is the generic

and similar extensions. Firefox has

and I am sure we could continue this list for various apps. Even if an app does not have that, you can quickly edit up some UTF8 text in your favorite editor and copy paste.

IMO Unicode entry is best handled by editors, and tweaking Julia (the parser) to support an alternate entry method that is converted to UTF8 on the fly would be the wrong place to address this.

Here’s what I’m using: The "U.S. International - Scientific" Keyboard Layout - Michael Goerz

Impressive!

Oh my goodness, that’s awesome. I’ve long used a custom DefaultKeybindings.dict file to achieve this (as well as adding additional emacs-like cursor/editing actions), but that only works in some applications.

It looks like your link to Ukelele is broken on your blog; it’s here: Ukelele - SIL Language Technology

Counterpoint: Perhaps many people are sufficiently annoyed that they’re opting out of using Julia altogether.

I’m speaking here as someone with no horse in this race. I use Julia as a hobbyist, but the widespread usage of exotic unicode characters is by far the most aggravating part of the language. It truly would be enough to keep me from using Julia if I didn’t have independent interest in some Julia projects. It’s certainly enough for me to keep from bothering to use it professionally.

There are two reasons:

  • I worry about being able to enter code.
  • I worry about others being able to read my code.

As soon as you require users to have special tooling just to type their code, then you have lost a significant portion of developers. Tooling is for improving the coding process, not enabling it. If I can’t ssh into a terminal and edit code with whatever editor happens to be installed, then your language isn’t useful to me. This is a pretty common view, at least outside of the Julia community.

I understand the counter-counterpoint, that the language is targeting a specific set of users (the scientific computing community, basically) and if that community loves unicode function names, then maybe it’s best to design the language and tooling to facilitate that. But I can’t believe “Unicode haters” are truly that rare.

Excuse me for asking, but where do you come across all these exotic characters? I have only ever seen them used very sparingly, and never have I been forced to use it. What APIs are forcing this on the user?

I only use unicode where I feel it improves my code, but I’m not forced to use it anywhere.

The first point is relevant if you are required to use unicode by some API, but otherwise not.

The second point, I don’t understand, actually. Why would you worry about that if you only use ascii characters?

While many people enjoy using Unicode symbols in their own code (which has happened because it’s become so easy to type with common tooling!), and you’ll see it in a lot of examples and internal implementations, it’s much less common for it to be required to access any API. I don’t recall any feature of the base language or standard libraries that requires non-ASCII symbols, even if there is a Unicode shortcut.

Is there a particular Unicode-only API you’ve been aggravated by?

(The main external package I know whose API requires a lot of Unicode symbols is the Gridap.jl finite-element library, which is a great package but in a specialized mathematical domain where the symbols make a lot of sense. So, while there are examples of such APIs in the Julia ecosystem, I wouldn’t call it “widespread”. It’s a decision that the developers of each package have to make for themselves.)
