Unicode: a bad idea, in general

joa-quim · June 6, 2023, 11:15pm

Do you really think everyone knows what Unicode is?

Mason · June 6, 2023, 11:15pm

I don’t really get what point you’re trying to make here. Like, is your point that nobody finds unicode to be an API accessibility issue in practice, or what?

adienes · June 6, 2023, 11:17pm

I was just teasing at the irony of the chicken-and-egg problem. I do appreciate the ability to use Unicode in my scripts, although maybe not for public-facing APIs.

DNF · June 6, 2023, 11:18pm

Firstly, this is not general knowledge. Secondly, at a given moment you may know you need a Greek letter but not have one available to copy and paste. And thirdly, it only works if your editor supports LaTeX-to-Unicode commands.

And fourthly, it’s just more work.

Ronis_BR · June 6, 2023, 11:40pm

Python version:

UTF-8 really makes those equations easier to follow.

jar1 · June 6, 2023, 11:42pm

My point is that an ASCII-only campaign comes at a cost and that having a realistic model of one’s audience is key in making decisions about readability.

If there are lots of people using Julia without the ability to render Unicode, for example, I’d like to know about them and their circumstances; if there aren’t, we shouldn’t invoke costs to such people in our arguments.

My questions would be

What platform, editor, educational background, eyesight, etc., do my users have? How do those affect their ability to use characters fluently? Without knowing these things, appeals to those users’ needs are on shaky ground.

Most Julia users are in VSCode, which edits unicode. Which of the other editors can’t?

Finally, how much of this discussion is simply a personal aesthetic preference or even just familiarity? Is this just like 1-based indexing? If we encourage users to use Unicode, will they not come to like it more? If we use it more, won’t it become less of a problem as knowledge of how to use it diffuses?

Mason · June 7, 2023, 12:11am

Thanks for clarifying.

Well, as I said, I don’t have strong feelings about it one way or another. I certainly prefer Unicode, but there are also enough people out there in the ecosystem who have repeatedly voiced their concerns about and opposition to Unicode-only APIs that I think it’s not such a big deal to accommodate them. Especially if we come up with nice patterns to make it easy to do so, like the one @moble showed.

I can definitely say that there have been times when I’ve SSH’d into a running server from my phone, where Unicode input is more annoying than usual, and every time I’ve done that I’ve been mildly thankful for ASCII APIs.

moble · June 7, 2023, 3:23am

I can’t speak to dyslexia or low vision. But I don’t see how using rho instead of ρ (or my example of Lambda_1 instead of Λ₁) will make any difference at all to people with lower reading levels or abilities or learning difficulties. As for non-native English speakers, I work with them all the time, and they are at least as capable of discerning Unicode as I am — usually more so. As a person with limited time and attention span, I personally find Unicode that looks like the math symbols I am already familiar with much faster and less fatiguing to process than ASCII.

As a sighted user, I find ASCII translations of Unicode much “noisier” than Unicode.

Maybe we’re talking past each other. When I think about Unicode in Julia, I’m almost exclusively thinking about representing mathematical symbols that are already used in the literature that inspired us to write the code in the first place. Understanding the code largely depends on understanding those concepts from the literature first, which usually involves an ability to understand the symbols.

Is this also the sort of usage that you envision and find objectionable? Is there some broader class of uses that people find objectionable?

PetrKryslUCSD · June 7, 2023, 3:45am

There are some examples of symbols in this thread that I find much less clear than the ASCII transcription of them. To a degree that may be a function of the screen resolution that is available for the display of the character, of course. But, when sticking to ASCII, one may be sure that “what you see is what you get”. Not so with many Unicode characters.

Since you asked for an example: a' vs a’ vs a′, these are not the same thing. Only one of them is a transpose.
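To make the distinction concrete, here is a small Julia sketch (an illustration added here, not code from the thread) that prints the code point of each of the three look-alike characters and shows that only the ASCII apostrophe acts as the postfix adjoint (transpose) operator, while the prime U+2032 is simply part of a different identifier. The matrix `A` and the variable `A′` are hypothetical.

```julia
# The three look-alike characters: apostrophe, right single quote, prime.
chars = ['\'', '’', '′']

for c in chars
    # Print each character with its Unicode code point, e.g. U+0027.
    println(repr(c), " => U+", uppercase(string(UInt32(c), base = 16, pad = 4)))
end

A = [1 2; 3 4]

# ASCII apostrophe (U+0027): postfix adjoint, i.e. the transpose of a real matrix.
At = A'

# PRIME (U+2032) is allowed in Julia identifiers, so A′ is just another variable.
A′ = 2 .* A

println(At == [1 3; 2 4])
println(A′ == [2 4; 6 8])
```

The right single quotation mark (U+2019) is the character word processors and some forum software substitute automatically, which is one way the “what you see is what you get” guarantee breaks.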

moble · June 7, 2023, 4:08am

That’s a fair point. I am sometimes surprised by what shows up on GitHub. And even using the excellent JuliaMono, some character combinations appear incorrectly in VS Code, Emacs, and Terminal (which seems to be the fault of those programs, rather than JuliaMono).

I’ll also admit to a little concern when using things like a′. But context is important, and I only use \prime when it’s already used in the literature.

moble · June 7, 2023, 4:11am

Here’s a little anecdata: my project is 3,597 lines of code (in src, plus another 1,730 in docs+test), of which ~30 particularly easy-to-write lines are devoted to this API translation. It doesn’t feel like much cost to me.

I suspect that it is entirely aesthetic for a lot of users, but I still want them to use my code.

Note that a Unicode API won’t just “encourage” users to use Unicode; it will require them to. That might be enough to discourage the particularly lazy Julia user, but it will also make it literally impossible to use even slightly fancy Unicode from Python.
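One mechanism that may be behind the Python problem (an assumption added here; the post does not spell it out) is that Python normalizes identifiers to Unicode NFKC form (PEP 3131). A keyword written as `Λ₁` in Python source is therefore seen as `Λ1`, which no longer matches a Julia keyword spelled with the subscript digit. The Julia sketch below uses the standard `Unicode` library to show what NFKC does to a few names from this thread.

```julia
using Unicode

# Names from this thread; Λ₁ uses the subscript digit U+2081.
names = ["ρ", "Λ₁", "η"]

for name in names
    nfkc = Unicode.normalize(name, :NFKC)
    # Subscript digits have compatibility decompositions, so NFKC folds
    # them into ordinary digits; plain Greek letters come through unchanged.
    println(name, " -> ", nfkc, nfkc == name ? "  (unchanged)" : "  (changed)")
end
```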

ChrisRackauckas · June 7, 2023, 4:58am

Unicode is fine within code where it increases legibility, but in no case should Unicode be used in public APIs. This is to allow support for terminals which cannot use Unicode: if a keyword argument must be η, then it can be exclusionary to users on clusters which do not support Unicode input.

That’s the reasoning for disallowing it in SciMLStyle.

jar1 · June 7, 2023, 5:19am

Do you happen to have a reference to these clusters? I haven’t seen that restriction myself.

ChrisRackauckas · June 7, 2023, 5:21am

Lots of older clusters, like the XSEDE ones, had this restriction. It can depend greatly on the terminal that is used, too: newer terminal versions support it on “standard” hardware, but as you get to other hardware or more legacy systems you tend to have more issues with Unicode.

singularitti · June 7, 2023, 5:31am

I like using Unicode in my code. What I hate is that many superscripts/subscripts are incomplete. Like subscript f.
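A quick way to see the gap is to ask which Latin letters have a subscript form at all. The Julia sketch below assumes the Latin subscript letters live at U+2090–U+209C, U+1D62–U+1D65 and U+2C7C (hardcoded here, not queried from a Unicode database) and maps each back to its base letter with NFKC normalization.

```julia
using Unicode

# Hardcoded code points of Latin subscript letters (assumption: this covers
# the full set in current Unicode versions).
subscript_codepoints = [0x2090:0x209C; 0x1D62:0x1D65; 0x2C7C]

# NFKC maps each subscript letter to its ordinary base letter (e.g. ₐ -> a).
letters = Set{Char}()
for cp in subscript_codepoints
    base = Unicode.normalize(string(Char(cp)), :NFKC)
    # Keep only single ASCII lowercase letters; this drops U+2094 (subscript schwa).
    if length(base) == 1 && 'a' <= base[1] <= 'z'
        push!(letters, base[1])
    end
end

available = sort(collect(letters))
missing_letters = [c for c in 'a':'z' if !(c in letters)]
println("subscript available: ", join(available, " "))
println("no subscript form:   ", join(missing_letters, " "))
```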

jar1 · June 7, 2023, 5:35am

It seems like there is a setting to support UTF-8 input on the specific terminal mentioned in another thread.

ChrisRackauckas · June 7, 2023, 5:42am

Okay, sure, but I cannot just assume that everyone who will ever use Julia has read that post. Therefore, it will not go into public APIs, because it’s not inclusive. No matter what some hardcore person says, Unicode in keyword arguments was an easy way to get 30 confused emails a month back in 2018, before we even had big adoption. That’s why it’s disallowed, and I am sure that with the reach we have now, it would be hundreds of people having issues using the software because of one small difference in a naming choice. It’s just not worth it for software that has a wide reach.

TheLateKronos · June 7, 2023, 6:06am

How do you feel about the earlier post showing how to allow both character sets for keyword arguments, by setting the unicode argument equal to the non-unicode one by default, and only using the unicode one internally? Should that be disallowed in the style guide?
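For reference, the pattern being asked about looks roughly like the following Julia sketch. The function `density_scale` and its keywords are hypothetical; the point is that a Julia keyword default may refer to an earlier keyword, so callers can pass either `rho` or `ρ` while the body only uses `ρ`.

```julia
# Hypothetical function: the ASCII keyword `rho` is the public name, and the
# Unicode keyword `ρ` defaults to it, so only `ρ` appears in the body.
function density_scale(x; rho = 1.0, ρ = rho)
    return ρ * x
end

density_scale(2.0)             # default, ρ = 1.0
density_scale(2.0; rho = 3.0)  # ASCII-only call
density_scale(2.0; ρ = 3.0)    # Unicode call
```

One caveat of this design: if a caller passes both names, `ρ` silently wins, and both keywords now count as public API surface.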

ChrisRackauckas · June 7, 2023, 6:12am

In general, reducing API surface just makes things easier to maintain. I would just prefer a single keyword argument for that reason. We already have way too many kwargs; I don’t want more.

DNF · June 7, 2023, 7:11am

Personally, I’ve found much less use for unicode in APIs than for internal temporary variables, or function input arguments. So it is less a matter of self-restraint or consideration for people on exotic terminals, than it is of personal preference.

Non-ascii unicode symbols are really useful for variables, especially when they are part of complex mathematical expressions. But an API consists of function names, type names and keyword arguments, and I don’t really see how unicode symbols are that appropriate for those.

While a variable is just a placeholder for a value (and can often be abstract or defy concrete naming), API functions, on the other hand, almost always describe a limited number of well thought-out, concrete actions with obvious names, even in cases where there exist popular symbolic representations of them in the scientific literature.

Keyword arguments are something that you would use carefully, and for cases where you want to be extra clear about intent. Keywords are, well, words, in my mind. If your keyword is a unicode symbol, perhaps it should actually be a positional argument instead?
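A minimal Julia sketch of that suggestion, using a hypothetical `gradient_step` function: the learning rate is a positional argument, so the Unicode name `η` exists only inside the method, and callers never have to type it.

```julia
# Hypothetical example: η is positional, so its name is an internal detail.
# Callers pass the learning rate by position, using only ASCII characters.
gradient_step(x, g, η) = x - η * g

x_new = gradient_step(1.0, 0.5, 0.1)
println(x_new)
```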

As for types, I’ve just never thought about giving them unicode names. It is conceivable that it could be useful, but is this common?
