Unicode: a bad idea, in general

Do you really think everyone knows what Unicode is?

I don’t really get what point you’re trying to make here. Like, is your point that nobody finds unicode to be an API accessibility issue in practice, or what?

I was just teasing about the irony of the chicken-and-egg problem. I do appreciate being able to use Unicode in my scripts, although maybe not in public-facing APIs.

Firstly, this is not general knowledge. Secondly, at a given moment you may know you need a Greek letter but not have one available to copy and paste. And thirdly, it only works if your editor supports LaTeX-to-Unicode commands.

And fourthly, it’s just more work.

Python version:

UTF-8 really makes those equations easier to follow.

My point is that an ASCII-only campaign comes at a cost, and that having a realistic model of one’s audience is key to making decisions about readability.

If there are lots of people using Julia without the ability to render Unicode, for example, I’d like to know about them and their circumstances; if there aren’t, we shouldn’t invoke costs to such people in our arguments.

My questions would be:

What platform, editor, educational background, eyesight, etc. do my users have? How do those affect their ability to use these characters fluently? Without knowing these things, appeals to those users’ needs are on shaky ground.

Most Julia users work in VS Code, which can edit Unicode. Which of the other editors can’t?

Finally, how much of this discussion is simply personal aesthetic preference, or even just familiarity? Is this just like 1-based indexing? If we encourage users to use Unicode, won’t they come to like it more? If we use it more, won’t it become less of a problem as knowledge of how to use it spreads?

Thanks for clarifying.

Well, as I said, I don’t have strong feelings about it one way or the other. I certainly prefer Unicode, but there are also enough people in the ecosystem who have repeatedly voiced their concerns about, and opposition to, Unicode-only APIs that I think it’s not such a big deal to accommodate them. Especially if we come up with nice patterns that make it easy to do so, like the one @moble showed.

I can definitely say that there have been times when I’ve SSH’d into a running server from my phone, where Unicode input is more annoying than usual, and every time I’ve done that I’ve been mildly thankful for ASCII APIs.

I can’t speak to dyslexia or low vision. But I don’t see how using rho instead of ρ (or my example of Lambda_1 instead of Λ₁) will make any difference at all to people with lower reading levels or abilities, or with learning difficulties. As for non-native English speakers, I work with them all the time, and they are at least as capable of discerning Unicode as I am, and usually more so. As a person with limited time and attention span, I personally find Unicode that looks like the math symbols I am already familiar with much faster and less fatiguing to process than ASCII.

As a sighted user, I find ASCII translations of Unicode much “noisier” than Unicode.


Maybe we’re talking past each other. When I think about Unicode in Julia, I’m almost exclusively thinking about representing mathematical symbols that are already used in the literature that inspired us to write the code in the first place. Understanding the code largely depends on understanding those concepts from the literature first, which usually involves an ability to understand the symbols.

Is this also the sort of usage that you envision and find objectionable? Is there some broader class of uses that people find objectionable?

There are some examples of symbols in this thread that I find much less clear than their ASCII transcriptions. To a degree that may depend on the screen resolution available for displaying the character, of course. But when sticking to ASCII, one can be sure that “what you see is what you get”. Not so with many Unicode characters.

Since you asked for an example: a' vs a’ vs a′. These are not the same thing; only one of them is a transpose.
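
A quick way to confirm that these are three different characters is to ask for their code points. The sketch below uses Python's standard unicodedata module purely for illustration; in Julia, only the ASCII apostrophe is the postfix adjoint operator.

```python
import unicodedata

# The three spellings from the post: "a" followed by an ASCII apostrophe,
# a right single quotation mark, and a prime. Escapes make the difference explicit.
spellings = ["a'", "a\u2019", "a\u2032"]

for s in spellings:
    mark = s[-1]  # the trailing look-alike character
    print(f"{s}  U+{ord(mark):04X}  {unicodedata.name(mark)}")
```

The three lines report U+0027 APOSTROPHE, U+2019 RIGHT SINGLE QUOTATION MARK and U+2032 PRIME, even though many fonts draw them almost identically.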

That’s a fair point. I am sometimes surprised by what shows up on GitHub. And even with the excellent JuliaMono, some character combinations appear incorrectly in VS Code, Emacs, and Terminal (which seems to be the fault of those programs rather than JuliaMono).

I’ll also admit to a little concern when using things like a′. But context is important, and I only use \prime when it’s already used in the literature. 🤷

Here’s a little anecdata: my project is 3,597 lines of code (in src, plus another 1,730 in docs+test), of which ~30 particularly easy-to-write lines are devoted to this API translation. It doesn’t feel like much cost to me.

I suspect that it is entirely aesthetic for a lot of users, but I still want them to use my code.

Note that a Unicode API won’t just “encourage” users to use Unicode; it will require them to. That might merely discourage the particularly lazy Julia user, but it can also make it literally impossible to use even slightly fancy Unicode from Python.
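
One concrete way this bites, sketched in Python for illustration: Python identifiers must be built from XID_Start/XID_Continue characters and are NFKC-normalized while parsing (PEP 3131). So a subscript digit like ₁ or a prime ′ cannot appear in a Python keyword argument at all, and a mathematical-italic 𝑥 silently becomes a plain x. The helper name as_python_keyword is made up for this example.

```python
import unicodedata

def as_python_keyword(name):
    """Hypothetical helper: return the name Python would actually bind for
    `name` as an identifier (after NFKC normalization), or None if it is not
    a valid identifier at all."""
    if not name.isidentifier():
        return None
    return unicodedata.normalize("NFKC", name)

# Candidate keyword names; the last one is MATHEMATICAL ITALIC SMALL X.
candidates = ["eta", "η", "Λ₁", "a′", "\N{MATHEMATICAL ITALIC SMALL X}"]

for name in candidates:
    print(f"{name!r:>8} -> {as_python_keyword(name)!r}")
```

Plain Greek letters such as η survive unchanged; Λ₁ and a′ are rejected outright.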

  • Unicode is fine within code where it increases legibility, but in no case should Unicode be used in public APIs. This is to allow support for terminals which cannot use Unicode: if a keyword argument must be η, then it can be exclusionary to users on clusters which do not support Unicode inputs.

That’s the reasoning for disallowing it in SciMLStyle.
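
A rule like this is easy to check mechanically. As a rough Python analog (not something SciMLStyle itself provides), the standard inspect module can list a function's parameter names and flag those that are not pure ASCII. The functions solve_unicode and solve_ascii are hypothetical examples.

```python
import inspect

def non_ascii_parameters(func):
    """Names of `func`'s parameters that contain non-ASCII characters."""
    return [name for name in inspect.signature(func).parameters
            if not name.isascii()]

# Hypothetical public functions, for illustration only.
def solve_unicode(u0, *, η=0.1):
    return u0 * η

def solve_ascii(u0, *, eta=0.1):
    return u0 * eta

print(non_ascii_parameters(solve_unicode))  # ['η']
print(non_ascii_parameters(solve_ascii))    # []
```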

Do you happen to have a reference to these clusters? I haven’t seen that restriction myself.

Lots of older clusters, like the XSEDE ones, had this restriction. It can also depend greatly on the terminal that is used: newer terminal versions support it on “standard” hardware, but on other hardware or more legacy systems you tend to have more issues with Unicode.
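
Whether a session can carry such characters at all depends partly on its configured encoding. A minimal Python sketch (the helper name can_encode is made up) tests whether a string survives encoding in a given codec, defaulting to whatever Python detected for stdout. It says nothing about fonts or input methods, so it is only a first-order check.

```python
import sys

def can_encode(text, encoding=None):
    """Hypothetical helper: True if `text` can be encoded in `encoding`
    (default: the encoding Python detected for stdout, else ASCII)."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(can_encode("η", "utf-8"))    # True
print(can_encode("η", "ascii"))    # False
print(can_encode("eta", "ascii"))  # True
print(can_encode("η"))             # depends on the current terminal/locale
```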

I like using Unicode in my code. What I hate is that the sets of superscript and subscript characters are incomplete. There is no subscript f, for example.
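
Unicode encodes subscript forms for only some Latin letters. A short Python check against the character-name database, assuming these characters follow Unicode's "LATIN SUBSCRIPT SMALL LETTER ..." naming pattern, lists which lowercase letters have one. The exact result depends on the Unicode version bundled with your Python.

```python
import string
import unicodedata

def subscript_letter(c):
    """Unicode subscript form of lowercase letter `c`, or None if none exists."""
    try:
        return unicodedata.lookup(f"LATIN SUBSCRIPT SMALL LETTER {c.upper()}")
    except KeyError:
        return None

forms = {c: subscript_letter(c) for c in string.ascii_lowercase}
print("have:   ", " ".join(f"{c}{s}" for c, s in forms.items() if s))
print("missing:", " ".join(c for c, s in forms.items() if s is None))
```

The letter f shows up on the "missing" line.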

It seems like there is a setting to support UTF-8 input on the specific terminal mentioned in another thread.

Okay, sure, but I cannot just assume that everyone who will ever use Julia has read that post. Therefore, it will not go into public APIs, because it’s not inclusive. No matter what some hardcore person says, Unicode in keyword arguments was an easy way to get 30 confused emails a month back in 2018, before we even had big adoption. That’s why it’s disallowed, and I am sure that with the reach we have now, it would be hundreds of people having issues using the software because of one small difference in a naming choice. It’s just not worth it for software that has a wide reach.

How do you feel about the earlier post showing how to accept both character sets for keyword arguments, by making the Unicode argument default to the ASCII one and using only the Unicode one internally? Should that be disallowed in the style guide?
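
In Julia, this pattern relies on a keyword default referring to an earlier keyword, e.g. f(x; eta=1, η=eta). The sketch below is a Python analog with a hypothetical function scale. Since Python defaults cannot refer to other parameters, a sentinel object stands in. As in the Julia form, if a caller passes both spellings, the Unicode one wins.

```python
_UNSET = object()  # sentinel: "caller did not pass η"

def scale(x, *, eta=1, η=_UNSET):
    """Hypothetical API accepting either `eta` (ASCII) or `η` (Unicode).

    The Unicode keyword defaults to the ASCII one, and only η is used internally.
    """
    if η is _UNSET:
        η = eta
    return η * x

print(scale(3))         # 3
print(scale(3, eta=2))  # 6
print(scale(3, η=2))    # 6
```

The cost is one extra keyword per Unicode name, i.e. a larger API surface.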

In general, reducing API surface just makes things easier to maintain. I would prefer a single keyword argument for that reason. We already have way too many kwargs; I don’t want more 😅

Personally, I’ve found much less use for Unicode in APIs than in internal temporary variables or function input arguments. So it is less a matter of self-restraint or consideration for people on exotic terminals than it is of personal preference.

Non-ASCII Unicode symbols are really useful for variables, especially when they are part of complex mathematical expressions. But an API consists of function names, type names and keyword arguments, and I don’t really see how Unicode symbols are all that appropriate for those.

A variable is just a placeholder for a value (and can often be abstract or defy concrete naming). API functions, on the other hand, almost always describe a limited number of well-thought-out, concrete actions with obvious names, even in cases where popular symbolic representations of them exist in the scientific literature.

Keyword arguments are something you use carefully, in cases where you want to be extra clear about intent. Keywords are, well, words, in my mind. If your keyword is a Unicode symbol, perhaps it should actually be a positional argument instead?
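
The positional-argument suggestion works because a positional parameter's name never appears at the call site. In Julia that holds for every positional argument; a Python analog needs the / marker (Python 3.8+) to make a parameter positional-only. The function update is a hypothetical example.

```python
def update(x, η, /):
    """Hypothetical step: η is positional-only, so callers never type its name."""
    return x - η * x

print(update(10.0, 0.5))  # 5.0

try:
    update(10.0, η=0.5)  # passing the Unicode name as a keyword is rejected
except TypeError as err:
    print("TypeError:", err)
```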

As for types, I’ve just never thought about giving them Unicode names. It is conceivable that it could be useful, but is this common?
