Unicode: a bad idea, in general

Balinus · June 6, 2023, 7:29pm

Would love to see the Python version!

lmiq · June 6, 2023, 7:38pm

I would love Unicode package names:

using 🍌, 🍳, 🍴

And of course we let the chance to use an emoji as the file name extension slip through our fingers.

frylock · June 6, 2023, 7:57pm

Julia could also be the first to adopt an emoji for the default file extension:
MyCoolApp.

Edit: heh, should have read the post above mine

DNF · June 6, 2023, 8:07pm

Too late: Modular Docs - Mojo🔥 FAQ

goerz · June 6, 2023, 8:30pm

Hopefully no derision at all, because I 100% agree with using the capabilities of unicode to the fullest extent possible. In fact, it’s one of the things that attracted me to Julia in the first place. I appreciate that @PetrKryslUCSD will never agree with that, but luckily we don’t work on the same code bases

Your idea of allowing both ascii and unicode keyword arguments is amazing, and I’m very likely to adopt that approach. Thanks!

goerz · June 6, 2023, 8:34pm

For those on a Mac wanting to type scientific unicode a lot: The "U.S. International - Scientific" Keyboard Layout - Michael Goerz

Paul_Soderlind · June 6, 2023, 8:49pm

A careful (conservative) use of unicode can make the code much easier to understand, while still being clear. My rules are: simple unicode that (1) most programmers’ fonts (Consolas, say) can render; (2) is not easily mixed up with other similar-looking symbols.

Mason · June 6, 2023, 9:47pm

You shouldn’t hesitate. I think your opinion is the majority opinion here. Unicode is good, but there should be non-unicode API options available, and your method of defining both ASCII and unicode kwargs simultaneously is clever and worth sharing.
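The post being praised isn't quoted here, so the following is only a guess at the general shape of such a method. It relies on the fact that a Julia keyword argument's default may refer to an earlier keyword argument. The function `gaussian` and the keyword names `mu`/`μ` and `sigma`/`σ` are made up for illustration.

```julia
# Hypothetical example: `gaussian`, `mu`/`μ` and `sigma`/`σ` are illustrative names.
# Each Unicode keyword defaults to its ASCII twin, so callers may pass either one.
function gaussian(x; mu = 0.0, sigma = 1.0, μ = mu, σ = sigma)
    # The body uses only the Unicode names.
    return exp(-((x - μ) / σ)^2 / 2) / (σ * sqrt(2π))
end

a = gaussian(0.3; mu = 0.1, sigma = 2.0)   # ASCII spelling
b = gaussian(0.3; μ = 0.1, σ = 2.0)        # Unicode spelling
a == b                                     # true
```

One consequence of this layout: if a caller passes both spellings, the Unicode one silently wins, so a real package might prefer to check for that and throw an error.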

PetrKryslUCSD · June 6, 2023, 10:02pm

Well, avoiding Unicode in the API (or at least giving an option not to use it) is good. But (as someone pointed out already), code is read more often than it is written. So we should try to make the graphical presentation of the internal code (i.e. not the API) as easy on the reader as possible. The ability of readers to affect the graphical presentation of Unicode is typically limited (e.g. on GitHub), and hence use of problematic symbols should be prohibited, and use of other symbols should be considered from the point of view of visual confusion (e.g. \rho vs p vs rho etc.).

DNF · June 6, 2023, 10:06pm

I assume that’s not every non-ascii symbol? Do you know which ones are particularly problematic?

Mason · June 6, 2023, 10:14pm

Agreed! So it should be made readable by using whatever notation and conventions are appropriate for the problem at hand.

If I am writing code which implements some physics concept, and I expect my code to be read by fellow physicists who have similar training to me, then the most appropriate notation and conventions to clearly communicate ideas are typically going to involve a lot of unicode and single letter variables.

If I’m writing code that I intend to be read and used by a wide variety of people with a wide variety of backgrounds, then the choices as to what maximizes comprehension may look very different (or maybe they just involve better documentation).

PetrKryslUCSD · June 6, 2023, 10:17pm

https://util.unicode.org/UnicodeJsps/confusables.jsp

https://websec.github.io/unicode-security-guide/visual-spoofing/
http://www.unicode.org/Public/security/latest/confusables.txt
How is this for a start?
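Those lists are long. A much cruder first pass that needs no data files is to flag every non-ASCII character in a piece of source, so that a reviewer can look at each one. A minimal sketch, assuming a hypothetical helper `nonascii_report` (not part of Julia or any package). It does not consult confusables.txt and cannot tell a harmless symbol from a confusing one.

```julia
# Hypothetical helper: list each non-ASCII character in `src` together with
# its line number and its code point written as U+XXXX.
function nonascii_report(src::AbstractString)
    report = Tuple{Int,Char,String}[]
    for (lineno, line) in enumerate(split(src, '\n'))
        for c in line
            if !isascii(c)
                hex = uppercase(string(codepoint(c), base = 16, pad = 4))
                push!(report, (lineno, c, "U+" * hex))
            end
        end
    end
    return report
end

# Greek ρ (U+03C1) is reported on lines 1 and 3; Latin p and "rho" are not.
nonascii_report("ρ = 1.0\np = 2.0\nrho = ρ + p")
```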

DNF · June 6, 2023, 10:52pm

For a start? It’s a bit much, actually. Is there some way to extract the essence? I also see ‘ascii confusables’ in there. What sort of characters is it that you would like to prohibit which are allowed today?

Also, isn’t there some sort of normalization of similar unicode symbols happening already?
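On the normalization question: the Julia manual documents that the parser NFC-normalizes identifiers and treats a few look-alikes as the same identifier (for example the micro sign µ and the Greek letter μ). Cross-script look-alikes such as Greek ρ and Latin p are not merged. A small check, using only `Meta.parse` and the `Unicode` standard library:

```julia
using Unicode

micro = "\u00b5"   # MICRO SIGN
mu    = "\u03bc"   # GREEK SMALL LETTER MU

micro == mu                            # false: as strings they are different code points
Meta.parse(micro) === Meta.parse(mu)   # true: the parser maps both to the same identifier

decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
composed   = "\u00e9"    # precomposed 'é'

Unicode.normalize(decomposed, :NFC) == composed   # true
Meta.parse(decomposed) === Symbol(composed)       # true: identifiers are NFC-normalized

# Greek rho and Latin p stay distinct identifiers.
Meta.parse("\u03c1") === Meta.parse("p")          # false
```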

As for the list of groups affected by reading problems, at least “people with limited time or attention span” and “non-native English speakers” should benefit from short and visually distinct symbols over long descriptive names.

Maximizing readability for people without reading problems is probably not the worst approach, anyway. And that means using unicode symbols judiciously.

jar1 · June 6, 2023, 10:54pm

Why?

Mason · June 6, 2023, 10:58pm

I think it’s good practice because there are circumstances where someone is using a computing environment that is extremely limited in what sorts of inputs they can write or even display.

I don’t particularly feel strongly about it, but I think it’s a good convention to provide optional ASCII interfaces in packages and from the base language.

jar1 · June 6, 2023, 11:10pm

What platforms are these where people are writing Julia code without the ability to type unicode?

DNF · June 6, 2023, 11:10pm

While code with non-ascii symbols can be very nice to read, it is normally less nice to write. Even in environments where they are supported, it can be difficult to find out how to write them, and awkward to do so.

Some code is for reading, but APIs are essentially for writing, so I think there’s a different trade-off.

jar1 · June 6, 2023, 11:11pm

Julia’s REPL shows how to do it:

help?> α
"α" can be typed by \alpha<tab>
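The table behind that help message can also be queried from code, which helps when a symbol can be pasted but not typed. A sketch, assuming the internal, unexported `REPL.REPLCompletions.latex_symbols` dictionary; it is not public API and may change between Julia versions. The helper `names_for` is hypothetical.

```julia
using REPL

# Internal table in the REPL standard library: completion name => character.
latex_table = REPL.REPLCompletions.latex_symbols

latex_table["\\alpha"]   # "α"

# Hypothetical helper: every completion name that produces a given character.
names_for(ch) = sort([name for (name, sym) in latex_table if sym == ch])

names_for("ρ")   # includes "\\rho"
```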

adienes · June 6, 2023, 11:12pm

help?> the one that looks like a little fish

jar1 · June 6, 2023, 11:14pm

Because they have a photograph of Julia code and don’t want to learn the greek alphabet despite working in a domain where it’s used, and don’t want to use Detexify?
