Unicode: a bad idea, in general

Would love to see the Python version! :joy:

I would love Unicode package names:

using šŸŒ, šŸ³, šŸ“

And of course we let the chance to use an emoji for the file name extension slip through our fingers.

Julia could also be the first to adopt an emoji for the default file extension:
MyCoolApp.:cloud_with_lightning:

Edit: heh, should have read the post above mine

Too late: Modular Docs - Mojo🔥 FAQ

Hopefully no derision at all, because I 100% agree with using the capabilities of unicode to the fullest extent possible. In fact, it's one of the things that attracted me to Julia in the first place. I appreciate that @PetrKryslUCSD will never agree with that, but luckily we don't work on the same code bases :wink:

Your idea of allowing both ascii and unicode keyword arguments is amazing, and I'm very likely to adopt that approach. Thanks!
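
For anyone wondering what "both ascii and unicode keyword arguments" can look like, here is a minimal sketch (not necessarily the original poster's code). It relies on the fact that a Julia keyword default may refer to an earlier keyword, so the ASCII keyword `alpha` simply defaults to the unicode keyword `α`. The function `smooth` and its exponential-smoothing body are hypothetical, chosen only to give the keywords something to do.

```julia
# Hypothetical example: exponential smoothing whose weight can be passed
# either as the unicode keyword `α` or as the ASCII keyword `alpha`.
# `alpha` defaults to `α`, so whichever spelling the caller uses ends up in `alpha`.
function smooth(x::AbstractVector; α::Real = 0.5, alpha::Real = α)
    y = float.(x)                       # fresh floating-point copy of the input
    for i in 2:length(y)
        # y[i] still holds the raw input here; y[i-1] is already smoothed
        y[i] = alpha * y[i] + (1 - alpha) * y[i-1]
    end
    return y
end
```

With this, `smooth(x; α = 0.3)` and `smooth(x; alpha = 0.3)` behave identically. One design caveat: if a caller passes both keywords, the ASCII one silently wins, so you may prefer to throw an error in that case.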

For those on a Mac who want to type a lot of scientific unicode: The "U.S. International - Scientific" Keyboard Layout - Michael Goerz

Careful (conservative) use of unicode can make code much easier to understand while keeping it unambiguous. My rule is to use only simple unicode characters that (1) most programming fonts (Consolas, say) can render and (2) are not easily mixed up with similar-looking symbols.
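
Rules like these can be checked mechanically. Below is a minimal sketch that assumes a hand-picked allowlist; the `ALLOWED_NONASCII` set and the `disallowed_chars` function are hypothetical, and the set deliberately leaves out ρ and ν because they are easy to mistake for p and v.

```julia
# Hypothetical allowlist: characters that common programming fonts render
# well and that are hard to mistake for ASCII letters.
# ρ (vs p) and ν (vs v) are deliberately left out.
const ALLOWED_NONASCII = Set{Char}("αβγδθλσωΔΣ∂∇≈≤≥")

# Return (line number, character) for each non-ASCII character in `src`
# that is not on the allowlist.
function disallowed_chars(src::AbstractString)
    hits = Tuple{Int,Char}[]
    for (lineno, line) in enumerate(split(src, '\n'))
        for c in line
            if !isascii(c) && !(c in ALLOWED_NONASCII)
                push!(hits, (lineno, c))
            end
        end
    end
    return hits
end
```

Running it over a file's contents, e.g. `disallowed_chars(read(path, String))`, lists the characters a reviewer would want to question. The real work, of course, is agreeing on what belongs in the set.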

You shouldn't hesitate. I think your opinion is the majority opinion here. Unicode is good, but there should be non-unicode API options available, and your method of defining both ASCII and unicode kwargs simultaneously is clever and worth sharing.

Well, avoiding Unicode in APIs (or at least offering a way not to use it) is good. But, as someone already pointed out, code is read more often than it is written, so we should make the visual presentation of internal (i.e. non-API) code as easy on the reader as possible. Readers usually have little control over how Unicode is rendered (e.g. on GitHub), so problematic symbols should be prohibited, and every other symbol should be weighed for visual confusion (e.g. ρ (\rho) vs p vs rho, etc.).

I assume that's not every non-ascii symbol? Do you know which ones are particularly problematic?

Agreed! So it should be made readable by using whatever notation and conventions are appropriate for the problem at hand.

If I am writing code which implements some physics concept, and I expect my code to be read by fellow physicists who have similar training to me, then the most appropriate notation and conventions to clearly communicate ideas are typically going to involve a lot of unicode and single-letter variables.

If I'm writing code that I intend to be read and used by a wide variety of people with a wide variety of backgrounds, then the choices as to what maximizes comprehension may look very different (or maybe they just involve better documentation).

https://util.unicode.org/UnicodeJsps/confusables.jsp

https://websec.github.io/unicode-security-guide/visual-spoofing/
http://www.unicode.org/Public/security/latest/confusables.txt
How is this for a start?

For a start? It's a bit much, actually. Is there some way to extract the essence? I also see 'ascii confusables' in there. What sort of characters is it that you would like to prohibit which are allowed today?
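
One way to extract the essence of confusables.txt is to keep only the entries whose "prototype" is plain ASCII, i.e. the characters that can pass for ordinary ASCII text. The sketch below assumes that data lines have the form `source ; prototype ; type # comment` with space-separated hex code points. That is my reading of the file, so check it against the file's header. It also assumes you have a local copy of the file. `parse_confusable` and `ascii_confusables` are hypothetical names.

```julia
# Parse one line of Unicode's confusables.txt. Data lines look roughly like
#   0441 ;  0063 ;  MA  # ( с → c ) CYRILLIC SMALL LETTER ES → LATIN SMALL LETTER C
# i.e. source code points ; prototype code points ; type, plus a trailing comment.
# Returns (source, prototype) as Strings, or `nothing` for comment/blank lines.
function parse_confusable(line::AbstractString)
    data = strip(first(split(line, '#'; limit = 2)))
    isempty(data) && return nothing
    fields = strip.(split(data, ';'))
    length(fields) >= 2 || return nothing
    # Turn e.g. "0072 006E" into "rn"
    hex_to_str(f) = join(Char(parse(UInt32, h; base = 16)) for h in split(f))
    return (hex_to_str(fields[1]), hex_to_str(fields[2]))
end

# Keep only non-ASCII sources whose prototype is plain ASCII.
function ascii_confusables(lines)
    table = Dict{String,String}()
    for line in lines
        entry = parse_confusable(line)
        entry === nothing && continue
        source, prototype = entry
        if !isascii(source) && isascii(prototype)
            table[source] = prototype
        end
    end
    return table
end
```

Usage would be something like `ascii_confusables(eachline("confusables.txt"))`. The resulting table is a concrete candidate list for "characters to prohibit", which is easier to discuss than the full file.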

Also, isn't there some sort of normalization of similar unicode symbols happening already?
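
As far as I know, Julia NFC-normalizes identifiers when parsing, so canonically equivalent spellings of a name refer to the same variable. Standard normalization does not merge look-alikes from different scripts, though. The sketch below uses the `Unicode` standard library's `Unicode.normalize` to show what NFC (and the more aggressive NFKC) does and does not merge.

```julia
using Unicode

# Two encodings of "é": precomposed U+00E9 vs. 'e' + combining acute U+0301.
precomposed = "\u00e9"
decomposed  = "e\u0301"

# Canonically equivalent sequences differ code-point-wise but agree after NFC.
@show precomposed == decomposed
@show Unicode.normalize(decomposed, :NFC) == precomposed

# NFKC additionally folds "compatibility" characters: the micro sign U+00B5
# becomes Greek small letter mu U+03BC.
@show Unicode.normalize("\u00b5", :NFKC) == "\u03bc"

# Look-alikes from different scripts are NOT merged by normalization:
# Cyrillic small es U+0441 stays distinct from Latin 'c'.
@show Unicode.normalize("\u0441", :NFKC) == "c"
```

So normalization covers the "same character, different encoding" case, while cross-script confusables like с/c still need a separate check.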

As for the list of groups affected by reading problems, at least "people with limited time or attention span" and "non-native English speakers" should benefit more from short, visually distinct symbols than from long descriptive names.

Maximizing readability for people without reading problems is probably not the worst approach, anyway. And that means using unicode symbols judiciously.

Why?

I think it's good practice because there are circumstances where someone is using a computing environment that is extremely limited in what characters they can type or even display.

I don't feel particularly strongly about it, but I think it's a good convention to provide optional ASCII interfaces in packages and from the base language.

What platforms are these where people are writing Julia code without the ability to type unicode?

While code with non-ascii symbols can be very nice to read, it is normally less nice to write. Even in environments where they are supported, it can be difficult to find out how to write them, and awkward to do so.

Some code is for reading, but APIs are essentially for writing, so I think there's a different trade-off.

Julia's REPL shows how to do it:

help?> α
"α" can be typed by \alpha<tab>
help?> the one that looks like a little fish
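
The same lookup can be done from a script. This is a sketch that assumes the REPL stdlib's internal table `REPL.REPLCompletions.latex_symbols`, a Dict mapping completion names such as `"\\alpha"` to the symbol they produce. As far as I know this is what tab completion uses, but it is not a documented API and may change between versions. `tab_names` is a hypothetical helper.

```julia
using REPL

# Hypothetical helper: list every tab-completion name that produces `sym`,
# by reverse lookup in the REPL's LaTeX-symbol table.
# NOTE: REPL.REPLCompletions.latex_symbols is internal, not a documented API.
function tab_names(sym::AbstractString)
    names = [name for (name, s) in REPL.REPLCompletions.latex_symbols if s == sym]
    return sort!(names)
end
```

`tab_names("α")` should include `"\\alpha"`. Since it only does a reverse lookup by exact symbol, it will not help with the "little fish" query above.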

Because they have a photograph of Julia code and don't want to learn the Greek alphabet despite working in a domain where it's used, and don't want to use Detexify?
