Syntax: Escape hatch for unicode haters

Ininterrompue · January 14, 2024, 4:39am

I could be satisfied using only `in` if I only ever interacted with my own code, but it can present a problem when reading others’ code, especially if Unicode is used frequently elsewhere in it. The problem is compounded by the fact that Julia gives you multiple choices, `in`, `=`, or `∈`, and not everyone is immediately aware that these three mean the same thing in a for loop. Plus, those who shy away from Unicode and use `in` are necessarily going to be less familiar with code that has Unicode.
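For anyone who hasn’t run into this, here is a minimal sketch of the three spellings side by side. The names `xs`, `a`, `b`, and `c` are made up for illustration; only the loop syntax itself is the point.

```julia
# Hypothetical data, just for illustration.
xs = [10, 20, 30]

# Spelling 1: the `in` keyword.
a = Int[]
for x in xs
    push!(a, x)
end

# Spelling 2: `=`.
b = Int[]
for x = xs
    push!(b, x)
end

# Spelling 3: the Unicode `∈`.
c = Int[]
for x ∈ xs
    push!(c, x)
end

# All three loops visit the same elements in the same order.
@assert a == b == c

# Outside a loop header, `in` and `∈` are also interchangeable as a membership test.
@assert (20 in xs) == (20 ∈ xs)
```

In the REPL and in most editors with Julia support, `∈` is typed as `\in` followed by Tab, so someone who only ever types `in` may never have seen the other form until they read someone else’s code.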

I personally agree that it’s not too difficult. But I also wouldn’t brush off other people’s impatience that easily.

It’s far easier to become acquainted with 1-indexing as well, yet there are plenty of people out there who shy away from any 1-indexed language for illogical reasons. There is some self-selection bias, in that those who continue to use Julia are necessarily fine with 1-indexing. But indexing is a fixed, permanent part of the language – you have to learn 1-indexing if you want to use Julia. By contrast, most Unicode usage comes down to stylistic choice and exists on a spectrum. It’s not inevitable in the sense that 1-indexing is. If people who don’t use Unicode are forced to learn it, it’s because other people are using it in code they have to read.

Given that English is the lingua franca of programming code, and given that Julia is written in English, such code would be restricted to those who understand Greek and would probably be a pain for them to read, based on anecdotes I’ve seen from non-native English speakers. You’d still have to use Latin characters for keywords and defined functions anyway. Nor would any non-Greek speakers want to interact with such code.

Just because you can use Greek or Cyrillic or Chinese characters in the code doesn’t mean you should.

However, a native Greek speaker could find utility in being able to write comments and documentation in Greek for his team who all speak Greek. It is here where one can reasonably afford to write in natural language. Still, I think it is more of a “nice thing to have” than something whose absence would significantly impair productivity or comprehension. The syntax, resources, learning materials, etc. for most programming languages out there are overwhelmingly in English. To be a competent programmer, you’re almost certainly going to have to know some English.

So, it depends on what the alpha is used for, but I doubt this hypothetical Greek programmer would be seriously deprived by having to write alpha instead of α. Knowing nothing about Julia yet, he would most likely use a Latin keyboard by default when getting started with the language, just as he would when learning any other commonly used language out there.

It is generally accepted that logographic languages are more difficult to learn for Western speakers. Not just Westerners, really – anyone who is not familiar with the character set. There are of course other factors here, but the writing system is certainly playing a role. I know Mandarin, so I can attest to its learning process. Pronunciation-agnosticism is just one of the many difficulties – if you forget how to write a character, you cannot use its pronunciation as a hint to write it. You have to use a dictionary every single time. It’s true that English and French have some unreliable spelling rules, but at least you have an idea of what it looks like. Maybe -ough is tough to remember, but you can still write something on the paper. With Chinese, you don’t even know where to start. You cannot learn Chinese characters without rote memorization no matter if you’re a child or an adult. (Adults will struggle with other things more, like tones.)

Chinese schoolchildren are forced to learn Mandarin in school, so it’s not a surprise that literacy rates are high in China and Taiwan. The literacy rate doesn’t say much about the difficulty of the language, but rather points to other socioeconomic factors.

The point remains: you either know it or you don’t. Everyone knows the symbols on their keyboard, and not everyone knows Unicode. Therefore, if a developer has a choice, the code is most accessible for others to read if one sticks with characters readily typeable on a keyboard. That’s the default, “when in doubt” position I’d take when writing new code that isn’t explicitly mathematical. I went back to the code MilesCranmer referenced in the other thread and found the same rendering difficulties on Firefox, Windows 10. Moreover, it became increasingly obvious to me that names consisting of just Latin characters would’ve perfectly sufficed where Unicode was used instead.

This is getting off-topic as is, so I won’t press the issue any further.
