Syntax: Escape hatch for unicode haters

PetrKryslUCSD · January 8, 2024, 11:42pm

Sorry you feel that way. But I think my raining on the parade may provide a useful viewpoint to those newbies who would otherwise be exposed only to uncritical cheerleaders clamoring for more unicode in the source code. (Note: I have no objections to it in comments and documentation. Knock yourselves out writing all those umlauts and checks and weird dots and wiggly lines. I speak more or less seven languages and I can see the use case for those little quirks. But in source: the fewer possibilities for confusion, the better. If a programmer spends just five minutes a day being confused by a symbol in code, the costs will add up quickly.)

DNF · January 8, 2024, 11:52pm

For my part, I find that a few well-placed Greek letters can significantly clean up a piece of code and make it both easier to read and understand.

If someone is confused by a θ in my code, I sincerely wonder which parts of it they do understand.

PetrKryslUCSD · January 9, 2024, 12:02am

I think you know well that that is not what I am concerned about.

These are all “rhos”, yet they are all different (distinct) characters, which could potentially have all different meaning.

Does it still seem like a good idea?

jkopper · January 9, 2024, 12:30am

I think it’s disingenuous and underhanded to pretend that this is remotely related to what I’m talking about.

I am not arguing against unicode per se. I am describing how its use in Julia is a significant pain point for me and people who program like me.

Dan · January 9, 2024, 1:07am

A solution could be essentially, a “beautifier” option. The beautifier/formatter would replace unicode with equivalents (such as lvar"rho" suggestion above), but perhaps integrated into main julialang. Additionally, the REPL should have a mode which automatically replaces unicodes with such equivalents also. In this way, pasting into REPL with this REPL option activated will result in non-unicode characters.

Perhaps a more elaborate solution is possible.

jar1 · January 9, 2024, 1:12am

JAX vs Julia (vs PyTorch) · Patrick Kidger says

Many Julia APIs look like Optimiser(η=...) rather than Optimiser(learning_rate=...). This is a pretty unreadable convention.

goerz · January 9, 2024, 1:20am

This thread is getting dangerously close to name-calling. It’s hardly productive to have a shouting match between unicode haters and unicode enthusiasts. It’s not like anybody is going to convince the other.

Julia’s unicode support is a fact of life. If you don’t like unicode, don’t use it in your code base. As soon as you want to contribute to other people’s code, you’ll have to contend with their use of unicode. If I were to submit patches to @PetrKryslUCSD’s code, I’d make sure they’re in ASCII. Conversely, I wouldn’t accept contributions that don’t match the extensive use of unicode my projects.

In practice, there seems to be a pretty strong consensus in the community:

Don’t force unicode for public APIs. So no unicode in types / function names, and no unicode keyword arguments without ASCII aliases
Limit unicode to where it relates to existing mathematical notation. Non-scientific code doesn’t need unicode identifiers.

Seems quite sensible to me, and at least in my opinion, the judicious use of unicode greatly enhances the readability of scientific code. But everyone is going to follow their own philosophy, and things will land where they’ll land.

goerz · January 9, 2024, 1:33am

I’m pretty sure that Patrick Kidger is just plain wrong in that assertion. I’m not a Flux user, but as far as I can tell, Flux (or any other common library) does not have an API that includes Optimiser(η=...). Their documentation isn’t great: At first glance, they make it look like you can call their functions like that. But in fact, these are positional parameters, so you call them as Optimizer(learning_rate) or Optimizer(η), or whatever you want. The field names and required keyword arguments all seem to be ASCII.

mbauman · January 9, 2024, 1:40am

That’s neither Base Julia nor a standard library. One of the most prominent package style guides, SciMLStyle, says this:

Unicode is fine within code where it increases legibility, but in no case should Unicode be used in public APIs. This is to allow support for terminals which cannot use Unicode: if a keyword argument must be η, then it can be exclusionary to uses on clusters which do not support Unicode inputs.

stevengj · January 12, 2024, 3:22pm

9 posts were split to a new topic: Warning against Unicode confusables

Tamas_Papp · January 9, 2024, 6:37am

With tramp in Emacs, I can edit any file on a server I have SSH access to using the editor on my local machine.

I don’t think that non-ASCII chars should be used excessively in generic APIs, but I am relatively unsympathetic to claims of how Unicode makes life hard for people when the tooling to deal with it has been around for decades. Eg in this case, tramp has been bundled with Emacs since 21.1, which was released around 2001. (Again, I think VS Code has something similar, but didn’t explore in detail).

But the bottom line is: if Julia is not useful to you, then just don’t use it. No one is forcing you to.

I don’t think this is true, for the following reason: Julia is free software. If there were masses of people who would use it if it wasn’t for Unicode, they could just easily fork it and strip all traces of Unicode from it (and backport all future changes from Base and the compiler, as those hardly use any Unicode).

This is not happening, so maybe there are not many people who are serious about hating Unicode. Now of course they will kvetch about it any time they have an opportunity, but talk is cheap.

mihalybaci · January 9, 2024, 1:53pm

This is a case where Julia has built-in functions to help out. Someone posted this already, but its worth reiterating

help?>α
"α" can be typed by \alpha<tab>

so the “which theta is it mystery?” is readily solveable by copy/pasting into the REPL. I will say though that I always forget this, so a PR to the top of Julia manual’s Unicode Input section about working with unicode characters in source code maybe would be good. It could include looking up characters using help?>, how to use codeunits, and advice on when to use/not use unicode (such as avoiding unicode keywords in function APIs).

Side note: the Julia VS Code extension does give confusion warnings, so for my Planck Law’s example the characters are highlighted and hovering on them tells me that ν can be confused with v, and even gives the code points. So there is that.

In the larger context, I get that people are annoyed by too much unicode. I rather dislike it myself, so I use unicode sparingly and only when it makes the code more readable/understandable (e.g. in well-known physics equations).

But the other argument about not being able to type/display unicode seems to be a red herring. What I see so far is: a hypothetical user is doing a task of some kind and the only text editor they have cannot display a single unicode character. Since no one has posted a real, lived example of this happening, my assumption has to be that it really doesn’t happen in practice. If it did, then someone would surely post their workflow breaking because of it, right?

Nearly all unicode characters I come across are mathematical symobls, so sure, VS Code with the JuliaMono font doesn’t display \:spagetti: properly, but is that used in code in practice? “In practice” is highly relevant because it is really hard to design a solution in search of a problem. Again, I do not dismiss that it can be annoying to deal with unicode since I have experienced that myself. But “annoying” is very different from “it literally prevents me from coding”. And the latter requires examples to test against.

Tamas_Papp · January 9, 2024, 1:59pm

Fear not, Iosevka does

Along with (\:bicyclist:). Which makes sense on so many levels: if you eat a lot of spaghetti you need to exercise.

mbauman · January 9, 2024, 2:38pm

Hey everyone, we’ve had this rodeo before. Unicode isn’t going anywhere, neither in the world at large nor in Julia nor in any other modern language.

The topic at hand here is if adding an ASCII-equivalent syntax to enter unicode identifiers (as is possible in Javascript) would actually help alleviate any difficulties and if it’d be a good idea.

Let’s not just argue with eachother here for the sake of arguing, please.

mikmoore · January 9, 2024, 5:04pm

I only have one example and it’s only partial, but notice that n-ary function composition is only available via ∘ (\circ):

julia> methods(ComposedFunction)
# 1 method for type constructor:
 [1] ComposedFunction(outer, inner)
     @ operators.jl:1038

julia> methods(∘)
# 3 methods for generic function "∘" from Base:
 [1] ∘(f, g)
     @ operators.jl:1053
 [2] ∘(f, g, h...)
     @ operators.jl:1054
 [3] ∘(f)
     @ operators.jl:1052

Maybe I’ll get around to adding n-ary versions of ComposedFunction one of these days. If someone else gets to it before me, even better.

stevengj · January 9, 2024, 5:18pm

julia> sin ∘ cos ∘ tan === ComposedFunction(ComposedFunction(sin, cos), tan)
true

julia> sin ∘ cos ∘ tan === foldl(ComposedFunction, (sin, cos, tan))
true

mikmoore · January 9, 2024, 5:27pm

Of course. But why is ∘ privileged to not require a fold? My point is that there are many Unicode definitions like const ⊻ = xor but ∘ does not follow this pattern.

stevengj · January 9, 2024, 5:41pm

Probably because the need for it never occured to anyone: ComposedFunction was viewed as the lower-level building block and no one saw a need for more high-level constructor methods, since in the real world almost nobody calls it directly. Indeed, if you use JuliaHub to search the thousands of Julia packages for usage of ComposedFunction, people are mostly using it for dispatch. There only seem to be ~~two~~ three instances of anyone calling it directly: one line in InverseFunctions.jl, one in Bijectors.jl, and one in FunctionChains.jl, which call 2-arg and 1-arg versions respectively — in each case, this occurs in methods that are overloaded for ::ComposedFunction arguments, where they maybe wanted to call the low-level constructor explicitly to clarify that the result is the same as the argument type. (This doesn’t exactly speak to a burning desire for ∘ synonyms, either — ∘ is used directly much more often than ComposedFunction.)

~~That being said, in retrospect defining const ∘ = ComposedFunction would have made a lot a sense too (and can probably still be done?)~~. On the other hand, ∘ has the property that the 1-ary method is the identity ∘(f) === f, and the 0-ary method could arguably return identity (though currently this is a MethodError — ~~a bug?~~ by choice), whereas you would want a constructor ComposedFunction(...) to always return a ComposedFunction instance.

What would you return for n == 1 and n == 0? I guess you could just define it for n ≥ 2, but then it is still distinct from ∘.

mikmoore · January 9, 2024, 7:34pm

Indeed, a difference with ComposedFunctions is that it’s a type and therefore should be treated as a constructor. It certainly could return identity for zero arguments and the input for 1 argument rather than a ComposedFunction (this sort of not-construction is uncommon but I don’t think literally unprecedented), but that would be another debate.

More likely, I would probably just replace the definitions of ∘ with something like compose and then set const ∘ = compose like the others. But I haven’t been in a situation where I couldn’t just copy-paste ∘ from a REPL so this has never risen high on my list. Especially since there would be some bikeshedding to resolve with the written name. It’s never caused me problems, but it is something I took note of since I usually avoid Unicode when convenient.

foobar_lv2 · January 12, 2024, 11:36am

This is super cool, thanks for the link!

The only saving grace is that it is very possible to mostly opt-out of this madness by simply not using weird characters: There exist very few serious projects that expose their APIs in a unicode-only way.

The primary remaining pain-points for coding are the missing infix xor, and the fact that Base / stdlib has some gratuitous uses of things like \in<TAB> or \le<TAB> (sucks for copy-paste-adapt cycles if you have a non-unicode policy for your projects). If you’re coding in julia, you will read a lot of Base / stdlib code, more than e.g. java devs will need to real jdk libs, due to documentation verbosity.

The other pain-point is that interaction with non-serious projects like discourse posts or slack is made unnecessarily annoying.

Infix xor. I really want a multi-letter infix operator for that.

I very strongly disagree with your framing that this is a technical problem.

The fundamental issue is a human one: You cannot subvocalize or vocally communicate an unknown/unfamiliar glyph.

It is very difficult for humans to visually distinguish and short-time-remember words composed of unfamiliar glyphs. For this reason, one often transliterates (not translates!) such words, in a way that is somewhat pronouncable (even if the pronounciation is completely wrong).

Imagine having two printed lists, e.g. passenger manifests, and a pen, and having to check off the intersection. Common enough workflow, and a non-computerized fallback is necessary. Imagine one is for loaded baggage and the other is for passengers that have boarded – you need to ensure that all passengers whose baggage has been loaded are actually on the plane.

And now imagine looking at a sea of various names, in their native characters (some Chinese, some Korean names, some have weird African characters you have never seen, some are Hebrew, some Arabic, some Greek, some Cyrillic, some latin). You will be utterly lost trying to figure out which are the same.

The standard solution is to transliterate all these names into some standard character set, which turns out to be latin for historical reasons. Ideally in a way that is superficially pronouncable (irregardless of whether the pronouciation is bogus), because most humans tend to employ their evolved hardware acceleration for audio handling in such tasks (inner voice / subvocalization).

This is exactly where I wanted to go. And the first step is modifying the lexer/tokenizer to accept a latin/ascii transliteration, in a way that preserves ASTs.

Topic		Replies	Views
Non-unicode versions of unicode functions in base/stdlib? Internals & Design	10	1313	May 16, 2021
Warning against Unicode confusables Internals & Design unicode	51	1930	January 13, 2024
Fun with Unicode: TemplateᐸTᐳ syntax and more General Usage syntax , unicode	4	94	August 9, 2024
Rationale behind excluding some unicode characters from identifiers Internals & Design	10	390	March 3, 2023
String conversion from Symbol with Unicode does not yield a string, which is intended to be the same New to Julia question , bug	6	767	December 5, 2020

Syntax: Escape hatch for unicode haters

Related topics