Sorry you feel that way. But I think my raining on the parade may provide a useful viewpoint to those newbies who would otherwise be exposed only to uncritical cheerleaders clamoring for more unicode in the source code. (Note: I have no objections to it in comments and documentation. Knock yourselves out writing all those umlauts and checks and weird dots and wiggly lines. I speak more or less seven languages and I can see the use case for those little quirks. But in source: the fewer possibilities for confusion, the better. If a programmer spends just five minutes a day being confused by a symbol in code, the costs will add up quickly.)
For my part, I find that a few well-placed Greek letters can significantly clean up a piece of code and make it both easier to read and understand.
If someone is confused by a Ξ in my code, I sincerely wonder which parts of it they do understand.
I think you know well that that is not what I am concerned about.
These are all "rhos", yet they are all different (distinct) characters, which could potentially all have different meanings.
Does it still seem like a good idea?
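To make the point concrete, two of these lookalikes can only be told apart by code point, and both are valid, distinct Julia identifiers:

julia> codepoint('ρ'), codepoint('ϱ')   # GREEK SMALL LETTER RHO vs GREEK RHO SYMBOL
(0x000003c1, 0x000003f1)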
I think it's disingenuous and underhanded to pretend that this is remotely related to what I'm talking about.
I am not arguing against unicode per se. I am describing how its use in Julia is a significant pain point for me and people who program like me.
A solution could be, essentially, a "beautifier" option. The beautifier/formatter would replace unicode with equivalents (such as the lvar"rho" suggestion above), but perhaps integrated into main julialang. Additionally, the REPL should have a mode which automatically replaces unicode with such equivalents as well. In this way, pasting into a REPL with this option activated will result in non-unicode characters.
Perhaps a more elaborate solution is possible.
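As a rough sketch of the replacement half of this idea: one could invert the REPL's own LaTeX tab-completion table to recover an ASCII input form for each unicode character. Note that REPL.REPLCompletions.latex_symbols is an internal API and may change, and how the result is spelled in source (e.g. the lvar"rho" suggestion above) is a separate question:

using REPL

# Invert the REPL's "\alpha" => "α" completion table (internal, may change).
const UNICODE_TO_ASCII = Dict(v => k for (k, v) in REPL.REPLCompletions.latex_symbols)

# Transliterate one identifier; characters without a known ASCII form pass through.
transliterate(name::AbstractString) =
    join(get(UNICODE_TO_ASCII, string(c), string(c)) for c in name)

transliterate("η")    # gives "\eta", ready for whatever escape syntax is chosen
transliterate("rate") # gives "rate" (already ASCII, unchanged)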
JAX vs Julia (vs PyTorch) · Patrick Kidger says
Many Julia APIs look like Optimiser(η=...) rather than Optimiser(learning_rate=...). This is a pretty unreadable convention.
This thread is getting dangerously close to name-calling. It's hardly productive to have a shouting match between unicode haters and unicode enthusiasts. It's not like anybody is going to convince the other.
Julia's unicode support is a fact of life. If you don't like unicode, don't use it in your code base. As soon as you want to contribute to other people's code, you'll have to contend with their use of unicode. If I were to submit patches to @PetrKryslUCSD's code, I'd make sure they're in ASCII. Conversely, I wouldn't accept contributions that don't match the extensive use of unicode in my projects.
In practice, there seems to be a pretty strong consensus in the community:
- Don't force unicode for public APIs. So no unicode in types / function names, and no unicode keyword arguments without ASCII aliases (a sketch of this pattern follows below).
- Limit unicode to where it relates to existing mathematical notation. Non-scientific code doesn't need unicode identifiers.
Seems quite sensible to me, and at least in my opinion, the judicious use of unicode greatly enhances the readability of scientific code. But everyone is going to follow their own philosophy, and things will land where they'll land.
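For what it's worth, the ASCII-alias point from that list is cheap to implement, because a keyword default may refer to an earlier keyword. A minimal sketch (the Optimiser name and its field are made up for illustration):

# Document the ASCII keyword; the unicode spelling simply defaults to it,
# so both Optimiser(learning_rate=0.1) and Optimiser(η=0.1) work.
struct Optimiser
    eta::Float64
end
Optimiser(; learning_rate=0.01, η=learning_rate) = Optimiser(η)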
I'm pretty sure that Patrick Kidger is just plain wrong in that assertion. I'm not a Flux user, but as far as I can tell, Flux (or any other common library) does not have an API that includes Optimiser(η=...). Their documentation isn't great: at first glance, they make it look like you can call their functions like that. But in fact, these are positional parameters, so you call them as Optimizer(learning_rate) or Optimizer(η), or whatever you want. The field names and required keyword arguments all seem to be ASCII.
That's neither Base Julia nor a standard library. One of the most prominent package style guides, SciMLStyle, says this:
- Unicode is fine within code where it increases legibility, but in no case should Unicode be used in public APIs. This is to allow support for terminals which cannot use Unicode: if a keyword argument must be η, then it can be exclusionary to uses on clusters which do not support Unicode inputs.
With tramp in Emacs, I can edit any file on a server I have SSH access to using the editor on my local machine.
I don't think that non-ASCII chars should be used excessively in generic APIs, but I am relatively unsympathetic to claims of how Unicode makes life hard for people when the tooling to deal with it has been around for decades. E.g., in this case, tramp has been bundled with Emacs since 21.1, which was released around 2001. (Again, I think VS Code has something similar, but I didn't explore it in detail.)
But the bottom line is: if Julia is not useful to you, then just don't use it. No one is forcing you to.
I don't think this is true, for the following reason: Julia is free software. If there were masses of people who would use it if it weren't for Unicode, they could easily fork it and strip all traces of Unicode from it (and backport all future changes from Base and the compiler, as those hardly use any Unicode).
This is not happening, so maybe there are not many people who are serious about hating Unicode. Now of course they will kvetch about it any time they have an opportunity, but talk is cheap.
This is a case where Julia has built-in functions to help out. Someone posted this already, but it's worth reiterating:
help?> α
"α" can be typed by \alpha<tab>
so the "which theta is it?" mystery is readily solvable by copy/pasting into the REPL. I will say, though, that I always forget this, so a PR to the top of the Julia manual's Unicode Input section about working with unicode characters in source code maybe would be good. It could include looking up characters using help?>, how to use codeunits, and advice on when to use/not use unicode (such as avoiding unicode keywords in function APIs).
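For example, pasting a suspicious character into the REPL settles the question immediately, and codeunits shows the raw bytes:

julia> 'ν'   # Latin v or Greek nu? the Char display says which
'ν': Unicode U+03BD (category Ll: Letter, lowercase)

julia> codeunits("ν")
2-element Base.CodeUnits{UInt8, String}:
 0xce
 0xbd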
Side note: the Julia VS Code extension does give confusion warnings, so for my Planck's law example the characters are highlighted, and hovering on them tells me that ν can be confused with v, and even gives the code points. So there is that.
In the larger context, I get that people are annoyed by too much unicode. I rather dislike it myself, so I use unicode sparingly and only when it makes the code more readable/understandable (e.g. in well-known physics equations).
But the other argument about not being able to type/display unicode seems to be a red herring. What I see so far is: a hypothetical user is doing a task of some kind and the only text editor they have cannot display a single unicode character. Since no one has posted a real, lived example of this happening, my assumption has to be that it really doesn't happen in practice. If it did, then someone would surely post their workflow breaking because of it, right?
Nearly all unicode characters I come across are mathematical symbols, so sure, VS Code with the JuliaMono font doesn't display \:spaghetti: properly, but is that used in code in practice? "In practice" is highly relevant, because it is really hard to design a solution in search of a problem. Again, I do not dismiss that it can be annoying to deal with unicode, since I have experienced that myself. But "annoying" is very different from "it literally prevents me from coding". And the latter requires examples to test against.
Fear not, Iosevka does, along with (\:bicyclist:). Which makes sense on so many levels: if you eat a lot of spaghetti you need to exercise.
Hey everyone, we've had this rodeo before. Unicode isn't going anywhere, neither in the world at large nor in Julia nor in any other modern language.

The topic at hand here is whether adding an ASCII-equivalent syntax for entering unicode identifiers (as is possible in JavaScript) would actually help alleviate any difficulties, and whether it'd be a good idea.

Let's not just argue with each other here for the sake of arguing, please.
I only have one example and it's only partial, but notice that n-ary function composition is only available via ∘ (\circ):
julia> methods(ComposedFunction)
# 1 method for type constructor:
 [1] ComposedFunction(outer, inner)
     @ operators.jl:1038

julia> methods(∘)
# 3 methods for generic function "∘" from Base:
 [1] ∘(f, g)
     @ operators.jl:1053
 [2] ∘(f, g, h...)
     @ operators.jl:1054
 [3] ∘(f)
     @ operators.jl:1052
Maybe I'll get around to adding n-ary versions of ComposedFunction one of these days. If someone else gets to it before me, even better.
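For reference, the missing method would be a near one-liner; a sketch (ComposedFunction belongs to Base, so adding this from a package would be type piracy, and it really belongs in a Base PR):

# Left-fold n-ary arguments the same way `sin ∘ cos ∘ tan` associates.
Base.ComposedFunction(f, g, h, more...) =
    ComposedFunction(ComposedFunction(f, g), h, more...)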
julia> sin ∘ cos ∘ tan === ComposedFunction(ComposedFunction(sin, cos), tan)
true

julia> sin ∘ cos ∘ tan === foldl(ComposedFunction, (sin, cos, tan))
true
Of course. But why is ∘ privileged to not require a fold? My point is that there are many Unicode definitions like const ⊻ = xor, but ∘ does not follow this pattern.
Probably because the need for it never occurred to anyone: ComposedFunction was viewed as the lower-level building block, and no one saw a need for more high-level constructor methods, since in the real world almost nobody calls it directly. Indeed, if you use JuliaHub to search the thousands of Julia packages for usage of ComposedFunction, people are mostly using it for dispatch. There only seem to be three instances of anyone calling it directly: one line in InverseFunctions.jl, one in Bijectors.jl, and one in FunctionChains.jl, which call the 2-arg and 1-arg versions; in each case, this occurs in methods that are overloaded for ::ComposedFunction arguments, where they maybe wanted to call the low-level constructor explicitly to clarify that the result is the same as the argument type. (This doesn't exactly speak to a burning desire for ∘ synonyms, either; ∘ is used directly much more often than ComposedFunction.)
That being said, in retrospect defining const ∘ = ComposedFunction would have made a lot of sense too (and can probably still be done?)

On the other hand, ∘ has the property that the 1-ary method is the identity ∘(f) === f, and the 0-ary method could arguably return identity (though currently this is a MethodError; a bug, or by choice?), whereas you would want a constructor ComposedFunction(...) to always return a ComposedFunction instance.
What would you return for n == 1 and n == 0? I guess you could just define it for n ≥ 2, but then it is still distinct from ∘.
Indeed, a difference with ComposedFunction is that it's a type and therefore should be treated as a constructor. It certainly could return identity for zero arguments and the input for one argument rather than a ComposedFunction (this sort of non-construction is uncommon but I don't think literally unprecedented), but that would be another debate.
More likely, I would probably just replace the definitions of ∘ with something like compose and then set const ∘ = compose like the others. But I haven't been in a situation where I couldn't just copy-paste ∘ from a REPL, so this has never risen high on my list, especially since there would be some bikeshedding to resolve over the written name. It's never caused me problems, but it is something I took note of since I usually avoid Unicode when convenient.
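A sketch of what that could look like, assuming the hypothetical name compose (defined in a fresh module, where the new ∘ shadows the one from Base):

compose() = identity                 # 0-ary: arguably the neutral element
compose(f) = f                       # 1-ary: matches ∘(f) === f
compose(f, g) = ComposedFunction(f, g)
compose(f, g, h...) = compose(compose(f, g), h...)
const ∘ = compose                    # the unicode operator is just an alias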
This is super cool, thanks for the link!
The only saving grace is that it is very possible to mostly opt-out of this madness by simply not using weird characters: There exist very few serious projects that expose their APIs in a unicode-only way.
The primary remaining pain points for coding are the missing infix xor, and the fact that Base / stdlib has some gratuitous uses of things like \in<TAB> or \le<TAB> (which sucks for copy-paste-adapt cycles if you have a non-unicode policy for your projects). If you're coding in Julia, you will read a lot of Base / stdlib code, more than e.g. Java devs need to read JDK libs, due to documentation verbosity.
The other pain point is that interaction with non-serious projects like Discourse posts or Slack is made unnecessarily annoying.
Infix xor. I really want a multi-letter infix operator for that.
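To illustrate the gap: the infix spelling exists only as the unicode ⊻ (typed \xor<tab>), while ASCII code is stuck with the function-call form:

julia> xor(0b1100, 0b1010)   # ASCII: ordinary function call only
0x06

julia> 0b1100 ⊻ 0b1010       # infix requires the unicode operator
0x06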
I very strongly disagree with your framing that this is a technical problem.
The fundamental issue is a human one: You cannot subvocalize or vocally communicate an unknown/unfamiliar glyph.
It is very difficult for humans to visually distinguish and hold in short-term memory words composed of unfamiliar glyphs. For this reason, one often transliterates (not translates!) such words in a way that is somewhat pronounceable (even if the pronunciation is completely wrong).
Imagine having two printed lists, e.g. passenger manifests, and a pen, and having to check off the intersection. Common enough workflow, and a non-computerized fallback is necessary. Imagine one is for loaded baggage and the other is for passengers that have boarded: you need to ensure that all passengers whose baggage has been loaded are actually on the plane.
And now imagine looking at a sea of various names, in their native characters (some Chinese, some Korean names, some have weird African characters you have never seen, some are Hebrew, some Arabic, some Greek, some Cyrillic, some latin). You will be utterly lost trying to figure out which are the same.
The standard solution is to transliterate all these names into some standard character set, which turns out to be Latin for historical reasons. Ideally in a way that is superficially pronounceable (regardless of whether the pronunciation is bogus), because most humans tend to employ their evolved hardware acceleration for audio handling in such tasks (inner voice / subvocalization).
This is exactly where I wanted to go. And the first step is modifying the lexer/tokenizer to accept a latin/ascii transliteration, in a way that preserves ASTs.
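An encouraging precedent: var"..." already lowers to a plain Symbol at parse time, so AST preservation is achievable. A hypothetical ASCII transliteration (e.g. the lvar"rho" idea above) would just need to lower to the same Symbol:

julia> Meta.parse("ρ = 1") == Meta.parse("var\"ρ\" = 1")
true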