Non-unicode versions of unicode functions in base/stdlib?

I’m looking at someone else’s code in Julia 1.0 and noticed they use the ∉ (not in) function, which doesn’t seem to have an ASCII equivalent (e.g. notin). I realize that it’s possible to get the same functionality by typing !(el in set), but it reminds me of the deprecation of @test_approx_eq in Julia 0.5 in favor of @test a ≈ b atol=ε in Julia 0.6.

I have to say that I’m a little alarmed at the existence of unicode-only functions in base. I’ve seen people raise similar concerns in various places at various times, and the answer has been something along the lines of “get a proper coding environment that supports unicode.” I often run Julia code in a cluster environment, and having to enter unusual unicode characters into scripts and packages over ssh/vim seems unnecessarily difficult.

Is there any policy that every function and operator in base and the standard library should have a name that can be easily typed on any US keyboard? This would by no means preclude also having the unicode function for notational elegance and speed in less restrictive coding environments.

From what I understand, yes, to some degree, we generally try to always give a non-unicode name to functions, and only alias it with unicode symbols for convenience.

You can always evaluate the symbol to see whether it is an alias for a non-unicode function, like for ≈ and ∈:

julia> ≈
isapprox (generic function with 3 methods)

julia> ∈
in (generic function with 28 methods)

julia> ∉
∉ (generic function with 1 method)

You can see that both ≈ and ∈ are aliases for isapprox and in. ∉ is indeed its own function.
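
The same check can be done without reading REPL output. The following is a minimal sketch, assuming Julia 1.0 or later; the variable names are illustrative only, and the operators are wrapped in parentheses so they parse as values rather than as infix calls.

```julia
# An alias is the very same function object as its ASCII name,
# so `===` returns true for it.
approx_is_alias = (≈) === Base.isapprox   # ≈ and isapprox
in_is_alias     = (∈) === Base.in         # ∈ and in
notin_is_alias  = (∉) === Base.in         # ∉ is a separate function

println(approx_is_alias)   # true
println(in_is_alias)       # true
println(notin_is_alias)    # false

# nameof reports the name a function was defined under.
println(nameof(∈))         # in
println(nameof(∉))         # ∉
```

If the identity check is false, the unicode name is a function in its own right, even if its definition simply forwards to an ASCII-named one.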

Could there be a notin function for ∉ to alias? Sure, but at some point you need to think about how many names you’ll end up with, and whether those will start to overlap with potentially useful variable names that people might want to use. Here the decision was made to have in, and not notin.

I think that’s a fair tradeoff: you are not forced to use the unicode version if you can’t, and if you can, you get a more convenient syntax.

Thanks for the prompt reply!

I would argue that saving the notin name for a function different than ∉ would be very confusing, and that notin should probably exist as an alias for ∉. I think the same could be said for any ‘plain text’ translation of unicode characters. I do appreciate that most unicode functions in base have ASCII equivalents, but it would be reassuring if there were a policy that all functions in base should have an ASCII name.

You could always make an issue about it. I don’t know how many functions there are that only have a unicode name, but I suspect it’s not a large number.

I think that the correct non-unicode version of ∉ should be !in. This already works for function notation, as !in(a, b), but not for operators. There is an open issue here:

https://github.com/JuliaLang/julia/issues/25512
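
For reference, here is a small sketch of the ASCII spellings that already give the same result as ∉. The `notin` helper at the end is hypothetical: it is not defined in Base and only shows what a user-defined alias could look like.

```julia
haystack = [2, 3]

a = 1 ∉ haystack        # unicode operator
b = !(1 in haystack)    # infix `in`, negated
c = !in(1, haystack)    # function-call form mentioned above

# Hypothetical ASCII alias; `notin` is NOT defined in Base.
notin(x, itr) = !in(x, itr)
d = notin(1, haystack)

println(a == b == c == d)   # true
```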

To me ∉ is an alias for !in. We should perhaps change it to be implemented that way. A related issue is that a function like ∉ is not intended to be overloaded separately; you should only define in. So the less it is considered a separate function in its own right, the better.
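
To illustrate the point about overloading, here is a minimal sketch with a hypothetical `EvenNumbers` type (not part of Base or any package). Only `in` gets a method; ∈, ∉ and `!in` pick it up without further definitions.

```julia
# Hypothetical container type, used only for illustration.
struct EvenNumbers end

# Only `in` is overloaded; ∉ is defined in terms of it.
Base.in(x::Integer, ::EvenNumbers) = iseven(x)

evens = EvenNumbers()
println(4 in evens)      # true
println(4 ∈ evens)       # true
println(3 ∉ evens)       # true
println(!in(3, evens))   # true
```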

Edit: I’ll add that as far as I know, every unicode name is either an alias, or a quasi-alias like this one.

As long as this remains the case then I’ll be a happy coder. Thanks!

Another unicode only operator:

N̶o̶ ̶c̶h̶a̶n̶g̶e̶ ̶o̶n̶ ̶t̶h̶i̶s̶ ̶i̶s̶s̶u̶e̶ ̶i̶t̶ ̶s̶e̶e̶m̶s̶.̶ (my misunderstanding, see below)

julia> ∉
∉ (generic function with 2 methods)

julia> ∘
∘ (generic function with 3 methods)

Apart from resisting having unicode-only operators, there is the practical matter of learning and remembering how to type some of them. On the one hand \notin is easy to remember. On the other hand \circ requires googling and reaching a place like this. Searching for “composition” or “function composition” here fails because it’s listed as “ring operator”. Note to self: ∘ is pronounced “circle” but \circle doesn’t work.

As mentioned above, you can use !in instead of ∉, for example !in(1, [2,3]). To discover this, try for example @less 1 ∉ [2,3]; the same trick works for other operators.

Similarly for ∘: doing @less sin ∘ cos shows this is equivalent to ComposedFunction(sin, cos).

So there are already non-Unicode function names for these Unicode operators.
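
A quick sketch of that equivalence, assuming Julia 1.6 or later, where `ComposedFunction` is available from Base (it does not exist in Julia 1.0):

```julia
# Build the same composition with the unicode operator and the ASCII name.
f = sin ∘ cos
g = ComposedFunction(sin, cos)

println(f isa ComposedFunction)    # true
println(f(0.5) == g(0.5))          # true
println(g(0.5) == sin(cos(0.5)))   # true
```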

Concerning the input problem: you can use ? to find how to type these symbols. For example ?∘ shows:

help?> ∘
"∘" can be typed by \circ<tab>

This is explained here in the manual.

Of course you need to have the symbol somewhere to copy-paste…
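
If no copy of the symbol is at hand, one possible workaround in an ASCII-only editor is to build it from a unicode escape. This is a sketch, not an established convention: the names `notin` and `compose` are hypothetical, and `REPL.REPLCompletions.latex_symbols` is an internal table of the REPL standard library that may change between versions.

```julia
# "\u2209" is the ASCII escape for ∉ (U+2209), "\u2218" for ∘ (U+2218).
const notin   = getfield(Base, Symbol("\u2209"))   # hypothetical alias name
const compose = getfield(Base, Symbol("\u2218"))   # hypothetical alias name

println(notin(1, [2, 3]))           # true
println(compose(sqrt, abs)(-9.0))   # 3.0

# Internal (unexported) table behind tab completion: maps the backslash
# sequence to the character it expands to.
using REPL
println(REPL.REPLCompletions.latex_symbols["\\circ"])
```

The file then contains only ASCII characters, at the cost of being harder to read than the unicode original.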

Thanks sudete. I didn’t know about ComposedFunction(). It is not mentioned in the manual. I also didn’t know about @less. Very useful macro! I was under the impression from the discussion above that typing the unicode at the julia> prompt was going to return the “original” name, as with

julia> ∈
in (generic function with 38 methods)

so I assumed that

julia> ∘
∘ (generic function with 3 methods)

meant there was no “original” non-unicode definition, but looking at the output of @less sin ∘ cos suggests ComposedFunction is the real McCoy. (@less ∘ won’t work though, and neither will @less f ∘ g if f and g are not functions.) Thanks for sharing these very useful tricks. 👍
