Rationale behind excluding some unicode characters from identifiers

t-bltg · March 3, 2023, 4:37pm

I’m wondering why some Unicode punctuation characters (Open Punctuation, `Ps`, and Close Punctuation, `Pe`) were excluded from variable names (identifiers) in this PR.

```julia
julia> ❘u❘ = abs(u)  # allowed, e.g. to denote an absolute value
julia> ⟦u⟧ = 1  # why not? My use case is denoting a jump
ERROR: syntax: invalid character "⟦" near column 1
julia> ⦃u⦄ = 1  # why not? My use case is denoting an average
ERROR: syntax: invalid character "⦃" near column 1
```
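For reference, the Unicode general category of each delimiter, and whether Julia accepts each name as a plain identifier, can be checked as sketched here. This is a minimal sketch assuming Julia 1.x; it relies on `Base.Unicode.category_abbrev`, an internal and undocumented helper that is assumed to be available, and on `Base.isidentifier`.

```julia
# Print the Unicode general category of each delimiter from the question.
# Base.Unicode.category_abbrev is internal/undocumented (assumed present in Julia 1.x).
for c in ('❘', '⟦', '⟧', '⦃', '⦄')
    println(repr(c), " => ", Base.Unicode.category_abbrev(c))
end

# Ask whether each name would be parsed as an ordinary identifier.
for name in ("❘u❘", "⟦u⟧", "⦃u⦄")
    println(name, " => ", Base.isidentifier(name))
end
```

The vertical bar `❘` is a "Symbol, other" (`So`) character, while `⟦ ⟧ ⦃ ⦄` are `Ps`/`Pe`, which lines up with which of the REPL assignments above are rejected.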

jling · March 3, 2023, 4:43pm

One or more of the following:

- too subtle
- too confusing
- too obscure (the user may not know what the heck the symbol means, what it’s called, or how to type it; sure, they can use the Julia REPL, but come on)

Oscar_Smith · March 3, 2023, 4:48pm

The slightly more detailed answer is that the parser needs to know whether a character is part of a name or an operator (e.g. you can’t name a variable `a⊂`). As such, we only add Unicode characters on request.
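The name-versus-operator distinction described here can be observed from the REPL with two predicates from Base, `Base.isidentifier` and `Base.isoperator`. This is a minimal sketch assuming Julia 1.x; these functions report how a string or symbol would be parsed, and are not necessarily what the parser itself calls internally.

```julia
# `a⊂` is not a valid name: ⊂ is parsed as an operator character, not a name character.
println(Base.isidentifier("a⊂"))

# ⊂ on its own is recognized as an operator symbol.
println(Base.isoperator(:⊂))

# A plain ASCII name is an identifier and not an operator.
println(Base.isidentifier("a"), " ", Base.isoperator(:a))
```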

t-bltg · March 3, 2023, 4:48pm

That’s not my question. Plenty of other Unicode characters that lead to ambiguities are still allowed (hard to distinguish in the current font, or visually close to another character; it’s the developer’s responsibility to use them sparingly), but I’m interested in knowing why these specific characters were excluded.

t-bltg · March 3, 2023, 4:56pm

Thanks, so they might be reserved for future use as operators (a conservative choice).
So what would be the criterion for such a request to be accepted?

annoporci · March 3, 2023, 4:57pm

If they’re added on request, request it: if it’s rejected, you’ll know why!

Oscar_Smith · March 3, 2023, 5:38pm

Yeah. Just open an issue, and if there aren’t any problems someone will probably add them to the list.

jar1 · March 3, 2023, 6:38pm

I want to use various fancy brackets as constructors for fancy container objects, so I wouldn’t want them to be allowed in identifiers, just as `[]`, `()`, and `{}` aren’t.

t-bltg · March 3, 2023, 6:39pm

Do you have an example?

jar1 · March 3, 2023, 6:51pm

For example, if I could define

```julia
⦃1,2,2,3⦄ == MultiSet([1,2,2,3])
```
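Since `⦃ ⦄` cannot currently be used as delimiters, the nearest equivalent today is an ordinary function-call constructor. A minimal sketch of the semantics being described, where `MultiSet` is a hypothetical type defined here for illustration (it is not part of Base): elements are stored with their multiplicities, and order does not matter for equality.

```julia
# Hypothetical multiset type: maps each element to its multiplicity.
struct MultiSet{T}
    counts::Dict{T,Int}
end

# Build a MultiSet by counting occurrences of each element.
function MultiSet(xs::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return MultiSet{T}(counts)
end

# Two multisets are equal when every element appears the same number of times.
Base.:(==)(a::MultiSet, b::MultiSet) = a.counts == b.counts

# Keep hashing consistent with equality so MultiSets behave correctly as Dict keys.
Base.hash(m::MultiSet, h::UInt) = hash(m.counts, h)

MultiSet([1, 2, 2, 3]) == MultiSet([3, 2, 1, 2])  # true: order does not matter
MultiSet([1, 2, 2, 3]) == MultiSet([1, 2, 3])     # false: multiplicity of 2 differs
```

In Julia, `isequal` falls back to `==` for types like this, and equal values are expected to hash alike, which is why `hash` is defined alongside `==`.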

t-bltg · March 3, 2023, 6:52pm

I’ve opened issue #48885 ("[FR] Allow more Unicode characters from `Ps` and `Pe` categories for identifiers") and pull request #48886 ("Allow more Unicode characters from `Ps` and `Pe` categories for identifiers") on JuliaLang/julia, to be discussed.
