Difficulties with unicode arrows in function names

markmbaum · March 22, 2021, 4:59pm

I think it would look nice to have up and down arrows in some function names that compute upward and downward propagating properties, but I’m getting errors when I try to use them. In my REPL, when I try

test↑(x, y) = x + y

I just get

↑ (generic function with 1 method)

as if the arrow were the whole function name.

When I try to use arrows in module functions, I get errors like

LoadError: "expected \"end\" in definition of function \"schwarzschild\""

or

LoadError: "invalid character \"⇈\" near column 23"

Is this a system problem? Any way to fix it?

sijo · March 22, 2021, 5:03pm

↑ is parsed as a binary operator, so you actually defined a method of ↑ that takes the arguments test and (x, y) (the second one destructured) and returns x + y:

julia> test↑(x, y) = println("$test, $x, $y")
↑ (generic function with 2 methods)

julia> 2↑(3,4)
2, 3, 4

The error you see could be the parser being confused when you don’t use ↑ as a binary operator…
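
One way to confirm this reading is to parse the definition without evaluating it and look at the resulting expression. This is only a small sketch; the variable names `ex` and `lhs` are illustrative, and it relies on `Base.isoperator`, which is unexported.

```julia
# Parse the definition without evaluating it, then inspect the left-hand side.
ex = Meta.parse("test↑(x, y) = x + y")
lhs = ex.args[1]

println(ex.head)          # head of the whole expression (an assignment)
println(lhs.head)         # the left-hand side is a call expression
println(lhs.args[1])      # the function being called, and therefore defined
println(lhs.args[2:end])  # its arguments: `test` and the tuple `(x, y)`

# ↑ is registered as an operator, so it cannot be part of an identifier.
println(Base.isoperator(:↑))
```

The function name in the call expression is ↑, not test↑, which matches what the REPL reports.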

markmbaum · March 22, 2021, 5:05pm

That would explain the issues with ↑, but what about ⇈ or ⇑? They both cause errors. Is it just the same problem? Why the invalid character error?

pbayer · March 22, 2021, 5:39pm

Aren’t those all listed as Julia operators in github.com/JuliaLang/julia/blob/master/src/julia-parser.scm?

;; Operator precedence table, lowest at top

; for most operators X there is a .X "elementwise" equivalent
(define (add-dots ops) (append! ops (map (lambda (op) (symbol (string "." op))) ops)))

(define prec-assignment
  (append! (add-dots '(= += -= −= *= /= //= |\\=| ^= ÷= %= <<= >>= >>>= |\|=| &= ⊻= ≔ ⩴ ≕))
           (add-dots '(~))
           '(:= $=)))
;; comma - higher than assignment outside parentheses, lower when inside
(define prec-pair (add-dots '(=>)))
(define prec-conditional '(?))
(define prec-arrow       (add-dots '(← → ↔ ↚ ↛ ↞ ↠ ↢ ↣ ↦ ↤ ↮ ⇎ ⇍ ⇏ ⇐ ⇒ ⇔ ⇴ ⇶ ⇷ ⇸ ⇹ ⇺ ⇻ ⇼ ⇽ ⇾ ⇿ ⟵ ⟶ ⟷ ⟹ ⟺ ⟻ ⟼ ⟽ ⟾ ⟿ ⤀ ⤁ ⤂ ⤃ ⤄ ⤅ ⤆ ⤇ ⤌ ⤍ ⤎ ⤏ ⤐ ⤑ ⤔ ⤕ ⤖ ⤗ ⤘ ⤝ ⤞ ⤟ ⤠ ⥄ ⥅ ⥆ ⥇ ⥈ ⥊ ⥋ ⥎ ⥐ ⥒ ⥓ ⥖ ⥗ ⥚ ⥛ ⥞ ⥟ ⥢ ⥤ ⥦ ⥧ ⥨ ⥩ ⥪ ⥫ ⥬ ⥭ ⥰ ⧴ ⬱ ⬰ ⬲ ⬳ ⬴ ⬵ ⬶ ⬷ ⬸ ⬹ ⬺ ⬻ ⬼ ⬽ ⬾ ⬿ ⭀ ⭁ ⭂ ⭃ ⭄ ⭇ ⭈ ⭉ ⭊ ⭋ ⭌ ￩ ￫ ⇜ ⇝ ↜ ↝ ↩ ↪ ↫ ↬ ↼ ↽ ⇀ ⇁ ⇄ ⇆ ⇇ ⇉ ⇋ ⇌ ⇚ ⇛ ⇠ ⇢ ↷ ↶ ↺ ↻ --> <-- <-->)))
(define prec-lazy-or     (add-dots '(|\|\||)))
(define prec-lazy-and    (add-dots '(&&)))
(define prec-comparison
  (append! '(in isa)
           (add-dots '(> < >= ≥ <= ≤ == === ≡ != ≠ !== ≢ ∈ ∉ ∋ ∌ ⊆ ⊈ ⊂ ⊄ ⊊ ∝ ∊ ∍ ∥ ∦ ∷ ∺ ∻ ∽ ∾ ≁ ≃ ≂ ≄ ≅ ≆ ≇ ≈ ≉ ≊ ≋ ≌ ≍ ≎ ≐ ≑ ≒ ≓ ≖ ≗ ≘ ≙ ≚ ≛ ≜ ≝ ≞ ≟ ≣ ≦ ≧ ≨ ≩ ≪ ≫ ≬ ≭ ≮ ≯ ≰ ≱ ≲ ≳ ≴ ≵ ≶ ≷ ≸ ≹ ≺ ≻ ≼ ≽ ≾ ≿ ⊀ ⊁ ⊃ ⊅ ⊇ ⊉ ⊋ ⊏ ⊐ ⊑ ⊒ ⊜ ⊩ ⊬ ⊮ ⊰ ⊱ ⊲ ⊳ ⊴ ⊵ ⊶ ⊷ ⋍ ⋐ ⋑ ⋕ ⋖ ⋗ ⋘ ⋙ ⋚ ⋛ ⋜ ⋝ ⋞ ⋟ ⋠ ⋡ ⋢ ⋣ ⋤ ⋥ ⋦ ⋧ ⋨ ⋩ ⋪ ⋫ ⋬ ⋭ ⋲ ⋳ ⋴ ⋵ ⋶ ⋷ ⋸ ⋹ ⋺ ⋻ ⋼ ⋽ ⋾ ⋿ ⟈ ⟉ ⟒ ⦷ ⧀ ⧁ ⧡ ⧣ ⧤ ⧥ ⩦ ⩧ ⩪ ⩫ ⩬ ⩭ ⩮ ⩯ ⩰ ⩱ ⩲ ⩳ ⩵ ⩶ ⩷ ⩸ ⩹ ⩺ ⩻ ⩼ ⩽ ⩾ ⩿ ⪀ ⪁ ⪂ ⪃ ⪄ ⪅ ⪆ ⪇ ⪈ ⪉ ⪊ ⪋ ⪌ ⪍ ⪎ ⪏ ⪐ ⪑ ⪒ ⪓ ⪔ ⪕ ⪖ ⪗ ⪘ ⪙ ⪚ ⪛ ⪜ ⪝ ⪞ ⪟ ⪠ ⪡ ⪢ ⪣ ⪤ ⪥ ⪦ ⪧ ⪨ ⪩ ⪪ ⪫ ⪬ ⪭ ⪮ ⪯ ⪰ ⪱ ⪲ ⪳ ⪴ ⪵ ⪶ ⪷ ⪸ ⪹ ⪺ ⪻ ⪼ ⪽ ⪾ ⪿ ⫀ ⫁ ⫂ ⫃ ⫄ ⫅ ⫆ ⫇ ⫈ ⫉ ⫊ ⫋ ⫌ ⫍ ⫎ ⫏ ⫐ ⫑ ⫒ ⫓ ⫔ ⫕ ⫖ ⫗ ⫘ ⫙ ⫷ ⫸ ⫹ ⫺ ⊢ ⊣ ⟂ ⫪ ⫫ <: >:))))
(define prec-pipe<       '(|.<\|| |<\||))
(define prec-pipe>       '(|.\|>| |\|>|))


example:

julia> abc⟰def = 1
⟰ (generic function with 1 method)

julia> 123 ⟰ 456
1

sijo · March 22, 2021, 6:41pm

The parser file linked by @pbayer shows which Unicode characters are recognized as operators. This includes ↑ but not ⇈ and ⇑. Note that these characters have different Unicode categories:

julia> collect("↑⇈⇑")
3-element Array{Char,1}:
 '↑': Unicode U+2191 (category Sm: Symbol, math)
 '⇈': Unicode U+21C8 (category So: Symbol, other)
 '⇑': Unicode U+21D1 (category So: Symbol, other)

The documentation says:

Variable names must begin with a letter (A-Z or a-z), underscore, or a subset of Unicode code points greater than 00A0; in particular, Unicode character categories Lu/Ll/Lt/Lm/Lo/Nl (letters), Sc/So (currency and other symbols), and a few other letter-like characters (e.g. a subset of the Sm math symbols) are allowed. Subsequent characters may also include ! and digits (0-9 and other characters in categories Nd/No), as well as other Unicode code points: diacritics and other modifying marks (categories Mn/Mc/Me/Sk), some punctuation connectors (category Pc), primes, and a few other characters. […] Most of the Unicode infix operators (in category Sm), such as ⊕ , are parsed as infix operators and are available for user-defined methods (e.g. you can use const ⊗ = kron to define ⊗ as an infix Kronecker product).

So ↑ being parsed as a binary operator is expected.

On the other hand, it is surprising that ⇈ and ⇑ are not allowed as identifiers, since they have category So. Maybe file an issue?
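
To compare several candidate names at once, something like the following sketch may help. It prints the Unicode category of each name's last character next to what Julia reports about it. Note that `Base.Unicode.category_abbrev` and `Base.isoperator` are unexported internals, so treat their use here as an assumption that may not hold on every Julia version; `test_up` is just a hypothetical ASCII fallback name.

```julia
# For each candidate name, show the category of its last character,
# whether that character is an operator, and whether the whole name
# is accepted as an identifier.
candidates = ("test_up", "test↑", "test⇈", "test⇑")

for name in candidates
    c = last(name)
    println(rpad(name, 8),
            "  category: ", Base.Unicode.category_abbrev(c),
            "  operator: ", Base.isoperator(Symbol(c)),
            "  identifier: ", Meta.isidentifier(name))
end
```

If a name is rejected here, a plain ASCII suffix such as the one in `test_up` is a safe fallback.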

pbayer · March 22, 2021, 6:59pm

yep, you are right:

julia> abc⇈def = 2
ERROR: syntax: invalid character "⇈" near column 4
Stacktrace:
 [1] top-level scope
   @ none:1

Thanks!

brianguenter · June 28, 2021, 6:24am

I want to use ⇑, ⇘, etc., as variable names. It looks like this should be allowed by the variable naming rules but I get an error:

julia> ⇘ = 1
ERROR: syntax: invalid character "⇘" near column 1

It doesn’t look like ⇑ or ⇘ should be parsed as an operator; neither of them occurs in JuliaLang/julia/blob/master/src/julia-parser.scm.
They are both in the category So which is supposed to be a valid character for a variable name:

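julia> collect("⇑,⇘")
3-element Vector{Char}:
 '⇑': Unicode U+21D1 (category So: Symbol, other)
 ',': ASCII/Unicode U+002C (category Po: Punctuation, other)
 '⇘': Unicode U+21D8 (category So: Symbol, other)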

Is this a bug? Or am I misreading the variable name rules?

jzr · June 28, 2021, 6:28am

Looks like a bug to me.

julia> '⇘'
'⇘': Unicode U+21D8 (category So: Symbol, other)

cormullion · June 28, 2021, 7:58am

Try some of the arrows in the range 0x2b00:0x2bFF - and, if you’re feeling adventurous, 0x1f800:0x1f8FF.

I don’t know where “the rules” are described (other than in the source code), but you can use Meta.isidentifier() to check whether a string is a valid identifier.
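
A rough sketch of how one might search a block for usable characters with Meta.isidentifier. The helper name `valid_identifier_chars` is made up for this example, and the result depends on the Julia version, so no particular output is claimed.

```julia
# Return the characters in a code point range that Julia accepts
# as a complete, one-character identifier.
valid_identifier_chars(codepoints) =
    [Char(cp) for cp in codepoints if Meta.isidentifier(string(Char(cp)))]

arrows = valid_identifier_chars(0x2b00:0x2bff)

println(length(arrows), " usable characters in 0x2b00:0x2bff")
println(join(arrows[1:min(end, 10)], ' '))  # show up to the first ten
```

The same helper can be pointed at 0x1f800:0x1f8ff or at the 0x2190:0x21ff arrows block to see which characters are rejected there.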

jzr · June 28, 2021, 8:04am

There’s a bit in the docs

cormullion · June 28, 2021, 8:07am

“subset” is the operative word, though…?

jzr · June 28, 2021, 8:14am

My reading of that is that it’s a subset of Unicode code points greater than 00A0, and in particular the subset comprises all of the code points with “Unicode character categories Lu/Ll/Lt/Lm/Lo/Nl (letters), Sc/So (currency and other symbols)” and so on.

cormullion · June 28, 2021, 8:18am

I don’t understand much of the source, but it looks like a white-list, rather than all.

jzr · June 28, 2021, 8:27am

Then I suppose either the source or the docs should be changed to match the other. Dunno which one ¯\_(ツ)_/¯
