Difficulties with Unicode arrows in function names

I think it would look nice to have up and down arrows in some function names that compute upward and downward propagating properties, but I’m getting errors when I try to use them. In my REPL, when I try

test↑(x, y) = x + y

I just get

↑ (generic function with 1 method)

as if the arrow were the whole function name.

When I try to use arrows in module functions, I get errors like

LoadError: "expected \"end\" in definition of function \"schwarzschild\""

or

LoadError: "invalid character \"⇈\" near column 23"

Is this a system problem? Any way to fix it?

↑ is parsed as a binary operator, so you actually defined a method of ↑ with two arguments, test and (x, y) (the second one destructured), that returns x + y :slight_smile:

julia> test↑(x, y) = println("$test, $x, $y")
↑ (generic function with 2 methods)

julia> 2↑(3,4)
2, 3, 4

The errors you see could be the parser getting confused when you don’t use ↑ as a binary operator…
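To see how the parser reads the definition, you can inspect the expression it produces. This is just a small sketch using Meta.parse; the variable names ex and lhs are made up for illustration.

```julia
# Parse the definition from the original post without evaluating it.
ex = Meta.parse("test↑(x, y) = x + y")

# The left-hand side is a call to the infix operator ↑,
# with `test` and the tuple `(x, y)` as its two arguments.
lhs = ex.args[1]
println(lhs.head)     # call
println(lhs.args[1])  # ↑
println(lhs.args[2])  # test
println(lhs.args[3])  # the (x, y) tuple expression
```

So the name being defined is ↑ itself, which matches what the REPL printed.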

That would explain the issues with ↑, but what about ⇈ or ⇑? They both cause errors. Is it just the same problem? Why the invalid character error?

Aren’t those all listed as Julia operators in

example:

julia> abc⟰def = 1
⟰ (generic function with 1 method)

julia> 123 ⟰ 456
1

The parser file linked by @pbayer shows which Unicode characters are recognized as operators. This includes ↑ but not ⇈ and ⇑. Note that these characters have different Unicode categories:

julia> collect("↑⇈⇑")
3-element Array{Char,1}:
 '↑': Unicode U+2191 (category Sm: Symbol, math)
 '⇈': Unicode U+21C8 (category So: Symbol, other)
 '⇑': Unicode U+21D1 (category So: Symbol, other)

The documentation says:

Variable names must begin with a letter (A-Z or a-z), underscore, or a subset of Unicode code points greater than 00A0; in particular, Unicode character categories Lu/Ll/Lt/Lm/Lo/Nl (letters), Sc/So (currency and other symbols), and a few other letter-like characters (e.g. a subset of the Sm math symbols) are allowed. Subsequent characters may also include ! and digits (0-9 and other characters in categories Nd/No), as well as other Unicode code points: diacritics and other modifying marks (categories Mn/Mc/Me/Sk), some punctuation connectors (category Pc), primes, and a few other characters. […] Most of the Unicode infix operators (in category Sm), such as ⊕, are parsed as infix operators and are available for user-defined methods (e.g. you can use const ⊗ = kron to define ⊗ as an infix Kronecker product).

So ↑ being parsed as a binary operator is expected.

On the other hand, it is surprising that ⇈ and ⇑ are not allowed as identifiers, since they have category So. Maybe file an issue?
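If you want to check this from the REPL rather than reading the parser source, something like the following should work. It is a rough sketch that assumes a Julia version where Meta.isoperator and Meta.isbinaryoperator are available. The symbols are built from strings because writing ⇈ directly in source code is exactly what triggers the invalid character error.

```julia
# Ask the running Julia version how it classifies each arrow.
# Symbol("…") avoids putting a possibly invalid character in the source as code.
for s in ("↑", "⇈", "⇑")
    sym = Symbol(s)
    println(s, "  operator: ", Meta.isoperator(sym),
               "  binary operator: ", Meta.isbinaryoperator(sym))
end
```

What gets printed for ⇈ and ⇑ may depend on the Julia version you are running.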

yep, you are right:

julia> abc⇈def = 2
ERROR: syntax: invalid character "⇈" near column 4
Stacktrace:
 [1] top-level scope
   @ none:1

Thanks!

I want to use ⇑, ⇘, etc., as variable names. It looks like this should be allowed by the variable naming rules but I get an error:

julia> ⇘ = 1
ERROR: syntax: invalid character "⇘" near column 1

It doesn’t look like ⇑ or ⇘ should be parsed as an operator; neither of them occurs in JuliaLang/julia/blob/master/src/julia-parser.scm .
They are both in the category So which is supposed to be a valid character for a variable name:

julia> collect("⇑⇘")
2-element Vector{Char}:
 '⇑': Unicode U+21D1 (category So: Symbol, other)
 '⇘': Unicode U+21D8 (category So: Symbol, other)

Is this a bug? Or am I misreading the variable name rules?

Looks like a bug to me.

julia> '⇘'
'⇘': Unicode U+21D8 (category So: Symbol, other)

Try some of the arrows in the range 0x2b00:0x2bFF - and, if you’re feeling adventurous, 0x1f800:0x1f8FF.

I don’t know where “the rules” are described (other than in the source code), but you can use Meta.isidentifier() to check whether a string is a valid identifier.
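For example, here is a quick way to find out which characters in the suggested range are accepted by your Julia version. This is only a sketch: the name usable is made up, and the result will depend on the version you run it on.

```julia
# Collect the code points in 0x2b00:0x2bff whose one-character string
# passes Meta.isidentifier in the running Julia version.
usable = [Char(cp) for cp in 0x2b00:0x2bff if Meta.isidentifier(string(Char(cp)))]
println(length(usable), " usable characters: ", join(usable))

# The same check for the arrows from this thread.
for s in ("⇑", "⇘")
    println(s, " => ", Meta.isidentifier(s))
end
```

Swapping in 0x1f800:0x1f8ff for the range covers the second block mentioned above.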

There’s a bit in the docs

“subset” is the operative word, though…?

My reading of that is that it’s a subset of Unicode code points greater than 00A0, and in particular the subset comprises all of the code points with “Unicode character categories Lu/Ll/Lt/Lm/Lo/Nl (letters), Sc/So (currency and other symbols)” and so on.

I don’t understand much of the source, but it looks like a white-list, rather than all.

Then I suppose either the source or the docs should be changed to match the other. Dunno which one ¯\_(ツ)_/¯