Difficulties with unicode arrows in function names

I think it would look nice to have up and down arrows in some function names that compute upward and downward propagating properties, but I’m getting errors when I try to use them. In my REPL, when I try

test↑(x, y) = x + y

I just get

↑ (generic function with 1 method)

as if the arrow were the whole function name.

When I try to use arrows in module functions, I get errors like

LoadError: "expected \"end\" in definition of function \"schwarzschild\""

or

LoadError: "invalid character \"⇈\" near column 23"

Is this a system problem? Any way to fix it?

↑ is parsed as a binary operator, so you actually defined a method of ↑ taking the arguments test and (x, y) (using destructuring) that returns x + y.

julia> test↑(x, y) = println("$test, $x, $y")
↑ (generic function with 2 methods)

julia> 2↑(3,4)
2, 3, 4

The errors you see could be the parser getting confused when you don’t use ↑ as a binary operator…
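If you want to see this for yourself, one option is to ask the parser what it makes of the line without evaluating it. This is only a small sketch: Meta.parse is the standard entry point, and the names ex and lhs are just placeholders I picked.

```julia
# Parse the definition from the question without evaluating it.
ex = Meta.parse("test↑(x, y) = x + y")

# For a short-form function definition, the first argument of the
# assignment expression is the "signature" on the left-hand side.
lhs = ex.args[1]

println(lhs.head)         # kind of expression on the left-hand side
println(lhs.args[1])      # the function that is actually being defined
println(lhs.args[2:end])  # what the parser treats as its arguments
```

The left-hand side should come back as a call to ↑ with test as the first argument and the tuple (x, y) as the second, which matches the "generic function" message in the REPL.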

That would explain the issues with ↑, but what about ⇈ or ⇑? They both cause errors. Is it just the same problem? Why the invalid character error?

Aren’t those all listed as Julia operators in the parser source?

example:

julia> abc⟰def = 1
⟰ (generic function with 1 method)

julia> 123 ⟰ 456
1

The parser file linked by @pbayer shows which Unicode characters are recognized as operators. This includes ↑ but not ⇈ and ⇑. Note that these characters have different Unicode categories:

julia> collect("↑⇈⇑")
3-element Array{Char,1}:
 '↑': Unicode U+2191 (category Sm: Symbol, math)
 '⇈': Unicode U+21C8 (category So: Symbol, other)
 '⇑': Unicode U+21D1 (category So: Symbol, other)

The documentation says:

Variable names must begin with a letter (A-Z or a-z), underscore, or a subset of Unicode code points greater than 00A0; in particular, Unicode character categories Lu/Ll/Lt/Lm/Lo/Nl (letters), Sc/So (currency and other symbols), and a few other letter-like characters (e.g. a subset of the Sm math symbols) are allowed. Subsequent characters may also include ! and digits (0-9 and other characters in categories Nd/No), as well as other Unicode code points: diacritics and other modifying marks (categories Mn/Mc/Me/Sk), some punctuation connectors (category Pc), primes, and a few other characters. […] Most of the Unicode infix operators (in category Sm), such as ⊕, are parsed as infix operators and are available for user-defined methods (e.g. you can use const ⊗ = kron to define ⊗ as an infix Kronecker product).

So ↑ being parsed as a binary operator is expected.

On the other hand, it is surprising that ⇈ and ⇑ are not allowed as identifiers, since they have category So. Maybe file an issue?
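To compare the three arrows side by side, something like the following might help. It is a rough sketch: classify is a name I made up, Base.Unicode.category_abbrev is an unexported internal that could change between versions, and I am assuming a Julia version where Meta.isidentifier is available.

```julia
# Collect, for one character, its Unicode category, whether Julia knows it
# as an operator, and whether the parser accepts it as an identifier by itself.
function classify(c::Char)
    s = string(c)
    return (char = c,
            category = Base.Unicode.category_abbrev(c),  # unexported helper
            operator = Base.isoperator(Symbol(s)),
            identifier = Meta.isidentifier(s))
end

for c in "↑⇈⇑"
    println(classify(c))
end
```

Going by the thread, ↑ should show up as an operator, while ⇈ and ⇑ are the interesting cases: category So, yet rejected by the parser on the versions tried here. Other versions may report something different.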

yep, you are right:

julia> abc⇈def = 2
ERROR: syntax: invalid character "⇈" near column 4
Stacktrace:
 [1] top-level scope
   @ none:1

Thanks!

I want to use ⇑, ⇘, etc., as variable names. It looks like this should be allowed by the variable naming rules but I get an error:

julia> ⇘ = 1
ERROR: syntax: invalid character "⇘" near column 1

It doesn’t look like ⇑ or ⇘ should be parsed as an operator; neither of them occurs in JuliaLang/julia/blob/master/src/julia-parser.scm.
They are both in the category So which is supposed to be a valid character for a variable name:

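julia> collect("⇑,⇘")
3-element Vector{Char}:
 '⇑': Unicode U+21D1 (category So: Symbol, other)
 ',': ASCII/Unicode U+002C (category Po: Punctuation, other)
 '⇘': Unicode U+21D8 (category So: Symbol, other)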

Is this a bug? Or am I misreading the variable name rules?

Looks like a bug to me.

julia> '⇘'
'⇘': Unicode U+21D8 (category So: Symbol, other)

Try some of the arrows in the range 0x2b00:0x2bFF - and, if you’re feeling adventurous, 0x1f800:0x1f8FF.

I don’t know where “the rules” are described (other than in the source code), but you can use Meta.isidentifier() to check whether a string is a valid identifier.
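For example, a quick way to hunt for usable arrows is to run Meta.isidentifier over a whole block of code points and keep whatever passes. A minimal sketch, where identifier_chars and candidates are names I invented for illustration, and which assumes a Julia version that has Meta.isidentifier:

```julia
# Return the characters in a range of code points that the parser accepts
# as a one-character identifier.
function identifier_chars(codepoints)
    return Char[Char(cp) for cp in codepoints
                if Meta.isidentifier(string(Char(cp)))]
end

# The "Miscellaneous Symbols and Arrows" block suggested above.
candidates = identifier_chars(0x2b00:0x2bff)
println(join(candidates, ' '))
println(length(candidates), " usable characters in this block")
```

The same call with 0x1f800:0x1f8ff covers the more adventurous range. Which characters pass depends on the Julia version and its Unicode tables, so I would not rely on a particular count.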

There’s a bit in the docs

“subset” is the operative word, though…?

My reading of that is that it’s a subset of Unicode code points greater than 00A0, and in particular the subset comprises all of the code points with “Unicode character categories Lu/Ll/Lt/Lm/Lo/Nl (letters), Sc/So (currency and other symbols)” and so on.

I don’t understand much of the source, but it looks like a white-list, rather than all.

Then I suppose either the source or the docs should be changed to match the other. Dunno which one ¯\_(ツ)_/¯