1 + 'a' = 'b'

julia> 1 + 'a'
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

This is ridiculous. How is this not an error? There should be—and maybe there is—an increment function for characters. I know this aggressive behind-the-scenes overloading is a much-loved feature of Julia but to me it is the complete opposite of type safety. Sure, I can type-safe my own functions—until I forget. Sorry, not trying to be flamey, just my 2c.

This may well be a legacy of languages where char is an integer type, usually the smallest non-boolean one (a single byte), so people are used to doing arithmetic on chars. Some languages also have unsigned chars, analogous to other unsigned integers. You can also convert a char to its integer value using

julia> Int('a')
97

but interestingly you cannot compare the two

julia> 'a' > 90
ERROR: MethodError: no method matching isless(::Int64, ::Char)
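
For what it's worth, the comparison works once you pick one side explicitly. A minimal sketch using only `Int`, `Char`, and `codepoint` from Base (the variable names are just for illustration):

```julia
c = 'a'

# Compare as integers: convert the Char to its code point first.
as_int = Int(c) > 90

# Compare as characters: Char(90) is 'Z'.
as_char = c > Char(90)

# codepoint returns the code point as a UInt32.
as_codepoint = codepoint(c) > 0x5a

println((as_int, as_char, as_codepoint))
```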

This is documented behavior which is explicitly implemented here. You can, of course, commit type piracy, but this may break many things in awful ways.

julia> import Base: +

julia> +(a::Char, b::Int) = error("ridiculous!")
+ (generic function with 209 methods)

julia> 1 + 'a'
ERROR: ridiculous!
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] +(a::Char, b::Int64)
   @ Main .\REPL[2]:1
 [3] +(x::Int64, y::Char)
   @ Base .\char.jl:247
 [4] top-level scope
   @ REPL[3]:1

julia> 'b' in 'a':'z'
ERROR: ridiculous!
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] +(a::Char, b::Int64)
   @ Main .\REPL[2]:1
 [3] _colon(start::Char, step::Int64, stop::Char)
   @ Base .\range.jl:45
 [4] (::Colon)(start::Char, step::Int64, stop::Char)
   @ Base .\range.jl:40
 [5] (::Colon)(start::Char, stop::Char)
   @ Base .\range.jl:7
 [6] top-level scope
   @ REPL[6]:1
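
The second stack trace suggests why the damage spreads: a `Char` range is stepped with an `Int`, so `'a':'z'` needs a working `Char` + `Int` method. A small sketch, meant for a fresh session without the pirated method (variable names are illustrative):

```julia
r = 'a':'e'

# The step of a Char range is an integer, not a Char.
s = step(r)

# Collecting the range advances through it using that integer step.
letters = collect(r)

# An explicit integer step works the same way.
every_other = collect('a':2:'i')

println(typeof(s))
println(letters)
println(every_other)
```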

I think almost everyone agrees that implicit conversion of integers to chars and vice versa is a mistake, yes. It is something that was put in a long time ago, slipped through the cracks, and managed to get into Julia 1.0.

If this is the case, does an issue already exist, or should someone open one?

I like being able to do 'a'+1. It may be a sharp tool, but I appreciate it.

Yes: "Unexpected behavior due to implicit `convert` with `Char` and `Integer`" (JuliaLang/julia issue #44410 on GitHub).

You would just need to write 'a' + Char(1) instead.

How would that work? Adding a Char to a Char seems like a strange operation, more so than adding an Int to a Char.

Seems like it should be

Char(Int('a') + 1)
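
If you want that spelled out at call sites, a tiny helper does it. This is only a sketch: `nextchar` is a hypothetical name, not something in Base, and it assumes the result is a valid code point.

```julia
# Hypothetical helper: move n code points forward, with explicit conversions.
nextchar(c::Char, n::Integer=1) = Char(Int(c) + n)

b = nextchar('a')
z = nextchar('a', 25)

# The distance between two Chars is already an Int in Base.
d = 'z' - 'a'

println((b, z, d))
```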

Yeah, that’s right.

I think the confusion comes from what 'a' is. Many here implicitly assume that 'a' is a representation of the letter a. However, in Julia that would be :a (or "a").

Instead, 'a' is actually the code point for the letter a in ASCII. It is essentially a pointer. And for a pointer, it makes sense to have integer addition defined.

I disagree. 'a' is most definitely a representation of the letter a. If we wanted to just talk about Unicode codepoints, we would be using UInt32, since that is actually a number.

In my opinion, you are confusing the representation of a type with its semantics. Dig into a DateTime and you’ll find it’s composed of a single Int. Does that mean a point in time is really an integer? No. It really is a time point. The implementation is incidental.
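
The DateTime comparison can be sketched with the Dates standard library. As far as I know, the integer is reachable on request through `Dates.value`, but arithmetic wants a unit and a bare `Int` is rejected (variable names are illustrative):

```julia
using Dates

dt = DateTime(2020, 1, 1)

# The underlying integer (milliseconds) is available, but only if you ask.
raw = Dates.value(dt)

# Arithmetic needs a period with a unit.
later = dt + Millisecond(1)

# Adding a bare Int is expected to throw a MethodError.
bare_int_fails = try
    dt + 1
    false
catch err
    err isa MethodError
end

println((typeof(raw), later - dt, bare_int_fails))
```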

The behaviors tell us what the thing really is. The way 'a' behaves w.r.t. integer addition tells me that the people who implemented it were thinking about it as a pointer.

Indeed it originally was an integer (not a pointer, it has nothing to do with memory). But it was decided to make it not an integer. Apparently some methods stuck around so now it’s a weird mix between a character and an integer, which is exactly the substance of the gripe of this thread.

Do you have examples of letter-like behaviors for 'a'?

julia> print('a') # prints like a letter
a

julia> isuppercase('a') # has text-like methods
false

julia> uppercase('a')
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

Edit: And more literally:

julia> isnumeric('a')
false

julia> isletter('a')
true

To be honest I don’t find these very distressing. They all fit with the notion of 'a' being a pointer to something (compare for example Vector which is also a pointer to some memory but prints the content of that memory under print).

To me this is really something I would not expect from a pointer:

julia> 'a' * 'b'
"ab"

EDIT: The isletter is weird.

I think I might be confused about what you mean by 'a' being a “pointer”. Char is certainly not implemented using an actual pointer or a reference, and is pass-by-value.

I mean “pointer” not in the sense of “pointing to some page in memory” but in the sense of “pointing to a letter in the ASCII/Unicode table”.

What are we doing with 'a' + 1? We are moving to the next entry in the ASCII/Unicode table. We are incrementing the pointer.

This is because *(::Char, ::Char) is not multiplication, but concatenation. Concatenation is something that you definitely can do with letters, but doesn’t make sense with pointers.

I don’t think the “pointer analogy” is very useful. Every value has an encoding: an Int is just a series of bits with a particular interpretation, as is a Char, but that doesn’t make them pointers.

I think this is just confusing the discussion.
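
To make the encoding point concrete: a Char's stored bits are not even its code point. A sketch, assuming Julia 1.x, where `Char` is a 32-bit primitive type holding the character's UTF-8 bytes:

```julia
c = 'a'

# Raw 32 bits stored inside the Char (UTF-8 bytes, padded).
bits = reinterpret(UInt32, c)

# The Unicode code point, which is what Int('a') and 'a' + 1 work with.
cp = codepoint(c)

println(bits)
println(cp)
println(bits == cp)
```

So the "integer" that `'a' + 1` operates on is already an interpretation of the value, not its raw representation.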
