1 + 'a' = 'b'

josuagrw · March 16, 2022, 12:53pm

Yes, I fully agree.

josuagrw · March 16, 2022, 12:54pm

You can call it encoding if you like. It does not change the following:

When we do 'a' + 1, we are manipulating the encoding, not the letter itself, because it does not make sense to increment a letter. It does make sense to increment an encoding.

DNF · March 16, 2022, 1:31pm

From that perspective, you can never manipulate any object in and of itself, only its encoding. Then everything is a ‘pointer’. I don’t see how this is useful.

Anyway, my point was that ‘pointer’ and ‘encoding’ are not the same, and not really useful to conflate.

Well, ‘b’ comes after ‘a’, so why not?

BTW: I’m not arguing that 'a' + 1 ought to work, I’m not sure what I think about that.

josuagrw · March 16, 2022, 2:45pm

Because this logic extends well beyond the alphabet. After '9' comes ':':

julia> '9'+1
':': ASCII/Unicode U+003A (category Po: Punctuation, other)

This cannot be explained without reference to encoding.

I have to look at the encoding to understand why I get this specific result.
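To make the point concrete, here is a small REPL-style sketch using only Base Julia. It compares `'9' + 1` with explicit code point arithmetic via `codepoint`, and with digit arithmetic via `parse`, which is what you would need if you wanted the number after 9:

```julia
# '9' is U+0039 and ':' is U+003A, so "+ 1" moves to the next code point.
@show codepoint('9')          # 0x00000039
@show '9' + 1                 # ':'

# The same result, spelled out in terms of code points.
@show Char(codepoint('9') + 1)

# The *digit* after 9 requires parsing the character as a number first.
@show parse(Int, '9') + 1     # 10
```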

josuagrw · March 16, 2022, 2:48pm

I disagree. When I do

julia> "hello" * " world"
"hello world"

I am not operating on the level of the encoding of the strings. It’s completely hidden from me. I am purely operating on the values that are encoded, namely the content of the two strings.

The implementation details of the encoding do not determine the result here.
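A quick illustration of that claim, using two strings with non-ASCII letters chosen only for the example: the concatenated result is the same sequence of characters no matter how many UTF-8 bytes each character occupies. `length` counts characters and `ncodeunits` counts bytes:

```julia
a = "héllo"
b = " wörld"
s = a * b

@show s                 # "héllo wörld"
@show length(s)         # 11 characters
@show ncodeunits(s)     # 13 bytes: 'é' and 'ö' each take two bytes in UTF-8
```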

StefanKarpinski · March 16, 2022, 2:53pm

If someone wants this to change in Julia 2.0 they should open an issue about it on GitHub.

DNF · March 16, 2022, 3:07pm

This is consistent with the lexicographic order of characters, independent of their encoding.

Anyway, as I said, I don’t care too much about whether Char + Int should or shouldn’t work, but that characters aren’t pointers, and aren’t like pointers.

DNF · March 16, 2022, 3:09pm

I was addressing your conflation of ‘encoding’ and ‘pointer’. This is getting really far afield.

Sukera · March 16, 2022, 3:20pm

Sure it can - a Char is not a number, so what did you expect to happen instead? Having '10'? The choice of “next character” is arbitrary, and + 1 (through implicit casting) happens to be the syntax chosen for “give me the next character”. What that next character is doesn’t have to make any semantic sense, as there is no such thing in general.

StevenWhitaker · March 16, 2022, 3:24pm

I think having 'a' + 1 work makes sense. When I type 'a' (or any Char) at the REPL and hit enter, it displays a number, e.g., U+0061 for 'a':

julia> 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Once I see that, it makes sense that adding an integer to a Char will change that number by the corresponding amount, e.g., U+0161 is the result of adding 0x100 to 'a':

julia> 'a' + 0x100
'š': Unicode U+0161 (category Ll: Letter, lowercase)

StevenWhitaker · March 16, 2022, 3:32pm

I think that might be @josuagrw’s point: The fact that '9' + 1 results in ':' is because 1 is added to the ASCII/Unicode encoding of '9'.

cjdoris · March 16, 2022, 3:32pm

This has nothing to do with conversion between integers and characters. If 'a'+1 were promoting both to an Int, the result would be an Int, and if it were promoting to Char, then it would fail because +(::Char,::Char) is not defined.
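Both halves of that argument can be checked directly in a REPL. The sketch below shows that `'a' + 1` returns a `Char` (so nothing was promoted to `Int`) and that adding two `Char`s throws a `MethodError` (so nothing was promoted to `Char` either):

```julia
# Char + Int returns a Char, not an Int, so neither argument became an Int.
@show typeof('a' + 1)   # Char
@show 'a' + 1           # 'b'

# Char + Char has no method, so nothing was promoted to Char either.
try
    'a' + 'b'
catch err
    @show typeof(err)   # MethodError
end
```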

A Char represents a Unicode character. Unicode characters form an ordered sequence (defined by the Unicode spec), part of which is ..., 'a', 'b', 'c', ... and so 'a'+1 gets the next item in that sequence. This is completely analogous to:

Pointers: these form a sequence ..., Ptr{Cvoid}(3), Ptr{Cvoid}(4), ... and so Ptr{Cvoid}(3)+1 gets the next item in the sequence.
Integers: these form a sequence ..., 5, 6, ... and so 5+1 gets the next item in the sequence.

These types being ordered sequences gives them other useful semantics, like ordering ('a'<'b', Ptr{Cvoid}(3)<Ptr{Cvoid}(4) and 5<6 are all true) and differencing ('b'-'a', Ptr{Cvoid}(4)-Ptr{Cvoid}(3) and 6-5 are all 1).
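The ordering and differencing examples from the paragraph above, run side by side. One detail worth noting: `Char - Char` returns an `Int`, while `Ptr - Ptr` returns a `UInt`, so the pointer difference compares equal to 1 but is printed as an unsigned hex value:

```julia
# Ordering
@show 'a' < 'b'                          # true
@show Ptr{Cvoid}(3) < Ptr{Cvoid}(4)      # true
@show 5 < 6                              # true

# Stepping forward by one
@show 'a' + 1                            # 'b'
@show Ptr{Cvoid}(3) + 1                  # the pointer at address 4

# Differencing
@show 'b' - 'a'                          # 1 (an Int)
@show Ptr{Cvoid}(4) - Ptr{Cvoid}(3)      # 1, as a UInt
@show 6 - 5                              # 1
```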

josuagrw · March 16, 2022, 3:34pm

That is exactly my point, thank you for translating.

cjdoris · March 16, 2022, 3:42pm

For the mathematically inclined, +(::Char,::Int) forms a group action of the (additive) group of integers on the set of characters.

Similarly +(::Ptr{T},::Int) is a group action of integers on pointers and +(::Int,::Int) is the usual group action of integers on themselves, namely the (additive) group operation on integers. So this is all consistent and legit from a mathematical point of view.
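The group-action claim can be spot-checked in a few lines. This sketch tests the two laws (identity and compatibility) on a small sample of characters and shifts that were picked arbitrarily for illustration. It is not a proof, and it deliberately stays away from the ends of the valid code point range, where `Char + Int` can throw instead of returning a `Char`:

```julia
# A group action of (Int, +) on a set X must satisfy
#   identity:      x + 0 == x
#   compatibility: (x + m) + n == x + (m + n)
chars  = 'a':'z'      # sample of characters (arbitrary choice)
shifts = -3:3         # sample of integer shifts (arbitrary choice)

identity_ok = all(c + 0 == c for c in chars)
compat_ok   = all((c + m) + n == c + (m + n) for c in chars, m in shifts, n in shifts)

@show identity_ok     # true
@show compat_ok       # true
```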

Sukera · March 16, 2022, 3:46pm

Yes, I’m aware of that. My point is that this has nothing to do with the encoding, as that argument breaks down as soon as you have multibyte characters (the same goes for the pointer argument!). That’s why I’ve been very careful to say “next character” instead (which no one has mentioned so far, as far as I can tell).
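A sketch of the multibyte case. The two characters below were chosen because their code points are adjacent while their UTF-8 bytes are not. Julia's `Char` itself stores the UTF-8 bytes left-aligned in a 32-bit word, so `+ 1` is not a plain increment of the stored bits either; it works on code points:

```julia
# 'ÿ' is U+00FF and 'Ā' is U+0100: adjacent code points.
c = 'ÿ'
d = c + 1
@show d                                  # 'Ā'
@show codepoint(c) codepoint(d)          # 0x000000ff, 0x00000100

# Their UTF-8 encodings are not "one apart" byte-wise: C3 BF vs C4 80.
@show collect(codeunits(string(c)))      # UInt8[0xc3, 0xbf]
@show collect(codeunits(string(d)))      # UInt8[0xc4, 0x80]

# The raw 32-bit representation of each Char (UTF-8 bytes, left-aligned).
@show reinterpret(UInt32, c)             # 0xc3bf0000
@show reinterpret(UInt32, d)             # 0xc4800000
```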

Viewed from that perspective, one could also argue that iterate(Char, 1234) should give the 1234th character in UTF-8, though I think hardly anyone would consider that a sensible API either.

The point made by @kristoffer.carlsson and @StefanKarpinski above still stands - this is the API for “next character” we currently have and we can’t get rid of it until 2.0. The problem is known and issues for it exist, so arguing about it being horrible/bad for students/whatever else is futile, as we have to live with it for now.

mbauman · March 16, 2022, 3:46pm

There’s also subtraction defined between characters to give you the integer difference in code points. This can be useful, if cutesy:

julia> rot(str, n) = String((collect(str) .- 'a' .+ n) .% 26 .+ 'a')
rot (generic function with 1 method)

julia> rot("hello", 13)
"uryyb"

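The `rot` one-liner above assumes lowercase ASCII input and a shift that never wraps below 'a': `%` is a remainder that keeps the sign, so for example `rot("abc", -1)` yields a backtick instead of 'z', and uppercase letters come out garbled. A slightly more defensive sketch (the names `shift_lower` and `rot_safe` are hypothetical, made up for this example) uses `mod` and leaves non-lowercase characters untouched:

```julia
# Shift a lowercase ASCII letter by n positions, wrapping around the alphabet.
# `mod` (unlike `%`) always returns a value in 0:25, so negative n works too.
# Any other character (uppercase, punctuation, spaces) is returned unchanged.
shift_lower(c::Char, n::Integer) = 'a' <= c <= 'z' ? 'a' + mod(c - 'a' + n, 26) : c

# `map` over a String applies the function to each Char and returns a String.
rot_safe(str::AbstractString, n::Integer) = map(c -> shift_lower(c, n), str)

@show rot_safe("hello", 13)            # "uryyb"
@show rot_safe("Hello, world!", 13)    # "Hryyb, jbeyq!"
@show rot_safe("abc", -1)              # "zab"
```

It still relies on `Char - Char` giving an `Int` and `Char + Int` giving a `Char`, just like the original.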
oheil · March 16, 2022, 3:47pm

which doesn’t really help with the original question.
But it raises more questions, like: should we then treat Float64 as a sequence in this mathematical sense and distinguish between 1.0 + 1 and 1.0 + 1.0? I don’t think so.

The question is about implicit casts: are they error-prone, or can they be tolerated in special circumstances?
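For comparison, this sketch shows how `1.0 + 1` is actually handled (ordinary numeric promotion, so it is identical to `1.0 + 1.0`). It also shows what the "next item" after 1.0 would be if Float64 were viewed as an ordered sequence; `nextfloat` and `eps` are Base functions:

```julia
# Mixed Float64/Int arithmetic goes through promotion: the Int becomes a Float64.
@show promote(1.0, 1)                    # (1.0, 1.0)
@show 1.0 + 1 === 1.0 + 1.0              # true

# Under the "ordered sequence" view, the successor of 1.0 is nextfloat(1.0),
# which is not 1.0 + 1.
@show nextfloat(1.0)                     # 1.0000000000000002
@show nextfloat(1.0) == 1.0 + eps(1.0)   # true
```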

josuagrw · March 16, 2022, 3:48pm

What happens when you add 1 to a multibyte character?

mbauman · March 16, 2022, 3:49pm

This isn’t an implicit cast, nor is it promotion (like your numeric 1 vs 1.0 example). It’s just that we’ve chosen a meaning for what happens when you add an integer and a char together.

oheil · March 16, 2022, 3:51pm

It is implicit.
Admittedly not a cast, but still implicit, and not written out explicitly by the user/programmer as in Char(Int('a')+1).
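The implicit and explicit spellings produce the same `Char`. The difference is only whether the round trip through the integer code point is written out by the programmer or hidden inside a Base method for `AbstractChar + Integer`:

```julia
implicit = 'a' + 1                 # dispatches to a Char + Integer method in Base
explicit = Char(Int('a') + 1)      # spelled out: Char -> Int code point, add 1, back to Char

@show implicit                     # 'b'
@show explicit                     # 'b'
@show implicit === explicit        # true
```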
