1 + 'a' = 'b'

cjdoris · March 16, 2022, 3:52pm

No it doesn’t, somewhere further down the sequence is ..., 'α', 'β', ... as defined by Unicode.

josuagrw · March 16, 2022, 3:53pm

I think this goes to the core of my line of argument: 'a' + 1=='b' is implicit only if we consider 'a' and 'b' as representations of the letters a and b.

It is explicit if we consider 'a' and 'b' as representations of Unicode code points.

cjdoris · March 16, 2022, 3:55pm

Types aren’t iterable. You can’t do iterate(Int, 1234).

DNF · March 16, 2022, 3:55pm

Whatever it is, it’s not a cast or a promotion.

Sukera · March 16, 2022, 3:56pm

Yes, it does. You’re talking about the order, which is not the encoding. The OP was talking about interpreting 'a' + 1 as adding 1 to the encoding (in ASCII) which just happens to match up for some values, but not all - namely multibyte characters/codepoints, which don’t have this property anymore at e.g. multibyte boundaries, where the next character gets an additional byte.
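
A small sketch of the distinction being argued here, using only Base functions: `+` on a `Char` moves along code points, while the number of bytes in the UTF-8 encoding can change from one code point to the next. The variable names are purely illustrative.

```julia
# Adding 1 moves to the next code point, whatever the encoding looks like.
a_code = Int('a')                 # code point of 'a' as an integer
after_a = 'a' + 1                 # next code point after 'a'
after_alpha = 'α' + 1             # works the same outside ASCII

# The UTF-8 encoding length changes at the one-byte/two-byte boundary.
del = '\u7f'                      # last code point encoded in one byte
after_del = del + 1               # U+0080, encoded in two bytes
len_before = ncodeunits(string(del))
len_after = ncodeunits(string(after_del))

println((a_code, after_a, after_alpha, len_before, len_after))
```

So the "+ 1" is not byte arithmetic on the encoded form; the encoded length only shows up once the character is turned into a `String`.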

josuagrw · March 16, 2022, 4:01pm

I think this is unnecessarily pedantic. But okay, from now on I will say “code points”. Let’s see when the next person finds a flaw with that word choice…

Sukera · March 16, 2022, 4:01pm

On top of all of this, there are also explicit noncharacters (see wikipedia):

A small set of code points are guaranteed never to be used for encoding characters, although applications may make use of these code points internally if they wish. There are sixty-six of these noncharacters: U+FDD0–U+FDEF and any code point ending in the value FFFE or FFFF (i.e., U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, … U+10FFFE, U+10FFFF).

So not even the interpretation of “add 1 to the codepoint” holds.

oheil · March 16, 2022, 4:02pm

There is no such thing as “we consider”.
We can agree on anything here or in the docs and thereby make it explicit.
Still, this doesn’t solve the OP’s problem: somebody assumed something else and stumbled over the facts.

The important problem here is that in some code I see:
'a' + 1
and it’s not clear on its own what it means. I have to consult the docs and, with luck, it is explicitly mentioned there. In this case it is. Well done, Julia docs people!

My opinion, in this special case, is, that we should have a function
nextchar('a')
for this, like we have nextfloat, because 'a' + 1 is surprising and e.g. '1' + 1 is different in different languages.
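
As a rough sketch of what such a function could look like: `nextchar` below is hypothetical (it does not exist in Base) and is simply assumed to forward to the existing `+` method.

```julia
# Hypothetical helper, NOT part of Base: names the operation explicitly,
# in the spirit of nextfloat. It just forwards to the existing Char + Integer.
nextchar(c::AbstractChar) = c + 1

println(nextchar('a'))
println(nextchar('1'))
```

A definition like this would only give the operation a name; the result is the same code point arithmetic as before.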

But anyhow, I don’t have any strong feelings about this. Mathematical rigor will not solve this and doesn’t help.
Stefan’s pragmatic path of opening a PR for 2.0 is the way to go.

cjdoris · March 16, 2022, 4:03pm

Mathematical formalism is how good programming languages are made. In Julia:

+(::T,::T) is used for (usually commutative) group operations (e.g. 1+2 or [1,1]+[2,2])
*(::T,::T) is used for monoid operations (in particular the multiplicative operation when T forms a ring) (e.g. 1*2, [1]*[2] or "foo"*"bar")
+(::T,::S) is used for group actions (which generalise the notion of group operation) (e.g. 'a'+1, Ptr{Cvoid}(3)+2 or 1.2+3)
- and / are the corresponding inverses.

Since group actions generalise group operations, you can view 1+1.0 and 1.0+1.0 both as either a group operation or an action and you get the same conclusion.
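
The categories above can be tried directly at the REPL. This is only an illustration using the examples from the list (variable names are made up), not a statement about how Base is organised internally.

```julia
# +(::T, ::T): group operations
int_sum = 1 + 2
vec_sum = [1, 1] + [2, 2]

# *(::T, ::T): monoid operations (string concatenation)
str_cat = "foo" * "bar"

# +(::T, ::S): an integer "acting" on a Char or a pointer
char_shift = 'a' + 1
ptr_shift = Ptr{Cvoid}(3) + 2

# - as the inverse: Char - Char gives back the integer, Char - Int a Char
char_diff = 'b' - 'a'
char_back = char_shift - 1

println((int_sum, vec_sum, str_cat, char_shift, ptr_shift, char_diff, char_back))
```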

mbauman · March 16, 2022, 4:03pm

Sukera, no. Even “non characters” are Chars:

julia> '\UFDD0'
'\ufdd0': Unicode U+FDD0 (category Cn: Other, not assigned)

Julia’s Char is not UTF8. It’s a unicode code point. That’s why it prints out the unicode code point.

oheil · March 16, 2022, 4:04pm

This is not important here.
That it is implicit is the reason for the arguing.

cjdoris · March 16, 2022, 4:07pm

The OP never mentioned encodings or ASCII.

oheil · March 16, 2022, 4:13pm

I don’t know, I probably don’t understand it the right way, but then my conclusion is:

1.0 + 1.0 = 2.0 the group operation
and
1.0 + 1 = nextfloat(1.0) the group action
for Float64.

are not the same.
Like in
'a' + 1 = nextchar('a') = 'b'
or
next_codepoint instead of nextchar.
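
For reference, a quick check of what Julia actually does with the Float64 case: `1.0 + 1` promotes the integer and adds, and does not step to the next representable float. The variable names are illustrative.

```julia
# Mixed Float64 + Int promotes the Int and performs ordinary addition.
mixed_sum = 1.0 + 1

# nextfloat steps to the adjacent representable Float64 instead.
stepped = nextfloat(1.0)

println((mixed_sum, stepped, stepped - 1.0))
```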

cjdoris · March 16, 2022, 4:20pm

I assume everyone here is ok with Date(2022, 3, 16) + Day(1) returning Date(2022, 3, 17)?

This is an addition where the operands are of different types and there is no promotion.

The reason it makes sense is because there is a sensible notion of subtraction between different dates, and we can measure it in days - the difference between tomorrow and today is one day.

There is no particular need for the difference between two things to be of the same type as the thing itself. Sometimes they are the same, such as the difference between integers is also an integer.

Oftentimes they are different. The difference between dates is measured in days. The difference between cities is measured in miles. In Julia, the difference between characters is an integer.
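
A short sketch of the same point in code, using the Dates standard library: the difference between two values need not have the type of the values themselves. Variable names are illustrative.

```julia
using Dates

d1 = Date(2022, 3, 16)
d2 = d1 + Day(1)          # Date + Day gives a Date, with no promotion

date_diff = d2 - d1       # the difference of two Dates is a Day period
char_diff = 'b' - 'a'     # the difference of two Chars is an integer

println((d2, date_diff, char_diff))
```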

DNF · March 16, 2022, 4:23pm

Implicit what? It cannot be just ‘implicit’.

Sukera · March 16, 2022, 4:26pm

Yes, I’m aware! However, the section I’ve quoted is not from the UTF-8 spec - Unicode itself says these are noncharacters. The very next part of the text even acknowledges that this is often ignored:

Like surrogates, the rule that these cannot be used is often ignored, although the operation of the byte order mark assumes that U+FFFE will never be the first code point in a text.

As such, while julia can represent them, you can’t (shouldn’t?) expect to be able to do anything with them and (depending on how closely you want to follow the spec) can’t necessarily assume that it exists at all.

To be quite clear, I agree with all of you that the + 1 API is horrible and what is “meant” is “give me the next codepoint I can reasonably do something with”, even though the result may not be possible to actually encode in a valid unicode string (however encoded).

As mentioned above though, we’re stuck with it for now.

josuagrw · March 16, 2022, 4:28pm

I don’t think it is horrible. When we think of Chars as code points, the operation makes perfect sense.

mbauman · March 16, 2022, 4:32pm

Julia’s Char can explicitly — as a design goal — hold arbitrary data including potentially invalid unicode points.

julia> Char(0x11FFFF)
'\U11ffff': Unicode U+11FFFF (category In: Invalid, too high)

Doing anything different removes your ability to programmatically fix encoding issues in your data.
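
A minimal illustration of that design, using only `isvalid`, indexing and `string` from Base. A `Char` can carry a too-high code point or a stray byte read from a string, and the stray byte survives a round trip unchanged. Variable names are illustrative.

```julia
too_high = Char(0x11FFFF)      # beyond U+10FFFF, still representable as a Char
nonchar = '\ufdd0'             # a Unicode "noncharacter" code point

raw = "\xff"                   # a String holding one byte that is not valid UTF-8
bad = raw[1]                   # indexing still yields a Char wrapping that byte

println((isvalid(too_high), isvalid(nonchar), isvalid(bad)))

# The invalid data is preserved, so it can be inspected and repaired later.
roundtrip = string(bad)
println(roundtrip == raw)
```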

DNF · March 16, 2022, 4:53pm

Yeah, I don’t know if it’s exactly horrible. One should be able to express both something like nextchar and measuring the distance between characters. +(::Char, ::Int) and -(::Char, ::Char) are a symmetric pair that is both convenient and respects algebraic manipulation.
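
A small sketch of that symmetry: measuring a distance with `-` and then adding it back with `+` lands on the original character. Variable names are illustrative.

```julia
dist = 'z' - 'a'            # distance between two characters, an integer
back = 'a' + dist           # adding the distance recovers the endpoint

letters = collect('a':'e')  # Char ranges build on the same notion of stepping
steps = [c - 'a' for c in letters]

println((dist, back, letters, steps))
```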

oheil · March 16, 2022, 5:07pm

Implicit as opposed to explicit.
In
'a' + 1
some implicit magic happens, hidden from the programmer who types or reads this code snippet. They cannot be sure what the result is without trying it or reading the docs. In Julia '1' + 1 is '2', but not in other languages.
This is what I mean by implicit.

The explicit version of the same result is:
Char(Int('a')+1)
where everything is explicitly written out and you can guess the outcome easily, although you still can’t be sure.

I say guess and result on purpose, because it’s not the explicit code version of 'a' + 1. It just produces the same result in the case of 'a' and it may fail for other characters.

The equivalent explicit code version would be:

Char(Int32('a') + Int32(1))

So, for this problem it’s not important whether it is a cast, a promotion or a special function. The source of the problem here is that some hidden magic happens implicitly.
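
The spellings discussed above can be compared side by side. This sketch only checks that they agree for 'a'; it makes no claim about every possible `Char`. Variable names are illustrative.

```julia
implicit = 'a' + 1                          # the short form under discussion
explicit = Char(Int32('a') + Int32(1))      # conversions written out
via_codepoint = Char(codepoint('a') + 1)    # going through codepoint instead

println((implicit, explicit, via_codepoint))
```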

This is often not a problem at all, as for example in

Date(2022, 3, 16) + Day(1)

as, from the types, it’s clear enough, that probably the next day as a Date is the result.

For a Julia beginner this

"1" * "2"

can be a problem, and they may have to check the docs.
The * operator is concatenation for String in Julia; one has to know it.

Now

'1' + 1

is what?
It’s up to the individual how to comprehend it. For some it’s clear, for others not.
Hence

nextchar('1')

would be my suggestion, along with removing '1' + 1. My argument is a matter of taste in this very special case of Char.
