Semantics of :: in return type vs. argument type annotations

No, x = y = 1 is the same as x = (y = 1), and 1 is assigned to both variables (the sensible choice) regardless of the conversion implied by y::Float64, so the subexpression y = 1 must have the value 1. Again, the choice of either side wouldn’t matter if conversion didn’t happen.
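
A quick REPL check (chain is just a throwaway name; the function wrapper is only there because a local declaration needs a scope) shows both effects at once:

julia> function chain()
           local y::Float64
           x = y = 1   # the assignment expression evaluates to 1, before conversion
           (x, y)
       end;

julia> chain()
(1, 1.0)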

1 Like

@mbauman, the problem (if there is one) is exactly that :: pretends to be a declaration, a statement without actions, but implicitly takes an action: a conversion (followed by an assertion). It’s even harder to find bugs (my example earlier) if the type declaration occurs in one place but the assignment much later. In the implementation the assignment might actually be the thing taking the action, but I’m talking about the overall behavior here. So maybe my problem is with implicit conversion during assignment to a typed variable.
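
A contrived sketch of the kind of bug I mean (tally is a throwaway name), with the declaration and the assignment far apart and the conversion silently losing information:

julia> function tally()
           local total::Float32
           # ... imagine many unrelated lines here ...
           total = 1//3   # no error: the Rational is silently converted
           total
       end;

julia> tally()
0.33333334f0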

@Benny, I am not sure I followed exactly, but I think you were saying that :: would play different roles if used on the right-hand side during assignment as opposed to in function signatures and method dispatch, even if the conversion were removed. And that’s fine from my perspective. I said earlier that I wished :: to mean only typeassert; I revise that to a wish that :: were only a declaration, with no implicit conversion.

That’s settled at least. I’m happy to have a reason to abandon that syntax, because… I hate it, purely on aesthetic grounds.

So about this:

Forgive the edits, I wanted them all numbered for the ensuing discussion.

I like that there are two distinct meanings here. It breaks down neatly into L-value declarations, and R-value declarations, and this is essential for a language which is trying to have it all: dynamic semantics with gradual typing, and the ability to emit low-level-optimal code by adequately specifying types and their methods.

#1 and #2 are L-value declarations, that is, they specify a variable, and it’s much nicer that they convert when necessary. A field declared as a concrete type has a shape in memory, and in a dynamic language, the right thing to do is to try and fit a value into that shape, and throw an error if it doesn’t work. #1 is the right thing to do by analogy with #2, and with the same effect: it gives a shape to that variable, which the compiler is free to use to generate efficient code.
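
For instance (P and declared are throwaway names), both the field and the declared local take the value by fitting it into their shape:

julia> struct P
           x::Float64
       end

julia> P(1).x   # the Int is converted to fit the field
1.0

julia> function declared()
           v::Float64 = 1   # same for a declared variable
           v
       end;

julia> declared()
1.0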

#0 is an R-value declaration: it specifies a value, and there are only two possible outcomes, an error or not. If the compiler can infer that the type will always be correct, the asserting code may be elided. Since Julia is dynamic, having a top type of Any, an expression may in principle resolve to any value. A type declaration on an expression narrows this type, with two consequences: the program is known to be correct with respect to that type, and the compiler is free to specialize on that basis.
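
Sketching #0: the value flows through untouched, or the whole expression throws:

julia> (1 + 1)::Int       # passes: the value is returned unchanged
2

julia> (1 + 1)::Float64   # no conversion is attempted
ERROR: TypeError: in typeassert, expected Float64, got a value of type Int64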

Julia’s special sauce means that parameters should be treated as R-values. Traditionally parameters are L-values and arguments are R-values, but in Julia a method declaration creates a template, a form for any number of specializations to take, and there’s a lovely mechanism which will specialize these into optimal code based on the R-value types the method actually sees. So these declarations narrow, with the same two consequences: correctness, and compiler freedom to generate optimal code. But more than that, #4 and #5 use the type hierarchy to allow multiple methods for functions, using the declarations to decide which one applies.
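
For example, a single declaration is a template that gets specialized per concrete argument type, while the annotations also select among methods:

julia> f(x::Integer) = x + one(x);      # one template...

julia> f(1), f(0x01)                    # ...specialized separately for Int64 and UInt8
(2, 0x02)

julia> f(x::AbstractString) = x * "!";  # and the declarations pick the method

julia> f("hi")
"hi!"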

Phrasing the type declaration of parameters as if they were already arguments makes sense in Julia, and it’s only in a dynamic language that having this semantic difference (convert vs. assert) makes sense to begin with. If method parameters converted their arguments, the decision between f(a::Int) and f(a::UInt8) would be less clear for a call f(1), since it’s certainly possible to turn that value into a UInt8. Granted that this could use the type system as a tiebreaker, I firmly agree that the current behavior is the correct choice.
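
That is, with the current assert-only semantics, a near-miss is a MethodError rather than a silent conversion:

julia> g(a::Int) = a;

julia> g(0x01)   # a UInt8 is convertible to Int, but dispatch doesn't try
ERROR: MethodError: no method matching g(::UInt8)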

That leaves #3, return values. These are values, not variables, but the :: syntax treats them the same as L-values, which is weird, because return values don’t need a shape; the only thing that needs a shape is the place they get put after a return. I think this decision was made as a convenience for numerics code (if anyone can link to the specific reasoning, I would be grateful), and it’s mostly with numeric values that the distinction makes a difference.
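
Concretely (h is a throwaway name), the return annotation converts just as an L-value declaration would:

julia> h()::Float64 = 1;   # the returned 1 is converted on the way out

julia> h()
1.0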

This is an inconsistency. Inconsistency isn’t a sin, and maybe this choice is worth it. It’s clearly the only one of the six where I see a use for both options, but that can’t be done with a single syntax, and I’d be happier if type signatures worked consistently within the parentheses and to the right of them.

I’m curious about the consequences of each approach when it comes time to specialize and optimize the emitted code, but that’s far beyond me at my current level of understanding. The only obvious thing is that conversion requires emitting converting code under some circumstances, and assertion the same for asserting code. If conversions give LLVM more opportunities to write fast code than assertions would, that would settle my objection.

2 Likes

Thanks for linking to that. I did read it already, though, and this is a description of the behavior, not a justification of it. In particular, I see comments like this one or this one, but not an ensuing discussion of why it was decided to convert rather than assert, which is the part I’m interested in.

The :: operator generally means “this will be of this type”. When does it convert? The simple answer is that we try to autoconvert whenever it’s sensible to do so. This rule gives Julia a lot of its “dynamic language” flavor. For example, the fact that you can make a Vector{String} array and then assign string values that aren’t exactly String (think SubString{String} or some other string encoding), but count on them getting automatically converted, gives the language a much less fussy feel than static languages usually have.
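
For example:

julia> v = ["hello"];   # a Vector{String}

julia> v[1] = SubString("world!", 1, 5);   # converted to String on assignment

julia> typeof(v[1])
String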

Let’s evaluate each of the meanings with that rule in mind:

  1. “Assert that an expression isa particular type”: No, this would make type assertion the equivalent of calling convert, which is not what people want.
  2. “Declare the type of a local or global variable”: Yes, if someone assigns 123 to a variable declared to be UInt8 it makes sense to try converting it.
  3. “Declare the type of a struct field”: Yes, same thing.
  4. “Declare the type of a method’s return value”: Yes, we can try converting to the declared type if that’s not already the type of the returned object.
  5. “Declare the type of the positional arguments of a method”: No, implicit conversion here would conflict with multiple dispatch.
  6. “Declare the type of the keyword arguments of a method”: Actually, yes, since we don’t dispatch on keyword types, we could implicitly convert here but we don’t.

So the odd one out here is actually keyword arguments, which could implicitly convert but don’t. Perhaps we could change that—I don’t think it would be a breaking change since any code that currently works would continue to work as it did previously.
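
For comparison, this is what keyword arguments currently do with a declared type (they assert, even though nothing dispatches on them):

julia> k(; n::Int = 0) = n;

julia> k(n = 0x01)   # no attempt to convert the UInt8
ERROR: TypeError: in keyword argument n, expected Int64, got a value of type UInt8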

The other slightly annoying inconsistency is that when you declare a local or global variable as x::T, you can subsequently rely on the variable always referring to a value of type T. When a method is called with x::T in the signature, you know that x initially refers to a value of type T, but you can later reassign x to a value of any type. It would be kind of nice if an x::T declaration in the method signature also forced x to have type T throughout the method body. That would be a breaking change, however.
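
A small sketch of the difference (m and m2 are throwaway names):

julia> function m(x::Int)
           x = "reassigned"   # fine: the signature doesn't constrain the body
           x
       end;

julia> m(1)
"reassigned"

julia> function m2(y)
           x::Int = y
           x = "reassigned"   # the local declaration, by contrast, does constrain it
       end;

julia> m2(1)
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
...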

12 Likes

Worth addressing… I do get that writing return x::T feels like it should maybe convert for you. The reason it doesn’t is that x::T is just an expression which happens to be a type assertion. It would be possible to make return x::T a special syntax that means something different from doing y = x::T; return y, but that seems bad. On the other hand, the f(arg)::T = x syntax is not valid syntax for anything else, so it was free to use to declare the return type of the function, and that’s what it does. My biggest gripe about this is that it doesn’t mix with putting a where clause on the function signature.
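
Side by side, the two behaviors described there (f1 and f2 are throwaway names):

julia> f1(x) = return x::Int;   # a plain typeassert expression: no conversion

julia> f1(0x01)
ERROR: TypeError: in typeassert, expected Int64, got a value of type UInt8

julia> f2(x)::Int = x;          # the return-type declaration: converts

julia> f2(0x01)
1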

4 Likes

Could we have a ::: then for people who want a non-converting declaration?

julia> struct A
       a::Int
       end

julia> A('a')
A(97)

julia> struct A
       a:::Int
       end

julia> A('a')
ERROR: What the hell are you doing?

1 Like

Good enough for me. My issue with this detail of the language arose because, well, the R-value and L-value behavior seemed like the obvious interpretation of the distinction, until I happened to notice that method returns are the odd one out.

I’ll drop it, at least until there’s official work on a 2.0 and maybe then as well. I just want to point out that this:

Is exactly why I don’t like autoconversion of return values: it’s the equivalent of calling convert, which is not what I want. I want it to behave like the rest of the signature. And this explanation of #0 cuts somewhat against “we try to autoconvert whenever it’s sensible to do so”; one could easily argue that var::Type is a sensible place to perform conversion, although I would not.

But as I said, it’s a good enough explanation for me, thanks for the elaboration. Whenever 2.0 comes around, it might be nice to do a survey of users to see which return-value behavior would be preferred, and I remain curious which choice would be the best one for the compiler, which is a separate question. I get as far as seeing that both tell the compiler what the value has to be, but can imagine this having different effects on inlining, for example.

1 Like

So with that we would need to distinguish whether each field is “converting” or “not converting”. People start having to worry, when assigning to a field, whether they need to call convert or not. Presumably that feature would then be desired for all the other places where you can declare types, so you would want to declare a local/global as x:::T, and now we need to keep track of whether every variable anywhere is converting or not. This one innocuous little feature explodes the combinatorial space of what every piece of code everywhere in the language means.

The real issue with your example is that convert(Int, 'c') should not work, which is a mistake left over from when characters were more number-like in the pre-1.0 era. It’s fine if someone does Int('c') to convert, but convert needs to be much stricter and only allow conversions that are between different representations of basically the same value. Julia is mostly pretty good about this (e.g. convert(Int, ::Float64) errors if the value isn’t an integer), but some things have slipped through the cracks, like this char convert method.
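
The Float64 case referred to there:

julia> convert(Int, 2.0)   # same value, different representation: fine
2

julia> convert(Int, 2.5)   # not representable as an Int: refuses
ERROR: InexactError: Int64(2.5)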

3 Likes

I have no idea of course how easy or hard it would be to implement :::. But yes, struct fields were just an example. The idea would be for it to work for all sorts of places where assignment-esque things are happening.

Regarding “People start having to worry when assigning to a field whether they need to call convert or not”. Well, we’d use ::: when we want people to worry and :: when people should not.

::: would be nice. Until then, I have my workarounds.

That would be a nice opportunity to fix this bit of weirdness:

julia> UInt32('a')
0x00000061

julia> convert(UInt32, 'a')
0x00000061

julia> reinterpret(UInt32, 'a')
0x61000000

I ran into that at some point, and the behavior was unexpected. I figured that since a Char is a UInt32 in disguise, convert would give me the same bytes back.

I think it would be better if reinterpret and convert worked the same way with UInt32, and convert(Int, ::Char) were an error.

Edit: I feel like I have to add that this is, imho, another argument for why return values in signatures shouldn’t perform conversion. I’m glad I didn’t find that out by accidentally returning a Char from a function where I was expecting a ::UInt32, because neither possible conversion would be desirable; I would want an error.

Don’t reinterpret characters. How they are represented is none of your business :grin:

They are, in fact, not code point values. They are 1-4 bytes of UTF-8 code units, padded with zero bytes.
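
The padding is easy to see with the same reinterpret as above, on a multi-byte character:

julia> reinterpret(UInt32, 'a')   # one UTF-8 code unit, three zero pad bytes
0x61000000

julia> reinterpret(UInt32, '∀')   # U+2200 encodes as 0xe2 0x88 0x80, one pad byte
0xe2888000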

8 Likes

Don’t tell me how to live my life! :laughing:

I was using it to illustrate what I expected convert to do; UInt32(::Char) does what it should. Probably the best thing here is to outlaw convert from Char to numbers entirely.

At this point in my journey, I’ve read the Julia source code for Char construction, and eliminated all cases of string[i] from my parser in favor of codeunit(string, i). It’s a bit more than twice as fast as a result.
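
For anyone curious about the distinction:

julia> s = "abc";

julia> s[1]             # decodes a full Char
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> codeunit(s, 1)   # just reads the byte
0x61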

4 Likes

My point is this. reinterpret is asking what the literal bits of the representation of Char happen to be. We let you ask that, but having any expectation of it is on you—the fact that it’s not what you expect is probably a good thing so that people don’t rely on it. Asking to turn a character into an integer code point value, on the other hand, is a perfectly reasonable thing to do and makes sense for any integer destination type that can hold that value.

As I said before, this is really more of an argument for why the convert(Integer, ::Char) method shouldn’t exist. It does, however, produce the code point, which is the only reasonable thing for it to return if it’s going to exist. Returning the internal representation of the Char would not be fine.

That’s a good approach and is what @stevengj suggested early on—many string algorithms should actually work with code units rather than code points. Especially in UTF-8 which has these delightful properties:

  1. No character’s encoding is a subsequence of any other character’s encoding.
  2. Ordering by code unit and ordering by code point are the same.

1 Like

Note that the mismatch between reinterpret(T, x) and convert(T, x) or T(x) also holds for many other types. For example:

julia> x = 1.0f0
1.0f0

julia> convert(UInt32, x)
0x00000001

julia> UInt32(x)
0x00000001

julia> reinterpret(UInt32, x) # bits of the IEEE Float32 representation
0x3f800000

For other readers, I should also note that you can alternatively do c = codeunits(string) and then use c[i] to work with the string’s codeunits as a byte array (without copying any data).
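
That is:

julia> c = codeunits("abc");

julia> c[2], length(c)
(0x62, 3)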

12 Likes

Making methods is easier than changing the parser to allow a whole new syntax that people have to learn and distinguish from the verrrrry similar ::.

julia> struct A a::Int end

julia> A('a') # we don't want this convert
A(97)

julia> methods(A) # we want A(a) to forward to A(a::Int) with no conversion
# 2 methods for type constructor:
 [1] A(a::Int64)
     @ REPL[45]:1
 [2] A(a)
     @ REPL[45]:1

julia> function A(a) A(a::Int) end
A

julia> A('a')
ERROR: TypeError: in typeassert, expected Int64, got a value of type Char
...

Incidentally, A(a::Int) seems to have been automatically generated because the call lowers to a smaller method body than A(a). new already converts its inputs to the fields’ types, but manually defining A(a) = new(a) to opt out of the automatic constructors then calling A(1) would lower to the same amount of conversion code as A('a'). A(a) = new(a::Int) wouldn’t get rid of that code either, it’d just add a preceding typeassert. The compiler still removes all the conversion steps for A(1) regardless.
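
The conversion done by new is easy to see directly (B is a throwaway name):

julia> struct B
           a::Int
           B(a) = new(a)   # opting out of the generated constructors; new still converts
       end

julia> B('a')   # so the Char still sneaks through
B(97)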

This much we vigorously agree on.

This, on the other hand, is what I expected would happen with convert. Not saying it should, just saying that’s what I thought would happen. The documentation says that Char is a specially-encoded UInt32, so I did, in fact, figure that it’s what convert would return.

What this conversation is illustrating is that Chars aren’t convertible to numbers in a logical way. Casting them explicitly with a constructor, I would expect the Unicode codepoint value, unsigned or not.

I certainly don’t expect convert and reinterpret to do the same thing for floats and ints, or signed and unsigned. I might be the only person on the planet who thought it would do the same thing for Char, for all I know.

It’s a good tip to have in one’s pocket! I didn’t go that route personally, because nextind and prevind are optimal for what they do, and so it’s easier to keep the String as a String and just access the bytes with the function which does that. I’m sure the machine doesn’t know the difference.

1 Like

While I agree that convert(Integer, ::Char) is a problem, there are other aspects of :: declarations which I find inconsistent (though maybe others might not). :: declarations privilege primitive types over user-defined types. If I have

struct A
    a
end

I can’t just go ahead and do

x::A = 1

I need to first define a Base.convert; something like:

Base.convert(::Type{A}, i) = A(i)
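
With that method defined, the declaration works (note that typed globals like this require Julia 1.8 or later):

julia> x::A = 1   # the assignment expression still evaluates to 1...
1

julia> x          # ...but the variable holds the converted value
A(1)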

This isn’t unreasonable — UInt32(char) == codepoint is probably the most obvious representation of a Unicode Char, and I expect that it is the one used in most other languages.

It used to be the representation in Julia, too, but it was changed to the current representation in julia#24999 (also described in Changes to the representation of Char - #5 by StefanKarpinski), an idea of @StefanKarpinski’s. The big benefit was the ability to represent invalid UTF-8 encodings, so that string iteration doesn’t break if there is random binary data mixed in. For example, collect(String(rand(UInt8, 100))) works. (It also made string iteration slightly faster, since you no longer have to fully decode the UTF-8 to get a Char.)
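
For instance, a stray invalid byte just becomes a malformed Char instead of breaking iteration:

julia> collect(String([0x61, 0xff, 0x62]))
3-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 '\xff': Malformed UTF-8 (category Ma: Malformed, bad data)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)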

7 Likes