Semantics of :: in return type vs. argument type annotations

Begging your pardon, but you didn’t identify why it would be invalid to do what I said it would be valid for the compiler to do. Without that, it’s hard to see what you’re disagreeing with.

I’m reasonably confident that if the assertion is of a concrete type, and applied to all return values of the subject (this being the proposed semantic), then one of three situations obtains.

One: The assertion matches return_types exactly. In that case, the function is type stable, and the assertion does not need to be compiled into it: compilers are allowed to elide code which is known to be redundant.

Two: The intersection of return_types and the assertion is Union{}. This means that calling the function can only ever throw an error, and it would be valid for the compiler to inform the user of this with an error. Valid, mind you. I could be convinced that it would make compilation slow, but I’d need to hear the argument for that. It doesn’t matter whether this code is type unstable; it’s almost an academic question to ask about a function which must throw an error.

Three: return_types is a Union containing the assertion, or an abstract supertype of the assertion. This code is probably type unstable, and is either sometimes invalid, or it’s correct but the compiler can’t infer down to the type of the assertion. The compiler could detect some cases where the assertion can definitely fail, for example when return_types is a Union with the assert type as one member, and warn about this; but since it doesn’t know that actual uses of the function will throw errors, it must compile. In other cases this may be an inference failure, meaning it’s possible that the code is completely correct but return_types is Any, for one example. Here it shouldn’t warn either, just compile in the type assertion and proceed. Other tools can bring this to the user’s attention, as happens now.
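To make the three situations concrete, here’s a sketch using toy functions (names are mine) and Base.return_types, imagining an asserting ::Int declaration on each:

```julia
# Toy functions whose inferred return types land in each of the three
# situations for a hypothetical asserting ::Int return declaration.
exact(x::Int) = x + 1              # one: inferred Int, the assertion is redundant
impossible(x::Int) = string(x)     # two: inferred String, intersection with Int
                                   #      is Union{}, so every call must throw
maybe(x::Int) = x > 0 ? 1 : "one"  # three: inferred Union{Int64, String}; the
                                   #      assertion sometimes passes, sometimes throws

only(Base.return_types(exact, (Int,)))       # Int64
only(Base.return_types(impossible, (Int,)))  # String
only(Base.return_types(maybe, (Int,)))       # Union{Int64, String}
```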

We can contrast this with the current converting declaration, where the compiler mostly cannot detect an invalid combination. If return_types is just UInt8, and the annotation is ::Foo, it can’t know whether some later code might define convert(::Type{Foo}, ::UInt8), so it has to just add the call to convert and carry on accordingly.
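A small sketch of why that’s so (Foo and wrap are hypothetical names):

```julia
struct Foo end

wrap()::Foo = 0x01   # declared return type Foo, but the body returns a UInt8

# With no applicable convert method, calling wrap throws a MethodError:
try
    wrap()
catch e
    println(e isa MethodError)   # true: no convert(::Type{Foo}, ::UInt8) exists yet
end

# A definition loaded later makes the very same method succeed, which is why
# rejecting the combination at definition time would have been wrong:
Base.convert(::Type{Foo}, ::UInt8) = Foo()

println(wrap() === Foo())        # true
```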

Your discussion of what an independent static analyzer could provide is fascinating, and I agree it would be a good idea. I’m sure there are more expensive operations, like a bidirectional type synthesis, which it would be inappropriate to add to the Julia compiler itself, but which would help write correct code. I’d use it.

My unchecked assumption is that the compiler already uses return_types on all methods, so it can use that information to generate good code at call sites. In that case, throwing an error when it detects an invalid assertion would be free, practically speaking: just a single added conditional. That could be completely wrong! It’s not worth making all compilation noticeably slower just to throw an error in situation two, but I do believe it would be valid to do so, if we accept the premise that the compiler is allowed to reject code which is known to be erroneous.


I don’t think there are any cases where the compiler currently rejects any such cases, at least not in the sense that you’d get an actual error from the compiler, instead of having the compiler insert a throw and a subsequent runtime error.

I still think it’s allowed to do that; it’s just been a consistent design choice not to. I’ve heard as justification that Julia is semantically dynamic, and thus such errors should semantically be thrown at runtime too. I’d personally love to see the compiler reject these known-Union{}-return-type cases, though, and keep the dynamic part restricted to dynamic dispatches.


So long as Julia lacks a standard, there’s no source-of-truth as to what the compiler is or is not allowed to do. We might hope that as the language matures, a standard emerges, providing a forum to debate and settle these sorts of philosophical questions.

Julia is dynamic, which I like, and gradually typed, which I also like. It’s a powerful combination. One of the available powers is rejecting code which is known to be erroneous, and I see no advantage to not doing so, provided that the logic to reject that code is O(1) with a small constant factor. To my taste, the ideal language is one with a powerful type system, where the default type is the top type, so that one can write clean generic code without annotations, and then firm up the typing of the code when it becomes clear what that should be. Julia comes very close to that ideal, this is a large part of why I’ve come to prefer it.

For the case we’re considering, it doesn’t really matter whether the compiler rejects situation two at load time. If the function is never called, who cares if it’s invalid, or type unstable for that matter. If it’s ever called, it will throw an error. What’s more important is that the assertion itself may be elided if the function is correct, meaning that you get the error-catching properties of a type assertion without paying any runtime penalty. In a perfect world, type declarations only make code slower when they have to, examples being conversion of types which cannot be proven to be the same during compilation. Which, to bang the drum again, is part of why I’d like return declarations to be asserting, not converting. I’ll note that the performance tips include annotating untyped values, suggesting that declarations can lead to faster code. My supposition is that asserting return types would assist with the function barrier technique as well.
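Since that performance tip is doing some work in this argument, here’s roughly what the annotate-untyped-values pattern looks like (a minimal sketch; names are mine):

```julia
# Values pulled from an untyped container are inferred as Any; a type
# assertion restores a concrete type for everything downstream.
config = Dict{String,Any}("n" => 10_000)

# The "function barrier": once n arrives here it is concretely typed,
# so the loop compiles to fast, type-stable code.
function hot_loop(n::Int)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

n = config["n"]::Int   # an assertion, not a conversion: free when it holds
hot_loop(n)            # 50005000
```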

If I get some free time in the near future, I intend to satisfy my curiosity about whether the compiler will elide type assertions when it knows that they always pass. I have the vague impression from other discourse threads that it does.


Maybe a silly question, but would it be helpful for those who want to have return assertions in the function signature to have something like:

# warning: untested code...
struct Asserted{T} end
convert(::Asserted{T}, x) where T = x::T
macro returns(sym)
    return :(::Asserted{$sym})
end

function f(x) @returns(Int64)
...
end

It would depend on what the compiler is able to do with that code. I did a bit of investigative code lowering when I was working on RecursiveDicts.jl, and it looked like the compiler is pretty good at eliding access to a struct with one field. I don’t know if it would elide through both the constructor and convert though.

The topic here is more about whether a change in the meaning of return declarations would be a good candidate for 2.0. It isn’t difficult to work with the current semantics; the problem is that since I don’t want those semantics, the solution is to not use that language feature unless I happen to want an implicit call to convert.

I’ve started changing my return annotations into comments, like this:

# before 
function a_fn(a::T, b::U)::T

# after
function a_fn(a::T, b::U)  # :: T

This covers one of the reasons I was using them: being able to read the return type of the function off the signature. Sometimes I will also annotate the return value itself, sometimes not.

Sometimes the return annotation is one of my concrete types in a pathway which isn’t performance-sensitive, and I just leave those alone, since they will raise the error I want unless I’ve defined a convert method for that type, in which case I wouldn’t actually mind the converting semantics.

I don’t think it can. Methods intended only to throw errors exist, and the compiler can’t distinguish these from methods that aren’t intended to throw. At some level, we have to separately specify what the method should do, like never throwing an error (specifying particular errors matters because of operations like integer division that can throw). This makes inserting a typeassert into the method an inappropriate approach, because a typeassert is intended to throw runtime TypeErrors, and case one, where the compiler elides the error, is something we can test for in other ways. It also wouldn’t address case three, where the method can return; in my opinion, a method that only works as intended sometimes is just as important to catch as a method that never works.

Again, the overhead isn’t the most objectionable aspect; it’s the conditional and runtime aspects. Your issue with the convert-typeassert behavior was that for some large inputs, the convert fails with an InexactError. So you’d prefer that it didn’t convert and every call failed, assuming case two. In case three, not every call fails. Either way, you also need to execute the method with real values to throw the error; what if your method spends 5 hours crunching numbers before it gets there? That’s terrible for static type analysis. Since a typeassert works at runtime, it’s inappropriate for static type analysis no matter what the compiler does. mkitti’s example test using return_types is closer, but it’s not as thorough as in languages where every variable is annotated, so the compiler can tell when the right-hand expression’s type doesn’t match. We write much more generic methods, so we’re happy if variables are type-stable given the argument types, and that’s something that’s nice to catch in static analysis (JET.jl).

Changing the subject a little, back to the convert behavior: it occurs to me that it might have been inspired by the weak typing of C/C++, where silent conversions happen when a value is assigned to a variable annotated with a different type. Nonintegral floating-point values are even silently truncated to integers, unlike in Julia, where an InexactError is thrown. This has to be avoided with conscious type design, like myappend!(A::Vector{T}, new::T) where T = append!(A, new). Very few have a problem when implicit conversion is widening instead (promotions, integer to float, smaller to larger integer), especially when assigning to fields a.x = 1 or elements A[1] = 1.

The convert behavior of return type annotations seems to draw more objections because we don’t have access to a method and its return type the way we do with instances. A simple typeof(a.x) or eltype(A) gets to the type, whereas we are at the mercy of dispatch to reach a method, even if we invented a simple way to get its return type. The return type annotation does not replace type stability, because convert can fail; a convert is more reliable after the function call than incorporated into one of the methods; and its existence is so questionable that the documentation points out how rarely it’s used.

At least in the simplest case, the answer is yes. The assertion survives the lowering process but the assembly is identical.

julia> function test1(a::UInt8)
          b = a + 0x01
          return b::UInt8
       end
test1 (generic function with 1 method)

julia> function test2(a::UInt8)
          b = a + 0x01
          return b
       end
test2 (generic function with 1 method)

julia> Base.return_types(test1)
1-element Vector{Any}:
 UInt8

julia> Base.return_types(test2)
1-element Vector{Any}:
 UInt8

julia> @code_lowered test1(0x01)
CodeInfo(
1 ─      b = a + 0x01
│   %2 = Core.typeassert(b, Main.UInt8)
└──      return %2
)

julia> @code_lowered test2(0x02)
CodeInfo(
1 ─     b = a + 0x01
└──     return b
)

julia> @code_native test1(0x01)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 12, 0
        .globl  _julia_test1_2039               ; -- Begin function julia_test1_2039
        .p2align        2
_julia_test1_2039:                      ; @julia_test1_2039
; ┌ @ REPL[10]:1 within `test1`
; %bb.0:                                ; %top
; │ @ REPL[10]:2 within `test1`
; │┌ @ int.jl:87 within `+`
        add     w0, w0, #1
; │└
; │ @ REPL[10]:3 within `test1`
        ret
; └
                                        ; -- End function
.subsections_via_symbols

julia> @code_native test2(0x01)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 12, 0
        .globl  _julia_test2_2068               ; -- Begin function julia_test2_2068
        .p2align        2
_julia_test2_2068:                      ; @julia_test2_2068
; ┌ @ REPL[9]:1 within `test2`
; %bb.0:                                ; %top
; │ @ REPL[9]:2 within `test2`
; │┌ @ int.jl:87 within `+`
        add     w0, w0, #1
; │└
; │ @ REPL[9]:3 within `test2`
        ret
; └
                                        ; -- End function
.subsections_via_symbols

Obvious next question: what about converting assertions?

julia> function test3(a::UInt8)::UInt8
          b = a + 0x01
          return b
       end
test3 (generic function with 1 method)

julia> @code_lowered test3(0x03)
CodeInfo(
1 ─ %1 = Main.UInt8
│        b = a + 0x01
│        @_4 = b
│   %4 = @_4 isa %1
└──      goto #3 if not %4
2 ─      goto #4
3 ─ %7 = Base.convert(%1, @_4)
└──      @_4 = Core.typeassert(%7, %1)
4 ┄      return @_4
)

julia> @code_llvm test3(0x03)
;  @ REPL[18]:1 within `test3`
define i8 @julia_test3_2128(i8 zeroext %0) #0 {
top:
;  @ REPL[18]:2 within `test3`
; ┌ @ int.jl:87 within `+`
   %1 = add i8 %0, 1
; └
;  @ REPL[18]:3 within `test3`
  ret i8 %1
}

julia> @code_native test3(0x03)
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 12, 0
        .globl  _julia_test3_2074               ; -- Begin function julia_test3_2074
        .p2align        2
_julia_test3_2074:                      ; @julia_test3_2074
; ┌ @ REPL[18]:1 within `test3`
; %bb.0:                                ; %top
; │ @ REPL[18]:2 within `test3`
; │┌ @ int.jl:87 within `+`
        add     w0, w0, #1
; │└
; │ @ REPL[18]:3 within `test3`
        ret
; └
                                        ; -- End function
.subsections_via_symbols

Not so bad! And informative as well. The lowered code has a preliminary type check which skips convert if the types are the same, and the codegen process can determine that this will always be the case, producing the identical, minimal assembly. I’m sure there are more complex cases where this would break down, but it supports the idea that type declarations have no runtime impact on type-stable code. I added the LLVM this time because, while I assumed that the LLVM-to-assembly pipeline is simple translation, it doesn’t hurt to check one’s assumptions.

I’ll probably try it out on a few real functions from my own code; I’ll spare the rest of you the REPL dumps, though. This suggests I can leave in return declarations of concrete structs without worrying so much about paying a runtime penalty, and still get the visible-signature and regression-guarding properties I want.

And since the codegen process has to be using the information in return_types to perform this optimization, that suggests it could cheaply throw an error during compilation if the code is proven to be impossible. Although this doesn’t answer whether it’s something in Julia proper, or in LLVM, doing the elision here. It could still be expensive for the compiler to detect this condition, and I don’t have a good way of determining that without spending a whole lot of time deep in Julia internals, so this is probably as far as I’m going to get.
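In the meantime, the situation-two check is cheap enough to run from tooling rather than the compiler. A hypothetical helper (names are mine):

```julia
# Does inference prove that every call with these argument types must throw?
never_returns(f, argtypes) = only(Base.return_types(f, argtypes)) === Union{}

bad(a::String) = (a * "one")::UInt8   # the assertion can never hold

never_returns(bad, (String,))   # true
never_returns(string, (Int,))   # false
```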

What does it do with impossible code now? Let’s find out!

julia> function test4(a::String)
         b = a * "one"
         return b::UInt8
       end
test4 (generic function with 1 method)

julia> Base.return_types(test4)
1-element Vector{Any}:
 Union{}

julia> @code_lowered test4("four")
CodeInfo(
1 ─      b = a * "one"
│   %2 = Core.typeassert(b, Main.UInt8)
└──      return %2
)

julia> @code_llvm test4("four")
;  @ REPL[1]:1 within `test4`
; Function Attrs: noreturn
define void @julia_test4_513({}* noundef nonnull %0) #0 {
top:
  %1 = alloca [2 x {}*], align 8
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %1, i64 0, i64 0
;  @ REPL[1]:2 within `test4`
; ┌ @ strings/basic.jl:260 within `*`
; │┌ @ strings/substring.jl:225 within `string`
    store {}* %0, {}** %.sub, align 8
    %2 = getelementptr inbounds [2 x {}*], [2 x {}*]* %1, i64 0, i64 1
    store {}* inttoptr (i64 4460659096 to {}*), {}** %2, align 8
    %3 = call nonnull {}* @j1__string_515({}* inttoptr (i64 4729327200 to {}*), {}** nonnull %.sub, i32 2)
; └└
;  @ REPL[1]:3 within `test4`
  call void @ijl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 4794046384 to {}*), {}* %3)
  unreachable
}

I went with just LLVM this time because it’s a lot easier to read.

TL;DR it does the concatenation, performs a typeassert, and throws an error.

We already knew it wouldn’t be a compile-time error; I was just curious if it would skip the typeassert and go straight to the error. I was confident it would generate the intermediate code; it would have to, since there could be print statements or other side effects before it hits the error.


Exactly right, the compiler is pretty smart about what needs or doesn’t need to happen at runtime. That intermediate code before the inevitable typeassert error can have side effects (most obvious use is logging runtime errors) which 1) can be intended program behavior, or 2) isn’t something you should wait for if you just want to catch unintended types. That’s what static type analysis is for. Macros such as the @code_ ones often transform function calls, but the call doesn’t actually happen, and the input instances are just processed to their types.

That won’t work, because return type markers in Julia are not just calls to convert; they’re also type assertions. So if the convert(T, x) call does not give something of type T, it’ll throw an error.


That looks like the intent of the Asserted type: to make the convert-typeassert behavior do typeassert-typeassert. It’s a bit redundant, but it can be changed to identity-typeassert by replacing x::T with just x; this is typical of convert methods where the instance already has the target type. It’s a simpler macro than moving ::sym from the return type to all return statements and the last expression, but ::Asserted{T} is good enough for me.

That’s good to know, because convert itself supports perversity such as the following:

julia> struct TheStruct a::Int end

julia> Base.convert(Int, a_struct::TheStruct) = "a string"

julia> convert(Int, TheStruct(42))
"a string"

That the semantics of return type declarations are convert-then-assert, rather than convert-YOLO, is of some comfort, although you won’t get this behavior from convert if you don’t write it.

I know convert is just an ordinary method, so it can do whatever it wants, and I don’t have strong opinions about whether it should stay that way. It falls into the category of doing-weird-things, which I don’t see much purpose in preventing, generally.

That’s the intended behaviour yes, but it won’t work because the way they tried to do it is with ::Asserted{T}, which means the function must return an object of type Asserted{T}, not an object of type T.


I think a better way to do this @returns T thing would be to write a macro that turns

@returns T function f(x::X, y) where {X}
    z = g(x, y)
    w = h(X, z)
end

into

function f(x::X, y) where {X}
     f_inner(x, y) :: T # <---- Regular assertion, won't do a convert(T, _) step
end
function f_inner(x::X, y) where {X}
    z = g(x, y)
    w = h(X, z)
end
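For the record, here’s a minimal sketch of such a macro. It’s hypothetical and handles only plain long-form definitions with positional arguments: no where clauses, keyword arguments, or default values.

```julia
macro returns(T, fdef)
    fdef isa Expr && fdef.head === :function ||
        error("@returns expects a long-form function definition")
    sig, body = fdef.args[1], fdef.args[2]
    sig isa Expr && sig.head === :call ||
        error("`where` clauses and other signature forms are not handled here")
    fname, args = sig.args[1], sig.args[2:end]
    inner = gensym(fname)   # unique inner name, so outer methods stay separate
    # bare argument names, with any ::T annotations stripped, for forwarding
    argnames = [a isa Symbol ? a : a.args[1] for a in args]
    esc(quote
        function $inner($(args...))   # inner method: the original body, unchanged
            $body
        end
        function $fname($(args...))   # outer method: assert, don't convert
            $inner($(argnames...))::$T
        end
    end)
end

@returns Int function f(x::Int)
    x + 1
end

f(2)   # 3; a body returning a non-Int would throw a TypeError, not convert
```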

That macro was already implemented above, at least in part:


That looks like it should work smoothly in the global scope; you could try an @inline f_inner(x, y)::T to make it closer. The issue with splitting into two functions comes up in a local scope, where they are closures. The inner one should be defined first so that capturing it can be type-stable. It should also have a unique generated name so that it doesn’t make one function with multiple methods when the outer function has multiple methods; that hurts the type stability of capturing.

Are there more exit points besides the explicit returns and the last expression in the function body? I would say the difficulty is treating nested functions separately: whether to exclude them from assertions or give them their own.

Aren’t all those points already taken care of in the implementation suggested above?

julia> @macroexpand function foo(x)
           @returnassert function inner(x) :: Float64
               2x
           end
       
           inner(x) + 1
       end
:(function foo(x)
      #= REPL[5]:1 =#
      #= REPL[5]:2 =#
      begin
          #= REPL[4]:22 =#
          function var"##inner#225"(x; )
              #= REPL[5]:2 =#
              #= REPL[5]:3 =#
              2x
          end
          #= REPL[4]:23 =#
          function inner(x; )
              #= REPL[4]:18 =#
              var"##inner#225"(x)::Float64
          end
      end
      #= REPL[5]:6 =#
      inner(x) + 1
  end)

I’m not sure I understand this part. Don’t you think the compiler would inline calls anyway? And doesn’t the gensymed name take care of hygiene in the global scope? In any case it would be very straightforward to annotate the inner function call with @inline if it is useful.

That’s a very good point. Indeed I think all exit points fall into one of those two categories. That being said, a macro trying to annotate return statements would have to take extra steps to distinguish between return statements in the outermost function, and return statements in nested functions or closures. Is that what the second part of your comment refers to?


I was responding to Mason in that part of my comment, I could’ve clarified by quoting or something.

Often yes, but I think large enough methods aren’t, hence the extra hint to do it in this case because it used to be 1 method anyway.

Exactly.


It’s awkward, but this works:

(f(x::X)::Int) where X<:Real = x+1

# equiv to:

function f(x::X)::Int where X<:Real; x+1 end