Upper boundary for dispatch in Julia

Hi fellows,
I have a function, which can return five different types according to some conditions. Julia seems to have a problem with type-inference. I created a small example with a similar problem. We create a random number and, dependent on its value, the returned object can have one of five data types.

image

Is there any way that I can convince Julia to keep inferring all five types? If not, are there any strategies to avoid this kind of behavior?

Thanks a lot in advance!

1 Like

Hi, that’s an interesting question. Could you paste the code in text format please?

1 Like

Yes, of course :slight_smile:

using Cthulhu

function randomreturntype()::Union{Float32,Float64,Int8,Int16,Int32}
    randomnumber = rand() * 5
    randomnumber < 1 && return Float32(1)
    randomnumber < 2 && return Float64(1)
    randomnumber < 3 && return Int8(1)
    randomnumber < 4 && return Int16(1)

    return Float32(1)
end

@descend randomreturntype()

This seems specifically designed to foil any kind of static analysis. What were you hoping that type inference would conclude about this?

1 Like

I could see one wanting to override the default union-splitting limits, with the result inferred as a concrete union rather than Any. But I’m not currently aware of any way to force that.

Only if you fork Julia and alter the compiler’s cutoff for small Union inferences. As you can tell, the compiler was perfectly capable of inferring the types at the return values, but it decided to infer the whole call’s return type as Any instead of a union of them. That’s just a pragmatic cutoff because at some point inferring type-unstable call signatures has more cost than benefit. The primary strategy is avoiding type-unstable methods in the first place, secondary strategy is rescuing inference in subsequent code by feeding the output into a dynamically dispatched call that specializes its method on the runtime type.

Something like Union{Float32, Float64,Int8,Int16}. Again, this is just a MVP which is close to an actual problem in another package

1 Like

I’m not sure that union would be much more helpful for further reasoning than Real or Any. Any analysis that has to consider all those possibilities is rapidly going to spiral there anyway.

5 Likes

Hi, @BBReactionHooper ! I think this is a perfect place for using SumTypes GitHub - MasonProtter/SumTypes.jl: An implementation of Sum types in Julia

7 Likes

This worked like a charm. Thank you very much!

I disagree. Actually, Julia can produce highly efficient code in some circumstances, even in the presence of large unions. For example, in the following code:

julia> struct Foo
           x::Union{Int, UInt, Int8, UInt8, Int16, UInt16, Int32, UInt32}
       end

julia> function foo(x::Foo)
           xx = x.x
           if xx isa Int
               return 0
           elseif xx isa UInt
               return 1
           elseif xx isa UInt8
               return 2
           elseif xx isa UInt16
               return 3
           elseif xx isa UInt32
               return 4
           elseif xx isa Int8
               return 5
           elseif xx isa Int16
               return 6
           elseif xx isa Int32
               return 7
           end
       end
foo (generic function with 1 method)

julia> foo(Foo(1))

Here, Julia is able to correctly infer that xx is a large union, and to eliminate each of the union’s members one by one in the if statement. It correctly infers a return type of Int (as opposed to Union{Int, Nothing}, indicating it statically knows all the possible types are covered), and it compiles the function to use a lookup table to get the type.

The problem is that Julia’s semantics don’t provide any kind of mechanism to handle unions effectively. All we can do now is hope that Julia is able to do union-splitting, but the precise circumstances where this occurs is subject to change (and effectively unknowable from the outside). In my example above, for example, there are way more types than the MAX_TYPEUNION_* parameters in Julia, yet it still works well.

In contrast, if Julia provided, say, some kind of match statement which could be applied to a Union-typed struct field, it would be able to give guarantees that no dynamic dispatch happened. Note that this is a much simpler thing than to handle arbitrary large unions in inference - what I demonstrate here (and what OP asks for) is essentially just a fancy if-statement applied to unions.

Of course one could argue that we already have unions and if-statements, so what’s the problem? Well:

  • Even though it works right now, it’s hard to know whether it’s supposed to be efficient, or whether Julia can just give up and emit bad code
  • It would be nice to have a statement that statically checked all the union variants were covered. That would only work with fields from structs, though, so unless we have something like SumTypes.jl, it’ll be of limited utility
1 Like

Does the compiler actually take the inferred type of x.x into account there? It could just look at all the return statements and infer that all the literals are Int. If you substitute one of the return values with xx or a Float64 literal, the overall return type is inferred (according to code_warntype) as a small yellow-highlighted Union including the type of xx for that branch or Float64. If you substitute with x.x instead, you get a large red-highlighted Union matching the field annotation; if you additionally replace another return statement with a Float64 value, the return type is inferred as Any. This is consistent with tallying the inferred types of all return statements.

Some compiler implementations can still be reliable even though the language semantics don’t address it. The C standard never mentions the stack or heap, but we’re pretty sure that’s what will happen for automatic and dynamic memory. Since base Julia methods were rewritten in v0.7 and v1.0 to possibly return nothing, I think we can at least count on 2 types to be handled by union-splitting, or at least something that dodges runtime dispatch.

For reference (largely my own future reference):
Max union splitting is one of the fields of Core.Compiler.InferenceParams

help?> Core.Compiler.InferenceParams
  │ Warning
  │
  │  The following bindings may be internal; 
...
  inf_params::InferenceParams

  Parameters that control abstract interpretation-based type inference operation.

    ──────────────────────────────────────────────────────────────────────────────────────────────────

    •  inf_params.max_union_splitting::Int = 4
       Specifies the maximum number of union-tuples to swap or expand before computing the set
       of matching methods or conditional types.

And InferenceParams is one of the fields of the Core.Compiler.NativeInterpreter
So if you happen to be running type-inference yourself you can set that on your interpretter,
and get it to union split as much as you like. (At the cost of all the downsides of union splitting, and with its all-in-all limited utility).
Of-course right now almost noone is running type-inference themselves because that’s awful.
But I am hoping on teaching people how this juliacon, because it is my solemn duty to increase the amount of evil in the world.

3 Likes

Does the compiler actually take the inferred type of x.x into account there?

Yes. You can see this because the last, omitted return type, namely the nothing if neither of the branches gets taken, is optimised away. This happens because it knows that every possible of its types are covered by the if statements. For example, you can try add more if statements like elseif xx isa UInt128 with a new return type - this return type will be ignored because Julia statically knows it will never be taken.

The reason it doesn’t work when substituting x.x in is simply that Julia doesn’t understand that loading the .x field twice results in the same value. That’s also why I load it into a variable xx first.

1 Like

I mean, SumTypes literally does exist…

function randomreturntype()
    x = rand() * 6
    if x < 1
        Int(1)
    elseif x < 2
        UInt(2)
    elseif x < 3
        Int8(3)
    elseif x < 4
        UInt8(4)
    elseif x < 5
        Int16(5)
    else
        UInt16(6)
    end
end
using SumTypes
@sum_type Foo begin
    A(::Int)
    B(::UInt)
    C(::Int8)
    D(::UInt8)
    E(::Int16)
    F(::UInt16)
end
Foo(x::Int) = A(x)
Foo(x::UInt) = B(x)
Foo(x::Int8) = C(x)
Foo(x::UInt8) = D(x)
Foo(x::Int16) = E(x)
Foo(x::UInt16) = F(x)

function bar()
    x = randomreturntype() # Type unstable on its own
    f = Foo(x)             # Type unstable given the argument type isn't known
    @cases f begin
        A => 0
        B => 1
        C => 2
        D => 3
        E => 4
        F => 5
    end                    # Type stability recovered 
end

and then voila, we have type stability:

julia> Core.Compiler.return_type(bar, Tuple{})
Int64
4 Likes

I liked the comparison made by Union vs sum types (viralinstruction.com). Usability-wise, I’d prefer small inferred Unions because their not being instantiable concrete types allows for more backwards-compatible changes and type inference optimizations, and this is inherently limited not only by the small max_union_splitting cutoff but also the ease of combinatorial explosion (foo(::Union{A,B,C}, ::Union{D,E})::Any) and situational expectations of single return types. I’d almost always design a single return type or also return nothing/missing that don’t allow for combinatorial explosion, and I’d collapse small Unions ASAP in branches, convert, promoting operations, etc. On the other hand, sum types and their explicit handling are appropriate and smoother for cases like this where having a value of many possible types is the feature, not a brief exercise in type stability.