Extending Base.in type-stably


#1

(My original question seemed a bit convoluted, so I changed it significantly to make it succinct, but the basic questions are the same.)

I’m experiencing a difficulty in extending a Base function type-stably. Not sure if the same difficulty arises for any Base functions, but the one I’m having a difficulty with is Base.in.

Consider the following code where I extend Base.in for four concrete types of the same abstract type MyAbs:

abstract type MyAbs end

struct MyType1 <: MyAbs end
struct MyType2 <: MyAbs end
struct MyType3 <: MyAbs end
struct MyType4 <: MyAbs end

Base.in(x::Int, ::MyType1) = x==1
Base.in(x::Int, ::MyType2) = x==2
Base.in(x::Int, ::MyType3) = x==3
Base.in(x::Int, ::MyType4) = x==4

Now, I would like to call in with Int and MyAbs using the function below, which takes a vector of MyAbs, iterates over the vector, and executes in for each entry of the vector.

function zero_in(mvec::Vector{MyAbs})
    # Return true if any element of mvec contains 0
    t = false
    for m = mvec
        t = in(0, m)
        t && break
    end
    return t
end

The output of this function turns out to be type-unstable:

julia> VERSION
v"0.6.1-pre.0"

julia> mvec = [MyType1(), MyType2(), MyType3(), MyType4()];

julia> @code_warntype zero_in(mvec)
Variables:
  #self#::#zero_in
  mvec::Array{MyAbs,1}
  m::MyAbs
  #temp#::Int64
  t::Any

Body:
  begin
      t::Any = false # line 3:
      #temp#::Int64 = 1
      4:
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
      SSAValue(2) = (Base.arrayref)(mvec::Array{MyAbs,1}, #temp#::Int64)::MyAbs
      SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
      m::MyAbs = SSAValue(2)
      #temp#::Int64 = SSAValue(3) # line 4:
      t::Any = (0 in m::MyAbs)::Any # line 5:
      unless t::Any goto 15
      goto 17
      15:
      goto 4
      17:  # line 7:
      return t::Any
  end::Any

On the other hand, let’s repeat the same experiment with a non-Base function. Here I define myfun that does exactly the same thing as Base.in above, and also zero_myfun that does exactly the same thing as zero_in above:

myfun(x::Int, ::MyType1) = x==1
myfun(x::Int, ::MyType2) = x==2
myfun(x::Int, ::MyType3) = x==3
myfun(x::Int, ::MyType4) = x==4

function zero_myfun(mvec::Vector{MyAbs})
    t = false
    for m = mvec
        t = myfun(0, m)
        t && break
    end
    return t
end

Now, unlike the previous experiment, zero_myfun produces a type-stable output:

julia> @code_warntype zero_myfun(mvec)
Variables:
  #self#::#zero_myfun
  mvec::Array{MyAbs,1}
  m::MyAbs
  #temp#::Int64
  t::Bool

Body:
  begin
      t::Bool = false # line 3:
      #temp#::Int64 = 1
      4:
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
      SSAValue(2) = (Base.arrayref)(mvec::Array{MyAbs,1}, #temp#::Int64)::MyAbs
      SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
      m::MyAbs = SSAValue(2)
      #temp#::Int64 = SSAValue(3) # line 4:
      t::Bool = (Main.myfun)(0, m::MyAbs)::Bool # line 5:
      unless t::Bool goto 15
      goto 17
      15:
      goto 4
      17:  # line 7:
      return t::Bool
  end::Bool

Why do these two experiments give different behavior? More importantly, how can I extend Base.in to produce a stable output in the first experiment?


#2

A Vector of an abstract tyoe is inherently type unstable, in the sense that julia does not know which (concrete) type its elements have.


#3

Elements of a Vector being abstract is not a problem here. You can easily create a function that returns a stable output while taking an abstract input. For example,

julia> absinput(x::Any) = 1
absinput (generic function with 1 method)

julia> code_warntype(absinput, (Any,))
Variables:
  #self#::#absinput
  x::Any

Body:
  begin
      return 1
  end::Int64

Also, my second test case above with myfun shows that zero_myfun taking a Vector of an abstract type produces a stable output. My main question is why almost the same code becomes unstable when myfun is replaced with Base.in.


#4

First, notice when you reduce the number of MyTypes to 3, the type of t in zero_in is inferred and type stability restored.

This shows it’s the complexity of type inference which seems to be a problem. At the top of base/inference.jl there is const MAX_TYPEUNION_LEN = 3. And later on this determines if to giveup on big Unions and infer an Any type. IMO this is what’s going on here.

Union{MyType1,MyType2,MyType3,MyType4} is inferred as Any and the return_type inferred for Any for Base.in is not Bool (some methods of Base.in return other types).

In the case of myfun, there are only methods which return Bool, and the inference engine can determine a Bool result for Any too. This can be seen by looking at the output of:

Base.return_types(Base.in)
Base.return_types(myfun)

To recreate the instability of Base.in for myfun, another myfun method can be added:

myfun(x::Int, y) = 5

One other thing: If a Union{MyType1,MyType2,MyType3,MyType4} is used in the definition of mvec and the parameter type of zero_in instead of MyAbs, then the inference engine seems to work.

Hope this explains the uneven behavior, even though I don’t see any clean workaround. Maybe someone else will.


#5

A simple and not too ugly way to restore stability is to change zero_in to:

function zero_in(mvec::Vector{MyAbs})
    # Return true if any element of mvec contains 0
    t = false
    for m = mvec
        t = in(0, m)::Bool
        t && break
    end
    return t
end

With this definition:

julia> @code_warntype zero_in(mvec)
Variables:
  #self#::#zero_in
  mvec::Array{MyAbs,1}
  m::MyAbs
  #temp#::Int64
  t::Bool

Body:
  begin
      t::Bool = false
      #= line 4 =#
      #temp#::Int64 = 1
      4: 
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
      SSAValue(2) = $(Expr(:invoke, MethodInstance for getindex(::Array{MyAbs,1}, ::Int64), :(Base.getindex), :(mvec), :(#temp#)))::MyAbs
      SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
      m::MyAbs = SSAValue(2)
      #temp#::Int64 = SSAValue(3)
      #= line 5 =#
      t::Bool = (Core.typeassert)((0 in m::MyAbs)::Any, Main.Bool)::Bool
      #= line 6 =#
      unless t::Bool goto 15
      goto 17
      15: 
      goto 4
      17: 
      #= line 8 =#
      return t::Bool
  end::Bool

#6

I’m surprised that some in methods implemented in Julia’s Base module return unstable output. Shouldn’t they be written to return Bool by all means, e.g., by using return type annotation?


#7

Yeah, it is surprising. So, I tried to track down one of the non-Bool return values of Base.in and got to the method for Base.in(Float64,Dict).
This generates an error, as it should, since Dicts contain only Pairs, but the function does not return any value after raising the error, which leaves the inference engine with a nothing return. Probably should have a return false dummy statement after raising the error.

It might be appropriate to open an issue for this.


#8

No it doesn’t. It returns an error and does not affect inference.


#9

I figured that if a function returns an error, its return type is Union{}, which is a subtype of any type. Therefore Union{Union{},Bool} is Bool. This means that if some function has methods returning Union{} and Bool, the compiler infers the return type correctly as Bool. I think this is what @yuyichao meant. I checked this is indeed what happens.

So, in methods returning Union{} is not a problem. Problematic ones are in methods returning Any. I think these ones can be made to return Bool by return type annotation at least, or by type asserting local variables correctly.


#10

Right, @code_warntype handles the error raised correctly and infers a Union{} type.
In another case: Base.in('a','a':'z') Julia0.7 correctly infers a Bool return value with @code_warntype in('a','a':'z'), but

Base.return_types(Base.in,(Char,Range{Char}))

returns Any.
Possibly it is this Any (and other Any return_types) which cause the problem @wsshin described, but I would certainly like to know.


#11

The reason @code_warntype in('a','a':'z') indicates a stable output while Base.return_types(Base.in,(Char,Range{Char})) doesn’t is that Range{Char} is an abstract type. The type of 'a':'z' is StepRange{Char,Int64}, which is a subtype of Range{Char}, and

Base.return_types(in, (Char,StepRange{Char,Int64}))

returns Bool.


#12

So I tried to remove return type instability of all in methods of Base module, but the problem still persists.

Specifically, I tracked down all the in methods returning Any in the Base module, and annotated their return types wth ::Bool. Then I could see that Base.in always returns either Union{} or Bool:

julia> VERSION
v"0.7.0-DEV.1283"

julia> Base.return_types(Base.in)
28-element Array{Any,1}:
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Union{}
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Bool
 Union{}
 Bool
 Bool
 Bool
 Bool

Then, I repeated the original experiment with MyAbs and MyTypes, hoping that now type stability of the extended Base.in would be recovered. Unfortunately it didn’t:

julia> @code_warntype zero_in(mvec)
Variables:
  #self#::#zero_in
  mvec::Array{MyAbs,1}
  t::Any
  #temp#::Int64
  m::MyAbs

Body:
  begin
      t::Any = false
      #= line 4 =#
      #temp#::Int64 = 1
      4:
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
      SSAValue(2) = $(Expr(:invoke, MethodInstance for getindex(::Array{MyAbs,1}, ::Int64), :(Base.getindex), :(mvec), :(#temp#)))::MyAbs
      SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
      m::MyAbs = SSAValue(2)
      #temp#::Int64 = SSAValue(3)
      #= line 5 =#
      t::Any = (0 in m::MyAbs)::Any
      #= line 6 =#
      unless t::Any goto 15
      goto 17
      15:
      goto 4
      17:
      #= line 8 =#
      return t::Any
  end::Any

So I guess our analysis was wrong. There is something else causing this instability.


#13

I’ll repeat what I said in the github issue. Getting type stability here won’t help you here.


#14

@yuyicho, I’m not trying to reduce the dispatch cost as you suspected in that GitHub issue. I’m just trying to achieve type stability of Base.in that I extended.

The reason I want type stability is that I am writing a package that other people will use. Whenever I write a package, I try my best to make all its functions return stable outputs, because then the users will be able to use the package without worrying about the adverse effects of unstable outputs.

The first example that I presented in this discussion thread mimics the situation I have in this package development. The package defines several subtypes (like MyType1, …, MyType4) of the same abstract type (like MyAbs), and for each of these subtypes Base.in is extended. When I extended Base.in, I made sure all of them return Bool, believing that the users of the package would always get stable Bool upon calling Base.in for the package’s subtypes. Still, I find that Base.in returns Any no matter how hard I try to achieve stability, as demonstrated very well in this discussion thread.

I guess the only option I’m left with at this point is

  • to ask the users of the package to use type assertions whenever they call Base.in for the package’s subtypes, or
  • to give up extending Base.in and use a different function name (like using myfun instead of Base.in in the earlier example), even though Base.in is a very appropriate name for the function testing inclusion and provides the convenient infix operator ∈.

I will probably go for the second option, because I don’t want the users of my package to waste their time to resolve instability themselves. But it’s very disappointing that I cannot use Base.in stably where the name is most appropriate. Moreover, I think I will be very reluctant to extend any Base functions from now on, because the same problem can occur to any Base functions.


#15

FWIW, here’s a few observations:

  • Base.in is type-stable for your concrete types.

  • If you add another concrete type, say MyType5, and associated method myfun, then your zero_myfun example will also be type-unstable.

  • You can’t avoid dynamic dispatch (because you have a heterogeneous input vector mvec), but you could re-write zero_in so that the return type is inferred as Bool:

function zero_in(mvec::Vector{MyAbs})
    # Return true if any element of mvec contains 0
    t = false
    for m = mvec
        if in(0, m)
            t = true
            break
        end
    end
    return t
end

#16

So using a different function name than Base.in is not a solution. At one point I thought that was the solution, because code_warntype(myfun, (Int, MyAbs)) showed that myfun returned always Bool when called with (Int, MyAbs).

I think this is related with my effort above, where I made all the in methods implemented the Base module type-stable by return type annotation. There, even though I had only Bool and Union{} as the return types of Base.in, I didn’t get stability of zero_in. Similarly, here I have only Bool as the return types of myfun called with (Int, MyAbs):

julia> Base.return_types(myfun, (Int, MyAbs))
5-element Array{Any,1}:
 Bool
 Bool
 Bool
 Bool
 Bool

but I still don’t get stability of zero_myfun if I define 4+1 subtypes of MyAbs.

I don’t understand why this is happening. Even if I call myfun with (Any, Any), I get only Bool as the return types:

julia> Base.return_types(myfun, (Any,Any))
5-element Array{Any,1}:
 Bool
 Bool
 Bool
 Bool
 Bool

Then why does Julia infer the return type of myfun as Any? Does it help reducing dispatch cost as @yuyichao alluded, or it is something that needs to be fixed in the dispatch system?


#17

I’m just saying that it doesn’t matter. Using abstract input type is always unstable and result in bad performance. It just happens that the type inference tries to reduce it a little bit in simple cases but gives up when it notices that you are trying to do something that is not recommended.