# Extending Base.in type-stably

#1

(My original question seemed a bit convoluted, so I changed it significantly to make it succinct, but the basic questions are the same.)

I’m experiencing a difficulty in extending a `Base` function type-stably. Not sure if the same difficulty arises for any `Base` functions, but the one I’m having a difficulty with is `Base.in`.

Consider the following code where I extend `Base.in` for four concrete types of the same abstract type `MyAbs`:

``````abstract type MyAbs end

struct MyType1 <: MyAbs end
struct MyType2 <: MyAbs end
struct MyType3 <: MyAbs end
struct MyType4 <: MyAbs end

Base.in(x::Int, ::MyType1) = x==1
Base.in(x::Int, ::MyType2) = x==2
Base.in(x::Int, ::MyType3) = x==3
Base.in(x::Int, ::MyType4) = x==4
``````

Now, I would like to call `in` with `Int` and `MyAbs` using the function below, which takes a vector of `MyAbs`, iterates over the vector, and executes `in` for each entry of the vector.

``````function zero_in(mvec::Vector{MyAbs})
# Return true if any element of mvec contains 0
t = false
for m = mvec
t = in(0, m)
t && break
end
return t
end
``````

The output of this function turns out to be type-unstable:

``````julia> VERSION
v"0.6.1-pre.0"

julia> mvec = [MyType1(), MyType2(), MyType3(), MyType4()];

julia> @code_warntype zero_in(mvec)
Variables:
#self#::#zero_in
mvec::Array{MyAbs,1}
m::MyAbs
#temp#::Int64
t::Any

Body:
begin
t::Any = false # line 3:
#temp#::Int64 = 1
4:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
SSAValue(2) = (Base.arrayref)(mvec::Array{MyAbs,1}, #temp#::Int64)::MyAbs
m::MyAbs = SSAValue(2)
#temp#::Int64 = SSAValue(3) # line 4:
t::Any = (0 in m::MyAbs)::Any # line 5:
unless t::Any goto 15
goto 17
15:
goto 4
17:  # line 7:
return t::Any
end::Any
``````

On the other hand, let’s repeat the same experiment with a non-Base function. Here I define `myfun` that does exactly the same thing as `Base.in` above, and also `zero_myfun` that does exactly the same thing as `zero_in` above:

``````myfun(x::Int, ::MyType1) = x==1
myfun(x::Int, ::MyType2) = x==2
myfun(x::Int, ::MyType3) = x==3
myfun(x::Int, ::MyType4) = x==4

function zero_myfun(mvec::Vector{MyAbs})
t = false
for m = mvec
t = myfun(0, m)
t && break
end
return t
end
``````

Now, unlike the previous experiment, `zero_myfun` produces a type-stable output:

``````julia> @code_warntype zero_myfun(mvec)
Variables:
#self#::#zero_myfun
mvec::Array{MyAbs,1}
m::MyAbs
#temp#::Int64
t::Bool

Body:
begin
t::Bool = false # line 3:
#temp#::Int64 = 1
4:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
SSAValue(2) = (Base.arrayref)(mvec::Array{MyAbs,1}, #temp#::Int64)::MyAbs
m::MyAbs = SSAValue(2)
#temp#::Int64 = SSAValue(3) # line 4:
t::Bool = (Main.myfun)(0, m::MyAbs)::Bool # line 5:
unless t::Bool goto 15
goto 17
15:
goto 4
17:  # line 7:
return t::Bool
end::Bool
``````

Why do these two experiments give different behavior? More importantly, how can I extend `Base.in` to produce a stable output in the first experiment?

#2

A Vector of an abstract tyoe is inherently type unstable, in the sense that julia does not know which (concrete) type its elements have.

#3

Elements of a `Vector` being abstract is not a problem here. You can easily create a function that returns a stable output while taking an abstract input. For example,

``````julia> absinput(x::Any) = 1
absinput (generic function with 1 method)

julia> code_warntype(absinput, (Any,))
Variables:
#self#::#absinput
x::Any

Body:
begin
return 1
end::Int64
``````

Also, my second test case above with `myfun` shows that `zero_myfun` taking a `Vector` of an abstract type produces a stable output. My main question is why almost the same code becomes unstable when `myfun` is replaced with `Base.in`.

#4

First, notice when you reduce the number of MyTypes to 3, the type of `t` in `zero_in` is inferred and type stability restored.

This shows it’s the complexity of type inference which seems to be a problem. At the top of `base/inference.jl` there is `const MAX_TYPEUNION_LEN = 3`. And later on this determines if to giveup on big Unions and infer an Any type. IMO this is what’s going on here.

Union{MyType1,MyType2,MyType3,MyType4} is inferred as Any and the return_type inferred for Any for Base.in is not Bool (some methods of Base.in return other types).

In the case of myfun, there are only methods which return Bool, and the inference engine can determine a Bool result for Any too. This can be seen by looking at the output of:

``````Base.return_types(Base.in)
Base.return_types(myfun)
``````

To recreate the instability of `Base.in` for `myfun`, another `myfun` method can be added:

``````myfun(x::Int, y) = 5
``````

One other thing: If a Union{MyType1,MyType2,MyType3,MyType4} is used in the definition of `mvec` and the parameter type of `zero_in` instead of MyAbs, then the inference engine seems to work.

Hope this explains the uneven behavior, even though I don’t see any clean workaround. Maybe someone else will.

#5

A simple and not too ugly way to restore stability is to change `zero_in` to:

``````function zero_in(mvec::Vector{MyAbs})
# Return true if any element of mvec contains 0
t = false
for m = mvec
t = in(0, m)::Bool
t && break
end
return t
end
``````

With this definition:

``````julia> @code_warntype zero_in(mvec)
Variables:
#self#::#zero_in
mvec::Array{MyAbs,1}
m::MyAbs
#temp#::Int64
t::Bool

Body:
begin
t::Bool = false
#= line 4 =#
#temp#::Int64 = 1
4:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
SSAValue(2) = \$(Expr(:invoke, MethodInstance for getindex(::Array{MyAbs,1}, ::Int64), :(Base.getindex), :(mvec), :(#temp#)))::MyAbs
m::MyAbs = SSAValue(2)
#temp#::Int64 = SSAValue(3)
#= line 5 =#
t::Bool = (Core.typeassert)((0 in m::MyAbs)::Any, Main.Bool)::Bool
#= line 6 =#
unless t::Bool goto 15
goto 17
15:
goto 4
17:
#= line 8 =#
return t::Bool
end::Bool
``````

#6

I’m surprised that some `in` methods implemented in Julia’s `Base` module return unstable output. Shouldn’t they be written to return `Bool` by all means, e.g., by using return type annotation?

#7

Yeah, it is surprising. So, I tried to track down one of the non-Bool return values of `Base.in` and got to the method for `Base.in(Float64,Dict)`.
This generates an error, as it should, since `Dict`s contain only `Pair`s, but the function does not return any value after raising the error, which leaves the inference engine with a `nothing` return. Probably should have a `return false` dummy statement after raising the error.

It might be appropriate to open an issue for this.

#8

No it doesn’t. It returns an error and does not affect inference.

#9

I figured that if a function returns an error, its return type is `Union{}`, which is a subtype of any type. Therefore `Union{Union{},Bool}` is `Bool`. This means that if some function has methods returning `Union{}` and `Bool`, the compiler infers the return type correctly as `Bool`. I think this is what @yuyichao meant. I checked this is indeed what happens.

So, `in` methods returning `Union{}` is not a problem. Problematic ones are `in` methods returning `Any`. I think these ones can be made to return `Bool` by return type annotation at least, or by type asserting local variables correctly.

#10

Right, `@code_warntype` handles the error raised correctly and infers a `Union{}` type.
In another case: `Base.in('a','a':'z')` Julia0.7 correctly infers a Bool return value with `@code_warntype in('a','a':'z')`, but

``````Base.return_types(Base.in,(Char,Range{Char}))
``````

returns Any.
Possibly it is this Any (and other Any return_types) which cause the problem @wsshin described, but I would certainly like to know.

#11

The reason `@code_warntype in('a','a':'z')` indicates a stable output while `Base.return_types(Base.in,(Char,Range{Char}))` doesn’t is that `Range{Char}` is an abstract type. The type of `'a':'z'` is `StepRange{Char,Int64}`, which is a subtype of `Range{Char}`, and

``````Base.return_types(in, (Char,StepRange{Char,Int64}))
``````

returns `Bool`.

#12

So I tried to remove return type instability of all `in` methods of `Base` module, but the problem still persists.

Specifically, I tracked down all the `in` methods returning `Any` in the `Base` module, and annotated their return types wth `::Bool`. Then I could see that `Base.in` always returns either `Union{}` or `Bool`:

``````julia> VERSION
v"0.7.0-DEV.1283"

julia> Base.return_types(Base.in)
28-element Array{Any,1}:
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Union{}
Bool
Bool
Bool
Bool
Bool
Bool
Bool
Union{}
Bool
Bool
Bool
Bool
``````

Then, I repeated the original experiment with `MyAbs` and `MyType`s, hoping that now type stability of the extended `Base.in` would be recovered. Unfortunately it didn’t:

``````julia> @code_warntype zero_in(mvec)
Variables:
#self#::#zero_in
mvec::Array{MyAbs,1}
t::Any
#temp#::Int64
m::MyAbs

Body:
begin
t::Any = false
#= line 4 =#
#temp#::Int64 = 1
4:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(mvec::Array{MyAbs,1})::Int64, 1)::Int64)::Bool)::Bool goto 17
SSAValue(2) = \$(Expr(:invoke, MethodInstance for getindex(::Array{MyAbs,1}, ::Int64), :(Base.getindex), :(mvec), :(#temp#)))::MyAbs
m::MyAbs = SSAValue(2)
#temp#::Int64 = SSAValue(3)
#= line 5 =#
t::Any = (0 in m::MyAbs)::Any
#= line 6 =#
unless t::Any goto 15
goto 17
15:
goto 4
17:
#= line 8 =#
return t::Any
end::Any
``````

So I guess our analysis was wrong. There is something else causing this instability.

#13

I’ll repeat what I said in the github issue. Getting type stability here won’t help you here.

#14

@yuyicho, I’m not trying to reduce the dispatch cost as you suspected in that GitHub issue. I’m just trying to achieve type stability of `Base.in` that I extended.

The reason I want type stability is that I am writing a package that other people will use. Whenever I write a package, I try my best to make all its functions return stable outputs, because then the users will be able to use the package without worrying about the adverse effects of unstable outputs.

The first example that I presented in this discussion thread mimics the situation I have in this package development. The package defines several subtypes (like `MyType1`, …, `MyType4`) of the same abstract type (like `MyAbs`), and for each of these subtypes `Base.in` is extended. When I extended `Base.in`, I made sure all of them return `Bool`, believing that the users of the package would always get stable `Bool` upon calling `Base.in` for the package’s subtypes. Still, I find that `Base.in` returns `Any` no matter how hard I try to achieve stability, as demonstrated very well in this discussion thread.

I guess the only option I’m left with at this point is

• to ask the users of the package to use type assertions whenever they call `Base.in` for the package’s subtypes, or
• to give up extending `Base.in` and use a different function name (like using `myfun` instead of `Base.in` in the earlier example), even though `Base.in` is a very appropriate name for the function testing inclusion and provides the convenient infix operator ∈.

I will probably go for the second option, because I don’t want the users of my package to waste their time to resolve instability themselves. But it’s very disappointing that I cannot use `Base.in` stably where the name is most appropriate. Moreover, I think I will be very reluctant to extend any `Base` functions from now on, because the same problem can occur to any `Base` functions.

#15

FWIW, here’s a few observations:

• `Base.in` is type-stable for your concrete types.

• If you add another concrete type, say `MyType5`, and associated method `myfun`, then your `zero_myfun` example will also be type-unstable.

• You can’t avoid dynamic dispatch (because you have a heterogeneous input vector `mvec`), but you could re-write `zero_in` so that the return type is inferred as `Bool`:

``````function zero_in(mvec::Vector{MyAbs})
# Return true if any element of mvec contains 0
t = false
for m = mvec
if in(0, m)
t = true
break
end
end
return t
end
``````

#16

So using a different function name than `Base.in` is not a solution. At one point I thought that was the solution, because `code_warntype(myfun, (Int, MyAbs))` showed that `myfun` returned always `Bool` when called with `(Int, MyAbs)`.

I think this is related with my effort above, where I made all the `in` methods implemented the `Base` module type-stable by return type annotation. There, even though I had only `Bool` and `Union{}` as the return types of `Base.in`, I didn’t get stability of `zero_in`. Similarly, here I have only `Bool` as the return types of `myfun` called with `(Int, MyAbs)`:

``````julia> Base.return_types(myfun, (Int, MyAbs))
5-element Array{Any,1}:
Bool
Bool
Bool
Bool
Bool
``````

but I still don’t get stability of `zero_myfun` if I define 4+1 subtypes of `MyAbs`.

I don’t understand why this is happening. Even if I call `myfun` with `(Any, Any)`, I get only `Bool` as the return types:

``````julia> Base.return_types(myfun, (Any,Any))
5-element Array{Any,1}:
Bool
Bool
Bool
Bool
Bool
``````

Then why does Julia infer the return type of `myfun` as `Any`? Does it help reducing dispatch cost as @yuyichao alluded, or it is something that needs to be fixed in the dispatch system?

#17

I’m just saying that it doesn’t matter. Using abstract input type is always unstable and result in bad performance. It just happens that the type inference tries to reduce it a little bit in simple cases but gives up when it notices that you are trying to do something that is not recommended.