Type instability in list comprehensions

Consider the following code:

julia> x = [1, 2];
julia> y = Float64[];
julia> [xi * yi for xi in x for yi in y]
Any[]

julia> f(x, y) = [xi * yi for xi in x for yi in y]; f(x, y)
Float64[]

so the whole thing is type-unstable. Even more surprisingly, inlining the literal values into the comprehension resolves it:

julia> [xi * yi for xi in [1,2] for yi in Float64[]]
Float64[]

One way to resolve this is to use map together with product:

julia> vec(map(((xi,yi),) -> xi*yi, Iterators.product(x, y)))
Float64[]

which is clunky and annoying. This, to me, seems to be a straightforward Julia bug, and it has caused me all sorts of nasty surprises. Why does Julia behave this way?

Edited to add: note that all of these examples are type-stable if y is non-empty:

julia> y = [3.0];
julia> [xi * yi for xi in x for yi in y]
2-element Vector{Float64}:
 3.0
 6.0

In this case, x and y are globals, so you'll have a hard time finding type stability no matter what you do with them. You can resolve this particular problem by adding type assertions, making them const, or both.

But the “better” solution is to add a function barrier: when you put the comprehension inside a function, the function specializes on the types of x and y.
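The type-assertion route mentioned above might look like this (a sketch; the assertions at the use site tell inference what the untyped globals hold):

```julia
x = [1, 2]
y = Float64[]

# Asserting the globals' types lets inference see concrete element
# types, so the empty result can be typed Float64[] rather than Any[].
[xi * yi for xi in x::Vector{Int} for yi in y::Vector{Float64}]
```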


Yeah, don’t expect type stability with globals. Or make them const:

julia> const x = [1,2];

julia> const y = Float64[];

julia> [xi * yi for xi in x for yi in y]
Float64[]

You can also type your array comprehensions.

julia> x = [1, 2];

julia> y = Float64[];

julia> [xi * yi for xi in x for yi in y]
Any[]

julia> Float64[xi * yi for xi in x for yi in y]
Float64[]

Hang on, but that is not true if x and y are non-empty!

julia> x = [1, 2]; y = [3., 4.]
julia> [xi*yi for xi in x for yi in y]
4-element Vector{Float64}:
  3.0
  4.0
  6.0
  8.0

So it fails only for empty ones! So IMHO the type instability of globals does not (fully) explain this…

If the lists are not empty, then Julia can look at the actual types produced by the function in the comprehension to figure out the eltype of the result vector.

If they are empty, then the compiler has to rely on the Type Domain to compute what the eltype ought to be. So the type instability of globals messes it up!
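The same effect can be seen without comprehensions, as a minimal sketch: when the source array has a poorly typed eltype, a nonempty generator still gets a tight result type from its runtime values, while an empty one falls back to whatever inference computed.

```julia
# The source arrays are Vector{Any}, so inference only knows x::Any.
nonempty = collect(x * 1.0 for x in Any[1, 2])
empty    = collect(x * 1.0 for x in Any[])

eltype(nonempty)  # Float64: determined by widening over the actual values
eltype(empty)     # Any: no values to look at, so only inference is available
```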


But if I understand you correctly, that would mean that this example is type-unstable anyway; it’s just that the compiler looks at the result type of the function as applied to the first item, guesses that this is going to be the eltype of the result vector, and then course-corrects once it sees that this fails? In other words, it has to do dynamic dispatch for every element?

The other thing I still don’t understand is why const is so important here. At the moment the command is invoked at the REPL, the types of these variables are known, whether they are const or not, and the current code paths have to be selected anyway, so why do I need const here?

My answer is an oversimplification of the fancy stuff the compiler does to try to nail down type stability in a dynamic language. It does not course-correct for each iteration, but I don’t fully know how it works.


Not quite.

[foo(x) for x in arr if bar(x)] gets “lowered” to something more like collect(Base.Generator(foo, Base.Iterators.Filter(bar, arr))), at which point the compiler can and will do all sorts of fancy things more sophisticated than just looking at the first element. (Actually, if foo and bar are type-stable and arr is concretely typed, I don’t think it should look at any elements at all, but I’m not an expert here.)
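That lowering can be checked by hand. A sketch with made-up foo, bar, and arr:

```julia
arr = [1, 2, 3, 4]
foo(x) = x^2           # example functions, chosen arbitrarily
bar(x) = iseven(x)

comprehension = [foo(x) for x in arr if bar(x)]
lowered = collect(Base.Generator(foo, Iterators.filter(bar, arr)))

comprehension == lowered  # true: both are [4, 16]
```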

using Base.Threads: @spawn

x = [1, 2, 3]
y_int = Int[]
y_float = Float64[]
y = y_int

# round and round it goes
# where will it stop? Nobody knows!
@spawn for i in 1:1000
    global y = (y_int, y_float)[i % 2 + 1]
    sleep(0.001)
end

# between compile time and run time of this expression, the type of y may change
[xi * yi for xi in x for yi in y]

I think this is all about the type instability of using global variables inside functions, that is, without passing them as arguments.

Global variables can change their type at any time, to anything.

Try this code and you will see that the compiler can’t infer the type of the elements of the empty global Float64 array:

julia> x = Float64[]
Float64[]

julia> function g()
           for i in x
               println(typeof(i))
           end
       end
g (generic function with 1 method)

julia> @code_warntype g()
MethodInstance for g()
  from g() @ Main REPL[2]:1
Arguments
  #self#::Core.Const(g)
Locals
  @_2::Any
  i::Any    #  <--------------- See here
Body::Nothing
1 ─ %1  = Main.x::Any
│         (@_2 = Base.iterate(%1))
│   %3  = (@_2 === nothing)::Bool
│   %4  = Base.not_int(%3)::Bool
└──       goto #4 if not %4
2 ┄ %6  = @_2::Any
│         (i = Core.getfield(%6, 1))
│   %8  = Core.getfield(%6, 2)::Any
│   %9  = Main.typeof(i)::DataType
│         Main.println(%9)
│         (@_2 = Base.iterate(%1, %8))
│   %12 = (@_2 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return nothing

One main consideration is whether the compiler can assume that the global variables and their types will not change. Recent Julia versions allow you to bind the types of globals.

julia> x::Vector{Float64} = [1,2];

julia> y::Vector{Float64} = [];

julia> [xi * yi for xi in x for yi in y]
Float64[]

julia> f() = [xi * yi for xi in x for yi in y]
f (generic function with 1 method)

julia> @code_warntype f()
MethodInstance for f()
  from f() @ Main REPL[3]:1
Arguments
  #self#::Core.Const(f)
Locals
  #2::var"#2#3"
Body::Vector{Float64}
1 ─      (#2 = %new(Main.:(var"#2#3")))
│   %2 = #2::Core.Const(var"#2#3"())
│   %3 = Base.Generator(%2, Main.x)::Base.Generator{Vector{Float64}, var"#2#3"}
│   %4 = Base.Flatten(%3)::Base.Iterators.Flatten{Base.Generator{Vector{Float64}, var"#2#3"}}
│   %5 = Base.collect(%4)::Vector{Float64}
└──      return %5

Compare the typed output to the following.

julia> x = [1,2]
2-element Vector{Int64}:
 1
 2

julia> y = Float64[]
Float64[]

julia> f() = [xi * yi for xi in x for yi in y]
f (generic function with 1 method)

julia> @code_warntype f()
MethodInstance for f()
  from f() @ Main REPL[3]:1
Arguments
  #self#::Core.Const(f)
Locals
  #2::var"#2#3"
Body::Vector
1 ─      (#2 = %new(Main.:(var"#2#3")))
│   %2 = #2::Core.Const(var"#2#3"())
│   %3 = Base.Generator(%2, Main.x)::Base.Generator{_A, var"#2#3"} where _A
│   %4 = Base.Flatten(%3)::Base.Iterators.Flatten{I} where I<:(Base.Generator{_A, var"#2#3"} where _A)
│   %5 = Base.collect(%4)::Vector
└──      return %5

This looks like an example of compiler optimizations leaking into observable behavior: specifically, an expression whose return value depends on the success of type inference. I agree that this should be considered a straightforward bug.

Type indeterminism might be a more accurate description than type instability for this kind of behavior.

But it doesn’t?

It does? The OP shows that the expression [xi * yi for xi in x for yi in y] can return different values (Any[] vs Float64[]) for the same values of x and y.

for the same values of x and y.

but they are not the same values of x (kinda, in the === sense that they are distinguishable in valid programs)

in one

julia> x = [1, 2];

julia> Core.get_binding_type(Main, :x)
Any

and the other

julia> x::Vector{Float64} = [1, 2];

julia> Core.get_binding_type(Main, :x)
Vector{Float64} (alias for Array{Float64, 1})

But behavior in Julia is (should be) determined by runtime values, not binding types. This is why multiple dispatch is different from C++ function overloading. It’s certainly possible to write code that violates this rule using internals like Core.Compiler.return_type, but it’s strongly discouraged and the creators regret that such functions are exposed at all: Create `infer_eltype` function by Tortar · Pull Request #54909 · JuliaLang/julia · GitHub
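For reference, the internal in question can be queried directly. A small sketch (Core.Compiler.return_type is internal, and its answers may vary between Julia versions and even within a session):

```julia
# Ask inference what a call would return, without executing it.
Core.Compiler.return_type(*, Tuple{Int, Float64})   # Float64

# With an abstract argument type, inference may only produce Any,
# which is exactly what seeds the Any[] result for empty inputs.
Core.Compiler.return_type(*, Tuple{Any, Float64})
```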


I agree with @danielwe, although I wouldn’t quite call it a bug, since Julia has been this way for a long time, probably since before 1.0. Base Julia does make use of Core.Compiler.return_type in some places, like in generators and array comprehensions, and I think also in broadcasting.* (I’m not sure whether the map implementation uses Core.Compiler.return_type anywhere…) So we sometimes encounter cases like this where the value of an expression depends on type inference, which is definitely not a good thing.

We would need some different semantics to solve some of these problems. Perhaps the mutate-or-widen approach advocated by @tkf.

*I’m speaking in generalities here. I don’t remember exactly where Base uses Core.Compiler.return_type, I just know that I’ve seen it in some of those places when I’ve looked at the code in the past.
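To illustrate, here is a rough sketch of the mutate-or-widen idea (simplified, and not how Base actually implements collect): the eltype is determined from the values themselves, widening as new values arrive, so inference never affects the result; the empty case gets the bottom type rather than an inference-dependent one.

```julia
function collect_widen(itr)
    y = iterate(itr)
    y === nothing && return Union{}[]   # empty: bottom eltype, no inference involved
    x, state = y
    out = [x]                           # eltype comes from the first value
    while (y = iterate(itr, state)) !== nothing
        x, state = y
        if x isa eltype(out)
            push!(out, x)
        else                            # widen the vector to fit the new value
            T = typejoin(eltype(out), typeof(x))
            out = push!(convert(Vector{T}, out), x)
        end
    end
    return out
end

collect_widen(x * 1.0 for x in Any[1, 2])  # 2-element Vector{Float64}
```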

It’s perhaps worth noting that harmless use of Core.Compiler.return_type is possible. The rule is to use it for optimizations only and ensure that your function’s return value is the same regardless of what return_type gives you.

I too have a hazy memory of discussions relating to map over empty iterables and inference-dependent return values, but I thought these were also considered bugs? It’s bad because return_type can differ in its ability to infer a particular call not only between Julia versions but even within a single Julia session.

In any case, I don’t think this has anything to do with type inference or Core.Compiler.return_type, and in particular I don’t think it’d be susceptible to the nondeterminism of type inference. AFAIK Julia models non-const globals as some kind of reference, which is typed for typed globals and untyped for untyped globals. So, even though the relevant types are not accessible to the user, this is simply a matter of the output type depending on the input type, as it’s supposed to.
