Type instability in list comprehensions

I can’t find good documentation on this now, but I recall reading or hearing sometime in the past about how Base Julia is built up from a small core of intrinsic concepts. The intrinsics include struct definitions, control flow, and a small set of built-in functions like getfield and setfield!. So, in theory, Base should be defined entirely in terms of those intrinsics, but I think in practice that’s not followed 100%. (Core.Compiler.return_type is not an intrinsic.)


I think the core issue here is that Julia is a dynamic language, so the value and side effects of an expression should depend only on the runtime values of the components that make up the expression. (Types are first-class objects in Julia and can be examined via typeof, so the type of an object is a part of its “value”. And of course the type of an object affects dispatch on the object.)

Here is a simple example that hopefully demonstrates the violation of this principle in the implementation of array comprehensions. We see two expressions where the components of the expressions have the same value, but the value and/or side effects of the two expressions are different:

Define values:

const a = Int[]
const b = Int[]
c = Int[]
d = Int[]
g(::Vector{Int}) = 42

Evaluate expressions:

julia> a == c, b == d
(true, true)

julia> g([x*y for x in a for y in b])
42

julia> g([x*y for x in c for y in d])
ERROR: MethodError: no method matching g(::Vector{Any})

Closest candidates are:
  g(::Vector{Int64})
   @ Main REPL[5]:1
1 Like

Or a tweaked version of the example so that the values (not just the side effects) of the two expressions are different even though the values of the components of the expressions are the same:

Define values:

const a = Int[]
const b = Int[]
c = Int[]
d = Int[]
g(::Vector{Int}) = 1
g(::Vector{Any}) = 2

Evaluate expressions:

julia> a == c, b == d
(true, true)

julia> g([x*y for x in a for y in b])
1

julia> g([x*y for x in c for y in d])
2

I think sometimes regular Julia users are so used to thinking about type inference, type stability, and JIT compilation that they start to think of it as a language feature, but it is not a part of the semantics of the language—it is merely an implementation optimization, as mentioned by @danielwe. And implementation optimizations should not affect the behavior of a piece of code.

(To be precise, the type inference implementation optimization in Julia is not the cause of the problem here. The cause of the inconsistency is the use of Core.Compiler.return_type somewhere within the array comprehension and/or generator implementation in Base Julia.)

2 Likes

The “correct” thing to return is actually Union{}[] which is the empty array with element type Union{}.

4 Likes

I keep asking to be educated, but isn’t Union{} the only thing that f can’t return? My intuition says that the eltype would be the narrowest guess the compiler can provide. When it has elements to work with, it gives Float64. When it doesn’t have elements, the compiler would say “I dunno, the result could be Anything.” No?

True, f cannot return an element of type Union{} (since there are no such values), but the right dynamic behavior is for the returned array to have an element type that is “as small as possible” but contains all the values that are actually returned. When zero elements are returned, then the element type of can be Union{} and this preserves type stability since type inference knows that if you take and element out of the returned array and were expecting a value of type Int (for example), then the value you get won’t not be of type Int since Union{} <: Int, but really what it means is that you can’t take any elements our because there can’t be any.

When it doesn’t have elements, the compiler would say “I dunno, the result could be Any thing.” No?

Or put another way, when there are no elements, we know that all the elements are instances of Union{} (because there are none).

“As small as possible” may not be exactly what you want since the smallest type is the union of the types of all the elements, which can get messy when the values are of mixed types. In those cases we use the typejoin function to climb the abstract type hierarchy to find some abstract type that contains all the individual types.

4 Likes

I should add that the main reason we don’t do this is that you can’t do a lot with such an array, eg push any elements into it. I do think that may have been the wrong choice.

5 Likes

A PR was recently merged that updates the docs to clarify the current behavior:

2 Likes