Type instability in list comprehensions

My point is that such under-the-hood implementation details should be irrelevant to the semantics of an expression. I mean, typeof(x) returns the type of the value bound to x, regardless of how that binding is implemented.


But isn't that a problem of bindings not having a world age? The inferred type of x is Any for a non-const global across world ages, even if it has a specific type right now.

And since Julia is a compiled language, it has to compile code that will be correct until it is invalidated and recompiled. But since changing x doesn't increment the world age, you can't rely on the type x happens to have at the moment you compile the code.

So then you end up with leaky type instability where not only do you not know the value of x within the function, but the value of x also changes the return type of the whole function.
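For instance (a minimal sketch; f here is a throwaway name), inference has to report Any for the return type because x may be rebound to any value at any time:

julia> x = 1;

julia> f() = x + 1;

julia> Base.return_types(f, ())
1-element Vector{Any}:
 Any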

I wouldn't call this a bug so much as a limitation imposed by not wanting to try to track the backedges of non-const global values.


I've always been told that Julia is a dynamic language, which does type inference and compilation as an optimization while taking care to preserve dynamic semantics. The issue raised in the OP is about an expression evaluated in global scope, so it isn't compiled anyway. But even inside compiled functions, a binding with a non-concrete inferred type results in a dynamic load, often with knock-on effects like dynamic dispatch and type instability, while still preserving the map from {argument values, global state, data from IO, whatever else} to return value.

julia> f() = typeof(x);

julia> x = 1.0;

julia> f()
Float64

julia> x = [1];

julia> f()
Vector{Int64} (alias for Array{Int64, 1})

I'm not an expert on any of this, so please educate me if I've misunderstood something, but what you're claiming here goes directly against what I've learned about the language.

Okay, I wrote that comment, but writing it sort of persuaded me in the other direction. If what I wrote is true, then why would having values in the vectors result in a narrower eltype for the result vector?

We still get the same leaky type instability across world ages I was talking about…


Dynamic and compiled are not opposites. Dynamic and static typing are opposites; interpreted and compiled are opposites. Julia is a (usually) just-ahead-of-time compiled language, which uses dynamic typing and tries to smartly pin down static types, where it can, to improve performance.

Julia is not fully dynamic like LISP. You can't redefine const globals, you can't redefine types (at the moment, at least…), and you can't call a function you've just redefined without returning to top level or using invokelatest. This is because the core devs have made tradeoff decisions about reducing dynamism so the compiler can improve its inference.
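For example, a minimal sketch of that last rule (demo and newfn are made-up names):

julia> function demo()
           @eval newfn() = 42          # define a new method mid-call
           # calling newfn() directly here would fail: the method is
           # newer than the world age demo() was compiled in
           return Base.invokelatest(newfn)
       end;

julia> demo()
42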

So this is another case where we have to choose between dynamic and static. I'm saying it makes sense to say "we want the return type of the comprehension to be static, so if you try to run a comprehension over non-const globals, you get Vector{Any} back." But that argument would also cut against returning a narrow eltype when you have values in the input vectors.


I like to think that Julia transcends/generalizes the old static vs dynamic dichotomy.


I couldn't help myself…


This reasoning is flawed. The expression, the array comprehension, is lowered to create a Base.Generator with an anonymous function. Use Meta.@lower to see for yourself. Some compilation may be occurring.

I know. I was responding to the claim that Julia must behave statically when referencing globals because it is compiled. The claim that compilation would constrain the semantics like that wasn't mine. But based on your last two comments, I think we agree that the OP exemplifies an incidental leak from type inference to observed behavior, rather than a consistent design where behavior depends on the types of bindings rather than values?

Fair point. However, in this case, it looks like at least one of the global variables is passed as an argument to the Generator from global scope rather than being dereferenced within a compiled unit, which was the relevant distinction. Caveat that I'm not well versed in reading lowered code, but Meta.@lower prints the following (truncated; the point is that x is referenced at the top level of the thunk). Please let me know if I'm misinterpreting this.

:($(Expr(:thunk, CodeInfo(
[...]
│   %13 = Base.Generator(%12, x)
│   %14 = Base.Flatten(%13)
│   %15 = Base.collect(%14)
└──       return %15

I think where I'd push back is that there are lots of places where Julia is explicitly not dynamic (my examples above), so this isn't so much an issue of Julia's fidelity to its dynamic semantics as it is a pragmatic design choice:

Should a Generator produce a container with a narrowed eltype when the inputs aren't statically known?

I don't take issue with Julia providing the narrowest type inference it can, even if the behavior changes in different contexts, but it seems counterintuitive to me that f() = [xi*yi for xi in x for yi in y] would ever return a Vector{Float64}, since you can't infer that from the input of f().

Curious to understand this position better. Is your point that you consider iterables with different eltypes fungible, i.e. that no one should depend on eltype for behavior, so the language should be free to return as wide or narrow an eltype as is convenient (and to change this between versions without it being considered breaking)?

Because surely you don't disagree with the following returning "Number"?

f(::Number) = "Number"
f(::Any) = "Not a Number"
g() = f(x)
x = 2
g()
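# g() returns "Number": dispatch always happens on the runtime type of x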

If it's not clear from my posts: I'm sorta thinking through my position on this in real time. So I just want to say that I see the contradiction you are pointing out, and yet my gut still says consistently widening the eltype of a container in the face of type instability is the right call. But I'm not sure how to resolve that dissonance right now.

I'll circle back if I get an a-ha :smile:


Semantically, the return type of a function in Julia is undefined. So there is no semantic way to reason about what the return type of map(f, Int[]) should be. The docstring of map is quite vague and makes no mention of the behavior on empty collections, so it does not contradict the map docstring for this

f(x::Int) = x/2
map(f, Int[])

to return Float64[]. However, philosophically I don't really like it. Since the output type of f is unknown, the only reasonable empty vector to return is Any[].

To be clear, returning Any[] is a choice that would be made by the implementer of map, and that choice would need to be documented in the docstring. In other words, returning Any[] would not be a result of type inference. The implementer could just as well choose to throw an error or return nothing when the input collection is empty.

I realize that if map(f, Int[]) returned Any[], then map(::typeof(f), ::Vector{Int}) would be type unstable, but I am not talking about performance here. I am speaking only of what makes sense within the semantics of the language. (Functions do not have a known return type!)

I'm a bit confused.

julia> f(x::Int) = x/2
f (generic function with 1 method)

julia> Base.return_types(f)
1-element Vector{Any}:
 Float64

That's an internal implementation detail. Base.return_types is not a public function. It's not a part of the semantics of the language.

Imagine you're implementing mymap(f, x) in a package. If you want to stick to the public API of Base Julia, then you can't use Core.Compiler.return_type or Base.return_types. So you have no way to query the return type of an arbitrary function f that is provided as the first argument to mymap. Thus you have no way of knowing which T[] to return when the second argument to mymap is an empty collection.
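To make that concrete, here is a minimal sketch of such a mymap that sticks to public API: it commits to Any[] on empty input and widens the element type dynamically as values arrive (the widening strategy is illustrative only, not what Base does):

function mymap(f, xs)
    # With no elements to evaluate and no access to inference,
    # Any[] is the only honest element type for an empty result.
    isempty(xs) && return Any[]
    ys = [f(first(xs))]                  # eltype taken from the first value
    for x in Iterators.drop(xs, 1)
        v = f(x)
        if !(v isa eltype(ys))
            # widen the element type when a later value doesn't fit
            ys = convert(Vector{typejoin(eltype(ys), typeof(v))}, ys)
        end
        push!(ys, v)
    end
    return ys
end

mymap(x -> x / 2, Int[])   # Any[]  (a documented choice, not inference)
mymap(x -> x / 2, [1, 2])  # 2-element Vector{Float64}: [0.5, 1.0]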

In an ideal world, Base Julia would be written using only intrinsics, but instead we've got things like Core.Compiler.return_type being used in various places. :slight_smile:


I also have the slight feeling that the "type instability" in the title is leading us astray here. I now understand that Julia globals have static type Any and the compiler refuses to specialize on this when you type anything into the REPL. (I will return to this later.)

But to me, there is still a strange inconsistency in terms of semantics. Again consider the following two statements:

julia> vec(map(((xi,yi),) -> xi*yi, Iterators.product(x, y)))
julia> [xi * yi for xi in x for yi in y]

For any globals x, y of actual type Vector{T}, the first one will always return a Vector{T}, while the second will return a Vector{T} if both lists are non-empty and Vector{Any} otherwise.

Which, in my book, is a, let's call it, inconsistency, and an unnecessary one, since map is obviously able to figure out the return type by inspecting the eltype of the argument and the return_type of the function. So the list comprehension not doing the same thing seems at least very strange (and I am not sure that an "oh, but everything goes out the window once you do globals" argument saves us here).
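(Under the hood, map's handling of empty input rests on roughly this kind of query; note that Base.promote_op is not public API:)

julia> Base.promote_op(*, Int64, Float64)
Float64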

Edited to add: thanks though for all the comprehensive answers!

To be precise, non-const global variables do have a concrete type at all times; they just might not be constant, so the compiler doesn't bother to specialize. The docs say:

The value of an untyped global variable might change at any point, possibly leading to a change of its type. This makes it difficult for the compiler to optimize code using global variables.
Performance Tips · The Julia Language


Adding to your observation:

But to me, there is still a strange inconsistency in terms of semantics. Again consider the following two statements:

julia> vec(map(((xi,yi),) -> xi*yi, Iterators.product(x, y)))
Float64[]
julia> [xi * yi for xi in x for yi in y]
Any[]

…it's interesting to consider this example too:

julia> vec([xi * yi for xi in x, yi in y]) # a single 2D for loop
Float64[]

This has the same result as the version with map. The for loop in the last example is syntactic sugar for an Iterators.product(x, y), as opposed to two nested iterators.

The inconsistency in these results appears to be an example of how Julia's type inference can be lazy. Unfortunately, it isn't an oracle, and it will give up if the code is too complex or dynamic. We have found that nested array comprehensions ([… for … for …]) aren't type-inferred as well as map or n-d array comprehensions ([… for …, …]).

Could you expand on that idea a bit? Isn't everything intrinsics at the bottom, including return_type?

Alright, I did some mulling. Here are my thoughts:

  1. Where Julia has a real object to look at, it respects dynamic semantics. This is especially true for multiple dispatch, which always dispatches on the real, concrete types of the arguments of a call. So for your example with Number/Not a Number, we get dynamic dispatch on the value of x at runtime.
  2. In the case of an empty container, there are no real values to look at, so we fall back to the type system to try to infer the best eltype. When we are talking about non-const globals, the type system can't help us, so we get Vector{Any}. Which is fine because…
  3. The promise of a container's eltype is not that it is the narrowest type that is a supertype of the elements, but that it is a supertype of any object that can be stored in the container (see the short demo after this list).
  4. Following from 3, there is nothing incorrect about changing map(identity, 1:5) to return Any[1, 2, 3, 4, 5]. This isn't a dispatch question, and the runtime objects (1…5) fit into the resulting vector… but of course, this tanks performance unnecessarily. And since we can statically infer that the elements are Ints, it makes total sense to narrow the eltype to Int.
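A quick illustration of the promise in point 3:

julia> v = Any[1, 2, 3];         # eltype Any: any value can be stored

julia> push!(v, "four");         # allowed, since String <: Any

julia> push!([1, 2, 3], "four")  # not allowed: String can't convert to Int64
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64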

But then, we have this question of what the behavior should be here:

y = [1,2,3]
x = [1.0]
f() = [xi*yi for xi in x for yi in y]
f() # returns Vector{Float64} 

The thought I'm trying to process is this: right now, the return type of f depends on the runtime types of x and y. So any use of f is type unstable.

function g()
    v = f()::???
    v .+ 5
end

But if the compiler took the fact that the types of x and y aren't constant and instead made the eltype of the resulting vector Any, then the return type of f would be type stable.

Of course, this now passes the buck of type instability to accessing the elements of that Vector. So I'm not saying this behavior is better, but I am saying it is a different design choice which is consistent with the promises of Julia. Nothing about the philosophy of Julia precludes widening the eltype of a container so that the compiled code can stay correct over time without calling out to Compiler functions at runtime.
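For what it's worth, you can opt into that design today with a typed comprehension (f_any is a made-up name; x and y as defined above):

julia> f_any() = Any[xi * yi for xi in x for yi in y];  # eltype pinned to Any

julia> f_any()  # always a Vector{Any}, whatever x and y currently hold
3-element Vector{Any}:
 1.0
 2.0
 3.0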

Edit: So, all this rambling to say: for any higher-level function that produces a container, Julia could always return Any for the eltype. From that base behavior, we can improve performance by narrowing that eltype. When and how that narrowing happens (with runtime types, inference, etc.) has tradeoffs that need to be considered. But I don't see a real problem with having different behaviors when the compiler has different information about the data.

Edit 2: Just spitballing at this point, but given that we now have a distinction between Vector and Memory, you could theoretically have a Vector{Any} that points to a Memory{Float64}. You could then add a new function called resolve_eltype which narrows the Vector's eltype to that of the inner Memory at the moment when you actually want to deal with the type instability, and it wouldn't require allocating a whole new Memory.

It appears to me that some of the disagreement is about whether a comprehension should act as a function barrier. This goes back to the original example (which I will modify slightly here to wrap both in functions):

julia> x = [1, 2];

julia> y = Float64[];

julia> f() = [xi * yi for xi in x for yi in y]; f()
Any[]

julia> g(x,y) = [xi * yi for xi in x for yi in y]; g(x,y)
Float64[]

If the comprehension looked at all of its captures and passed them via a closure (inducing a function barrier), then the result should be type-inferrable. However, it appears that it does not do this; it remains a mere expression in its enclosing scope, which means untyped variables make it impossible to reason about its return type (although it may dynamically discover an element type by explicit evaluation, when elements are available to evaluate).

Note that a let can restore type inferrability by making typed captures of the global variables:

julia> h() = let x=x,y=y; [xi * yi for xi in x for yi in y]; end; h()
Float64[]

Note that even when elements are available, the result is not inferrable with untyped globals (hence I use the word "discoverable" above):

julia> [rand(Bool) ? x : y for _ in 1:3]
3-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]
 [1, 2]

julia> [rand(Bool) ? x : y for _ in 1:3]
3-element Vector{Vector}:
 [1, 2]
 Float64[]
 Float64[]

Notice that the return types are different here, even though the result "could" be inferred to the correct Vector{Union{Vector{Int},Vector{Float64}}} if type information were available. Instead, it is merely post-facto "discovered" by inspecting the results.
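(The Vector{Vector} eltype in the second result reflects a typejoin-style widening of the element types discovered during collection:)

julia> typejoin(Vector{Int64}, Vector{Float64})
Vector (alias for Array{T, 1} where T)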
