My point is that such under-the-hood implementation details should be irrelevant to the semantics of an expression. I mean, `typeof(x)` returns the type of the value bound to `x`, regardless of how that binding is implemented.
But isn't that a problem of bindings not having a world age? `typeof(x)` is `Any` for a non-const global across world ages, even if it has a specific type right now.
And since Julia is a compiled language, it has to compile code that will be correct until it is invalidated and recompiled. But since changing `x` doesn't increment the world age, you can't use the type as it is when you compile the code.
So then you end up with leaky type instability where not only do you not know the value of `x` within the function, but the value of `x` changes the return type of the whole function.
I wouldn't call this a bug so much as a limitation imposed by not wanting to try to track the backedges of non-const global values.
I've always been told that Julia is a dynamic language, which does type inference and compilation as an optimization while taking care to preserve dynamic semantics. The issue raised in the OP is about an expression evaluated in global scope, so it isn't compiled anyway. But even inside compiled functions, a binding with a non-concrete inferred type results in a dynamic load, often with knock-on effects like dynamic dispatch and type instability, while still preserving the map from {argument values, global state, data from IO, whatever else} to return value.
julia> f() = typeof(x);
julia> x = 1.0;
julia> f()
Float64
julia> x = [1];
julia> f()
Vector{Int64} (alias for Array{Int64, 1})
I'm not an expert on any of this, so please educate me if I've misunderstood something, but what you're claiming here goes directly against what I've learned about the language.
Okay, I wrote that comment, but writing it has sort of persuaded me in the other direction. If what I wrote is true, then why would having values in the vectors result in a narrower eltype for the result vector?
We still get the same leaky type instability across world ages I was talking about…
Dynamic and compiled are not opposites. Dynamic and static typing are opposites; interpreted and compiled are opposites. Julia is a (usually) just-ahead-of-time compiled language, which uses dynamic typing and tries to smartly pin down static types, where it can, to improve performance.
Julia is not fully dynamic like Lisp. You can't redefine const globals, you can't redefine types (at the moment, at least…), and you can't call a freshly redefined function without returning to top level or using `invokelatest`. This is because the core devs have made tradeoff decisions about reducing dynamism so the compiler can improve its inference.
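That reduced dynamism is observable in the world-age mechanism: within a single running function, a method redefinition only becomes visible once execution returns to a newer world, unless you opt in with `invokelatest`. A minimal sketch:

```julia
# World-age sketch: a direct call inside a running function keeps dispatching
# to the method that existed when the function started executing;
# Base.invokelatest opts into the newest world.
f() = 1

function caller()
    @eval f() = 2                    # redefine f at runtime
    (f(), Base.invokelatest(f))      # direct call still sees the old method
end

caller()  # → (1, 2)
```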
So this is another case where we have to choose between dynamic and static. I'm saying it makes sense to say "we want the return type of the comprehension to be static, so if you try to run a comprehension over non-const globals, you get `Vector{Any}` back." But that argument would also cut against returning a narrow eltype when you have values in the input vectors.
I like to think that Julia transcends/generalizes the old static vs dynamic dichotomy.
This reasoning is flawed. The expression, the array comprehension, is lowered to create a `Base.Generator` with an anonymous function. Use `Meta.@lower` to see for yourself. Some compilation may be occurring.
I know. I was responding to the claim that Julia must behave statically when referencing globals because it is compiled. The claim that compilation would constrain the semantics like that wasn't mine. But based on your last two comments, I think we agree that the OP exemplifies an incidental leak from type inference into observed behavior, rather than a consistent design where behavior depends on the types of bindings rather than on values?
Fair point. However, in this case it looks like at least one of the global variables is passed as an argument to the `Generator` from global scope rather than being dereferenced within a compiled unit, which was the relevant distinction. Caveat that I'm not well versed in reading lowered code, but `Meta.@lower` prints the following (truncated; the point is that `x` is referenced at the top level of the thunk). Please let me know if I'm misinterpreting this.
:($(Expr(:thunk, CodeInfo(
[...]
│   %13 = Base.Generator(%12, x)
│   %14 = Base.Flatten(%13)
│   %15 = Base.collect(%14)
└──       return %15
I think where I'd push back is that there are lots of places where Julia is explicitly not dynamic (my examples above), so this isn't so much an issue of Julia's fidelity to its dynamic semantics as it is a pragmatic design choice:
Should a `Generator` produce a container with a narrowed eltype when the inputs aren't statically known?
I don't take issue with Julia providing the narrowest type inference it can, even if the behavior changes in different contexts, but it seems counterintuitive to me that `f() = [xi*yi for xi in x for yi in y]` would ever return a `Vector{Float64}`, since you can't infer that from the input of `f()`.
Curious to understand this position better. Is your point that you consider iterables with different eltypes fungible, i.e. that no one should depend on `eltype` for behavior, so the language should be free to return as wide or narrow an eltype as is convenient (and to change this between versions without it being considered breaking)?
Because surely you don't disagree with the following returning `"Number"`?
f(::Number) = "Number"
f(::Any) = "Not a Number"
g() = f(x)
x = 2
g()
If it's not clear from my posts: I'm sort of thinking through my position on this in real time. So I just want to say that I see the contradiction you are pointing out, and yet my gut still says consistently widening the eltype of a container in the face of type instability is the right call. But I'm not sure how to resolve that dissonance right now.
I'll circle back if I get an a-ha.
Semantically, in Julia the return type of a function is undefined. So there is no semantic way to reason about what the return type of `map(f, Int[])` should be. The docstring of `map` is quite vague and makes no mention of the behavior on empty collections, so it does not contradict the `map` docstring for this
f(x::Int) = x/2
map(f, Int[])
to return `Float64[]`. However, philosophically I don't really like it. Since the output type of `f` is unknown, the only reasonable empty vector to return is `Any[]`.
To be clear, returning `Any[]` is a choice that would be made by the implementer of `map`, and that choice would need to be documented in the docstring. In other words, returning `Any[]` would not be a result of type inference. The implementer could just as well choose to throw an error or return `nothing` when the input collection is empty.
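To illustrate, here is a sketch of what that documented choice could look like (`mymap` is hypothetical, not a real Base function, and for simplicity it assumes all of `f`'s outputs fit the eltype established by the first element):

```julia
# Hypothetical mymap: on an empty collection, return Any[] by documented
# contract rather than querying the compiler for f's return type.
function mymap(f, xs)
    isempty(xs) && return Any[]      # explicit, inference-free fallback
    out = [f(first(xs))]             # eltype taken from the first real value
    for x in Iterators.drop(xs, 1)
        push!(out, f(x))
    end
    return out
end

mymap(x -> x / 2, Int[])    # Any[]
mymap(x -> x / 2, [1, 2])   # [0.5, 1.0]
```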
I realize that if `map(f, Int[])` returned `Any[]`, then `map(::typeof(f), ::Vector{Int})` would be type unstable, but I am not talking about performance here. I am speaking only of what makes sense within the semantics of the language. (Functions do not have a known return type!)
I'm a bit confused.
julia> f(x::Int) = x/2
f (generic function with 1 method)
julia> Base.return_types(f)
1-element Vector{Any}:
Float64
That's an internal implementation detail. `Base.return_types` is not a public function. It's not part of the semantics of the language.
Imagine you're implementing `mymap(f, x)` in a package. If you want to stick to the public API of Base Julia, then you can't use `Core.Compiler.return_type` or `Base.return_types`. So you have no way to query the return type of an arbitrary function `f` that is provided as the first argument to `mymap`. Thus you have no way of knowing which `T[]` to return when the second argument to `mymap` is an empty collection.
In an ideal world, Base Julia would be written using only intrinsics, but instead we've got things like `Core.Compiler.return_type` being used in various places.
I also have the slight feeling that the "type instability" in the title is leading us astray here. I now understand that Julia globals have static type `Any` and the compiler refuses to specialize on this when you type anything into the REPL. (I will return to this later.)
But to me, there is still a strange inconsistency in terms of semantics. Again consider the following two statements:
julia> vec(map(((xi,yi),) -> xi*yi, Iterators.product(x, y)))
julia> [xi * yi for xi in x for yi in y]
For any actual global `x`, `y` of actual type `Vector{T}`, the first one will always return a `Vector{T}`, while the second will return a `Vector{T}` if both lists are non-empty, and a `Vector{Any}` otherwise.
Which, in my book, is a, let's call it, inconsistency, and an unnecessary one, since `map` is obviously able to figure out the return type by inspecting the `eltype` of the argument and the `return_type` of the function. So a list comprehension not doing the same thing seems at least very strange (and I am not sure that an "oh, but everything goes out the window once you do globals" argument saves us here).
Edited to add: thanks though for all the comprehensive answers!
To be precise, non-`const` global variables do have a concrete type at all times; they just might not be constant, so the compiler doesn't bother to specialize. The docs say:
The value of an untyped global variable might change at any point, possibly leading to a change of its type. This makes it difficult for the compiler to optimize code using global variables.
Performance Tips · The Julia Language
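As an aside (my addition, not from the quoted docs): since Julia 1.8 you can annotate a global with a type, which gives the compiler something to assume even without `const`:

```julia
# A typed global: the binding may still be reassigned, but only to values
# convertible to Float64, so code referencing it can be inferred.
x::Float64 = 1.0
f() = 2x
f()        # 2.0
x = 3.5    # allowed: still a Float64
f()        # 7.0
# x = "oops"  # would throw: cannot convert a String to Float64
```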
Adding to your observation:
But to me, there is still a strange inconsistency in terms of semantics. Again consider the following two statements:
julia> vec(map(((xi,yi),) -> xi*yi, Iterators.product(x, y)))
Float64[]

julia> [xi * yi for xi in x for yi in y]
Any[]
…it's interesting to consider this example too:
julia> vec([xi * yi for xi in x, yi in y]) # a single 2D for loop
Float64[]
This has the same result as the version with `map`. The `for` loop in the last example is syntactic sugar for an `Iterators.product(x, y)`, as opposed to two nested iterators.
The inconsistency in these results appears to be an example of how Julia's type inference can be lazy. Unfortunately, it isn't an oracle, and will give up if the code is too complex or dynamic. We have found that nested array comprehensions (`[… for … for …]`) aren't type inferred as well as `map` or n-d array comprehensions (`[… for …, …]`).
Could you expand on that idea a bit? Isn't everything intrinsics at the bottom, including `return_type`?
Alright, I did some mulling. Here are my thoughts:
1. Where Julia has a real object to look at, it respects dynamic semantics. This is especially true for multiple dispatch, which always dispatches on the real, concrete types of the arguments of a call. So for your example with Number / Not a Number, we have to have dynamic dispatch on the value of `x` at runtime.
2. In the case of an empty container, there are no real values to look at, so we fall back to the type system to try to infer the best `eltype`. When we are talking about non-`const` globals, the type system can't help us, so we get `Vector{Any}`. Which is fine because…
3. The promise of a container's `eltype` is not that it is the narrowest type that is a supertype of the elements, but that it is the supertype of any object that can be stored in the container.
4. Following from 3, there is nothing incorrect about changing the behavior of `map(identity, 1:5)` to return `Any[1,2,3,4,5]`. This isn't a dispatch question, and the runtime objects (1…5) fit into the resulting vector… but of course, this tanks performance unnecessarily. And since we can statically infer that the elements are `Int`s, it makes total sense to narrow the `eltype` to `Int`.
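Point 3 can be seen directly: widening a container's eltype never changes the concrete types of the elements themselves, only the container-level promise. A small illustration:

```julia
# eltype is a promise about what the container may hold, not the narrowest
# supertype of what it currently holds.
v = Any[1, 2, 3]
eltype(v)        # Any: anything can be stored
typeof(v[1])     # Int64 (on 64-bit): the element keeps its concrete type
push!(v, "ok")   # legal precisely because the eltype is Any
```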
But then, we have this question of what should be the behavior here:
y = [1,2,3]
x = [1.0]
f() = [xi*yi for xi in x for yi in y]
f() # returns Vector{Float64}
The thought I'm trying to process is this: right now, the return type of `f` is dependent on the runtime types of `x` and `y`. So any use of `f` is type unstable.
function g()
v = f()::???
v .+ 5
end
But if the compiler took the fact that the types of `x` and `y` aren't constant and instead made the `eltype` of the resulting vector `Any`, then the return type of `f` would be type stable.
Of course, this now passes the buck of type instability to accessing the elements of that `Vector`. So I'm not saying this behavior is better, but I am saying it is a different design choice which is consistent with the promises of Julia. Nothing about the philosophy of Julia precludes widening the `eltype` of a container so that the compiled code can be correct over time without calling out to `Compiler` functions at runtime.
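The tradeoff being described can be simulated today with the explicitly typed `T[...]` comprehension syntax (this sketch forces the widened eltype by hand; it is not the current default behavior):

```julia
x = [1.0]; y = [1, 2, 3]

# Forcing the widened eltype: the return type of f_wide no longer depends
# on the runtime types of the globals x and y...
f_wide() = Any[xi * yi for xi in x for yi in y]

v = f_wide()
typeof(v)     # Vector{Any}, stable regardless of x and y
typeof(v[1])  # Float64: ...the instability has moved to element access
```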
Edit: So, all this rambling to say: for any higher-level function that produces a container, Julia could always return `Any` for the eltype. From that base behavior, we can improve performance by narrowing that eltype. When and how that narrowing happens (with runtime types, inference, etc.) has tradeoffs that need to be considered. But I don't see a real problem with having different behaviors when the compiler has different information about the data.
Edit 2: Just spitballing at this point, but given the fact that we now have a distinction between `Vector` and `Memory`, you could theoretically have a `Vector{Any}` that points to a `Memory{Float64}`, and add a new function called `resolve_eltype` which narrows the `Vector`'s eltype to the inner `Memory`'s at the moment where you actually want to deal with the type instability, and it wouldn't require allocating a whole new `Memory`.
It appears to me that some of the disagreement is whether a comprehension should act as a function barrier. This goes back to the original example (which I will modify slightly here to wrap both in functions):
julia> x = [1, 2];
julia> y = Float64[];
julia> f() = [xi * yi for xi in x for yi in y]; f()
Any[]
julia> g(x,y) = [xi * yi for xi in x for yi in y]; g(x,y)
Float64[]
If the comprehension looked at all of its captures and passed them via a closure (inducing a function barrier), then the result should be type-inferrable. However, it appears that it does not do this; it represents a mere expression in its scope, which means untyped variables make it impossible to reason about its return type (although it may dynamically discover a return type by explicit evaluation, when elements are available to evaluate).
Note that a `let` can restore type inferrability by making typed captures of the global variables:
julia> h() = let x=x,y=y; [xi * yi for xi in x for yi in y]; end; h()
Float64[]
Note that even when elements are available, the result is not inferrable with untyped globals (hence I use the word "discoverable" above):
julia> [rand(Bool) ? x : y for _ in 1:3]
3-element Vector{Vector{Int64}}:
[1, 2]
[1, 2]
[1, 2]
julia> [rand(Bool) ? x : y for _ in 1:3]
3-element Vector{Vector}:
[1, 2]
Float64[]
Float64[]
Notice that the return types are different here, even though the result "could" be inferred to the correct `Vector{Union{Vector{Int}, Vector{Float64}}}` if type information were available. Instead, it is merely post facto "discovered" by inspecting the results.