Vector{Vector{T} where T} vs Vector{Vector{T}} where T

I can see a lot of discussions around UnionAll and new where keyword on GitHub, but most comments seem to make sense only to people who track the topic for a long time. Is there any docs or drafts (except for news) about it? In particular, I’m confused that this:

Vector{Vector{T} where T} 

is treated differently from:

Vector{Vector{T}} where T

Can somebody provide an example of what data would match the first and the second type definition?

1 Like

There is some discussion of this exact example in the section entitled UnionAll Types in the chapter of the Julia manual on Types, which is here:

http://docs.julialang.org/en/latest/manual/types.html#UnionAll-Types-1

– Steve Vavasis

Imagine the only types in Julia happened to be Int and String, so that T had to be one of those.

Then the first would be equivalent to Vector{Union{Vector{Int},Vector{String}}} and the second to Union{Vector{Vector{Int}}, Vector{Vector{String}}}.

5 Likes

Should this be Vector{Union{Vector{Int}, Vector{String}}}?

I still don’t see what data conforms to these data specification. For example, docs describe Vector{Vector{T} where T} as:

a 1-dimensional array of 1-dimensional arrays; each of the inner arrays consists of objects of the same type, but this type may vary from one inner array to the next.

For me it sounds like e.g.:

[String[], Int[]]

But isa() shows it’s not:

julia> isa([String[], Int[]], Array{Array{T,1} where T,1})
 false

Is that perhaps an error in the docs? I’d think since there is only one T there can only be one type.

Also isa([String[], Int[]], Vector{Vector{Union{String, Int}}}) is false. But that is because typeof([String[], Int[]]) is Vector{Vector{Any}}.

But I also puzzled by what @malmaud suggested

julia> isa(Vector{Vector{Union{String, Int}}}(), Vector{Vector{T} where T})
false

and @greg_plowman

julia> isa(Vector{Union{Vector{Int}, Vector{String}}}(), Vector{Vector{T} where T})
false

Are we on a wild goose chase here?

Yes, fixed. Thanks!

Remember the illustrative only assumption:

Imagine the only types in Julia happened to be Int and String, so that T had to be one of those.

1 Like

So in practice, it’s more like we have 3 types in our illustrative only type system - String, Int and Any, right? [String[], Int[]] has type Array{Array{Any,1},1} which still seems to be valid according to the definition in docs as well as @malmaud’s example. So why this:

isa(Vector{Vector{Any}}(), Vector{Vector{T} where T}) 

still gives false?

Instead of [String[], Int[]], you need Union{Vector{String}, Vector{Int}}[String[], Int[]] to explicitly type the kind of vector you want to create.

Julia implements the rule that array literals with different types should be constructed as Vector{Any}, rather than Vector{Union{elements in array literal}}, which would have been a valid choice but generally not what people would want.

I agree about array literals and thus used Vector{Vector{Any}}() in the last test, but perhaps I misunderstand union types in general. I played around a bit and here’s a simpler example that I don’t understand:

julia> Vector{String} <: Vector{Union{String, Int}}
false

I used to think that Vector{Union{T1, T2}} is the same as Union{Vector{T1}, Vector{T2}}, i.e. a set of all vectors of either T1 or T2. Apparently, these are 2 very different types. So what’s the meaning of Vector{Union{T1,T2}}? E.g. what’s an example value for x in:

isa(x, Vector{Union{String, Int}})

so that this expression returned true?

A Vector{Union{T1,T2}} is an array whose elements can be of either type T1 or T2. e.g. the first element can be T1, the second element can be T2, and so on. For example:

julia> Union{String,Int}["x", 2]
2-element Array{Union{Int64,String},1}:
  "x"
 2   

On the other hand, a Union{Vector{T1},Vector{T2}} is either a vector of all T1 elements or a vector of all T2 elements. The vector cannot have both T1 and T2 elements in it.

The memory representations of these are very different.

  • A Vector{Union{Int,String}} must be an array of “boxes” (or pointers thereto): type tags + data. That is, for every element there must be a type tag indicating the type of that element.

  • A Vector{Int} (which is a subtype of Union{Vector{Int},Vector{String}}), is very different: the type information is attached to the array as a whole. Since every element is of type Int, it can be (and is) stored in memory as simply one Int value after another (64 bits per element if Int == Int64).

Hence, a Vector{Int} cannot be used where a Vector{Union{Int,String}} is expected, because the memory representations are totally different. The ability to have a uniformly typed array like this, where the element data is “packed” inline without individual type tags, is crucial for performance.

In 0.6 (or in 0.5 with @compat), you can declare a function parameter as Vector{<:Union{Int,String}}, and it will allow Vector{Int}, Vector{String}, or Vector{Union{Int,String}}. (Up to three different versions of your function will be compiled, specialized for the three cases, depending on what argument types you pass.)

6 Likes

This is a very good explanation, thanks! Although I haven’t got used to all use cases of type unions, memory representation was exactly the clue I was missing.

This type “invariance” is also described quite nicely in the manual:

http://docs.julialang.org/en/latest/manual/types.html#Parametric-Composite-Types-1

Related, I wonder if the following two are effectively the same types: Vector{T} where T<:Union{Float64,Int64} and Union{Vector{Float64},Vector{Int64}}.

I know these two are different from Vector{Union{Float64,Int64}}, which is a concrete type that can take both Float64 and Int64 as vector entries. Therefore, this is not a supertype of Vector{Float64}, which can take only Float64 as entries:

julia> const T3 = Vector{Union{Float64,Int64}}
Array{Union{Float64, Int64},1}

julia> Vector{Float64}<:T3
false

On the other hand, the first two types are either a vector of Float64, or a vector of Int64. Therefore, they are both (abstract) supertypes of Vector{Float64}:

julia> const T1 = Vector{T} where T<:Union{Float64,Int64}
Array{T,1} where T<:Union{Float64, Int64}

julia> const T2 = Union{Vector{Float64},Vector{Int64}}
Union{Array{Float64,1}, Array{Int64,1}}

julia> Vector{Float64}<:T1
true

julia> Vector{Float64}<:T2
true

However, they are different types:

julia> T1 == T2
false

Are there any practical differences between T1 and T2? If there are, which one is more preferable in what situations? Or, are they effectively the same, so which one to use is just a matter of taste?

They are not, since there are many other types that matches Vector{T} where T<:Union{Float64,Int64} that doeesn’t match Union{Vector{Float64},Vector{Int64}}. e.g. Vector{Union{}}, Vector{Union{Float64,Int64}}.

1 Like

Interesting. I didn’t know that Union{} is a subtype of any union type. In general, for two concrete types T and S, what are the subtypes of Union{T1,T2}? Are T1, T2, Union{}, Union{T1,T2} the only ones, or there are something else?

Union{} is a subtype of any type.

I don’t know any other types that are subtype of Union{T1,T2} but I’m not sure…

2 Likes

Since this particular case is not explicitly answered (implied in the other comments/answers)
So let me try :slight_smile:

I think the starting point should be to accept the fact:

Real == (T where T <: Real)

The same holds true when we replace Real with any other type. so Any == (T where T <:Any), etc

The above relation means that type Real is the union of Real and its all subtypes (abstract or concrete)
that is, Real == Union{Real, Rational, Integer, String, etc}

We can confirm this fact with simple type hierarchy:

	let
		abstract type A end
		abstract type B<:A end
		struct C<:B end

		@assert A == Union{A, B, C}
		@assert B == Union{B, C}
	end

I feel this relation is weird because it means that something is equal to a set that contains it.
But once the rule is accepted, the rest can be explained based upon it without difficulty.

First, Vector{Any} vs Vector{<:Any}:
Vector{Any} is a short-hand notation for Vector{T where T<:Any} (note this is same to Vector{T where T} wihout <:Any suffix). Therefore Vector{Any} means a Vector whose element is of type Any. (note Any == (T where T<:Any)). So [1, "a"] and [1, 2] are all valid instance of Vector{Any}

On the other hand, Vector{<:Any} is a short-hand notation of Vector{T} where T<: Any ((note this is same to Vector{T} where T). Also, in REPL, Vector{<:Any}==( Vector{T} where T<: Any) is confirmed. Therefore Vector{<:Any} is equal to a collection of types {Vector{Any}, Vector{Number}, Vector{String}, ...} == Union{Vector{Any}, Vector{Number}, Vector{String}, ...}. That is, Vector{Any} is an element in the set Vector{<:Any} and it means Vector{Any} is a proper subtype of Vector{<:Any}. Also in REPL, we can confirm that Vector{Any}<:Vector{<:Any} && Vector{Any}!=Vector{<:Any} indeed.

Vector{Vector{Any}} vs Vector{Vector{<:Any}}:
Now, Vector{Any} is a proper subtype of Vector{Vector{<:Any}}
Due to type invariance property of Array in Julia, when T1 is a proper subtype of T2
Vector{T1} can not be a subtype of Vector{T2} Therefore Vector{Vector{Any}} can not be a subtype of Vector{Vector{<:Any}} This is confirmed in REPL by checking Vector{Vector{Any}} <: Vector{Vector{<:Any}} false.

This reasoning explains why isa(Vector{Vector{Any}}(), Vector{Vector{T} where T}) is false.

However without relying on the formality, I think we can understand the relation with just common sense. That is, I think a type is defined by what it is capable of.
So, the Vector in Vector{Vector{Any}} is capable of accepting an element only if the element is of type Vector{Any}. that is the Vector is “restrictive”

but the Vector in Vector{Vector{<:Any}} is capable of accepting an element as long as the type of the element is any one of Vector{Any}, Vector{Int}, Vector{String}, etc.
In other words, this Vector is “generous” (relative to the previous one)

If Vector{Vector{<:Any}} were a supertype of Vector{Vector{Any}}, it means Vector{Vector{Any}} is-a Vector{Vector{<:Any}} which then comes to imply a restrictive type is-a generous type, which sounds contradictory.

Another source of confusion to new comers, I think, is the fact that [1, 2] isa Vector{Any} is false although [1, 2] is a valid instance of Vector{Any}. In fact [1, 2] an instance of Vector{Any} and at the same time it is an instance of Vector{Int} but recall that Vector{Any} != Vector{Int} nor Vector{Int} <: Vector{Any} so of what type should [1,2] be? it is up to the compiler unless you specify as you wish it to be. The compiler always choose the least common ancestor of two element 1 and 2 which is Int64 so typeof([1,2])==Vector{Int64} which is not subtype of Vector{Any}
by type invariance. Therefore, [1, 2] isa Vector{Any} is false although [1, 2] is a valid instance of Vector{Any}

I hope my thought hasn’t brought further confusion to the discussion and also understand my not smooth English! If there is any error in my thought, please fix it!

2 Likes