Julia typesystem

Here is a demonstration of some behavior of the Julia typesystem.

function mine(x::Vector{Union{Int64, T}}) where T
    println(typeof(x))
    println(T)
end
julia> mine(Vector{Union{Int64, String}}([1, 2, 3, "hello"]))
Vector{Union{Int64, String}}
String

julia> mine(([1, 2, 3, "hello"]))
Vector{Any}
Any

In the second case, the type is not Vector{Union{Int64, String}}. It is also not Vector{Union{Int64, Any}} which might be a sensible next guess for something more general.

Instead, the type is Vector{Any}.

I have noticed that Vectors (and presumably other types as well) tend to like to default to T=Any, unless their arguments match on the type exactly.

To re-phrase the question another way, it seems the core thing to notice here is that the expression [1, 2, 3, "hello"] produces a Vector{Any}.

  • Q1: Is that by design?
  • Q2: Is there a particular reason for this?

The alternative would be for this to evaluate to Vector{Union{Int64, String}}.
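For reference, that union-typed vector can already be requested explicitly with a typed array literal. A minimal sketch (the variable names loose and tight are only for illustration):

```julia
# An untyped literal goes through promotion and falls back to Any.
loose = [1, 2, 3, "hello"]

# Prefixing the literal with an element type builds exactly that vector type.
tight = Union{Int64, String}[1, 2, 3, "hello"]

typeof(loose)  # Vector{Any}
typeof(tight)  # Vector{Union{Int64, String}}
```

Both vectors hold the same four values; only the declared element type differs.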

I think I am correct in stating that if it were the latter, then this would be advantageous for the multiple-dispatch system, for two reasons:

  • Precision. T=Union{Int64, String} defines what the vector contains more precisely than T=Any does. It is the more accurate, or correct, statement about the contents. If you were asked in a court of law, this would be the better thing to say.
  • Specificity. (Meaning less general.) If two functions are defined which take arguments of type Vector{Union{Int64, String}} and Vector{Any}, then we have lost the ability to dispatch on the more specific version (the one for T=Union{Int64, String}).
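The dispatch point can be sketched like this. The function describe is a hypothetical name, made up for the example:

```julia
# Two hypothetical methods, one per element type.
describe(::Vector{Union{Int64, String}}) = "ints and strings"
describe(::Vector{Any}) = "anything"

# The untyped literal is a Vector{Any}, so the second method is selected.
from_literal = describe([1, 2, 3, "hello"])

# A typed literal has the union element type, so the first method is selected.
from_typed = describe(Union{Int64, String}[1, 2, 3, "hello"])
```

So the specific method is only reachable when the caller builds the union-typed vector on purpose.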

One final thing to note. I believe I am correct in thinking that Vector{Any} is not a supertype of Vector{Union{Int64, String}} despite the fact that Any is a supertype of Union{Int64, String}. This is because the types must be organized into a tree structure, which does not have enough flexibility to fully describe whether one type is compatible with another rather than just being a subtype of another.

Also note that Any is a supertype of Vector{<anything>}, but the Julia type system stops at the level of Vector{Any}. It doesn’t “promote” this type all the way to Any. (Is it obvious what I mean here?)

Any <: Vector{Any} <: Vector{Union{...}}

Please do correct me if I am wrong about this.
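For what it is worth, the direction of these relations can be checked in the REPL. A small sketch, where supertype_chain is a hypothetical helper written for this post and not part of Base:

```julia
# Collect T and all of its declared supertypes, ending at Any.
function supertype_chain(T::DataType)
    chain = DataType[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

chain = supertype_chain(Vector{Any})

# `A <: B` reads "A is a subtype of B".
Vector{Any} <: Any   # true
Any <: Vector{Any}   # false
```

The chain starts at Vector{Any} and ends at Any, and Vector{Union{Int64, String}} does not appear in it.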

Not an expert but I can say some things for sure.

  1. Vector{Int64} and Vector{Any} could both hold only ints; the type is associated with the array rather than its contents (hope this point is clear).
  2. The specialization you were hoping for is possible in theory, but
  3. Vector{Any} is a safe bet and easy to parse.
  4. Union types took a while to get really fast, and once they were fast, changing this would have been breaking.
  5. Push!-ing something from outside the type union to a messy union vector won’t work, and that kind of types-getting-in-the-way is something the language design tries to avoid if it doesn’t sacrifice speed. The union type won’t be faster here anyway, because Strings are themselves mutable and not isbits.
  6. If you want this specialization, you can annotate your constructor as you show here.
    Hope this helps. It’s a practical rather than pure design choice.
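Two of these points can be tried out directly. A rough sketch, assuming nothing beyond Base (the variable names are illustrative):

```julia
v = Union{Int64, String}[1, 2, 3, "hello"]

# A Float64 is outside the union, so push! cannot convert it and errors.
failed = try
    push!(v, 2.5)
    false
catch
    true
end

# Int64 is a plain-bits type, String is not.
int_is_bits = isbitstype(Int64)
string_is_bits = isbitstype(String)
```

With a Vector{Any} the same push! would succeed, which is the "types not getting in the way" behaviour mentioned above.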

There are many types which would fit here. E.g. Vector{Union{Int, String}} or Vector{Union{Integer, AbstractString}}, Vector{Union{Number, String}} etc. etc. One has to choose something.

So, for literal vectors the choice is a bit arbitrary. E.g. [1, 1.0] is a Vector{Float64}, whereas [1, 1.0f0] is a Vector{Float32}, not Vector{Union{Int, Float32}}. They could also have been Vector{Real}, since typejoin(Int, Float64) == Real.
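The two candidate rules can be compared side by side. A short sketch using only promote_type and typejoin from Base:

```julia
# Promotion picks a single concrete type that can represent both arguments.
promote_type(Int, Float64)   # Float64
promote_type(Int, Float32)   # Float32

# typejoin picks the closest common ancestor in the type hierarchy.
typejoin(Int, Float64)       # Real

# With no promotion rule between the types, promotion falls back to Any.
promote_type(Int, String)    # Any

typeof([1, 1.0])             # Vector{Float64}
```

Array literals follow the promotion result, not the typejoin result.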

And [1, 2, (3, (4, "a"))] could be Vector{Union{Int, Tuple{Int, Tuple{Int, String}}}}, where any of the Int could be replaced by Signed, Integer, Real, Number, or Any, in any combination.

This is a very conscious decision. Type parameters are invariant, not covariant, i.e. A{S} is not a subtype of A{T} even if S is a subtype of T. The exception to this is Union and Tuple. I.e. Union{String, Int} <: Union{AbstractString, Integer}, and also, Tuple{String, Int} <: Tuple{AbstractString, Integer}.
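These relations can be checked with the <: operator. A quick sketch:

```julia
# Invariance: the parameter must match exactly.
Vector{Int} <: Vector{Integer}                         # false
Vector{Union{Int64, String}} <: Vector{Any}            # false

# A bounded parameter has to be requested explicitly.
Vector{Int} <: Vector{<:Integer}                       # true
Vector{Union{Int64, String}} <: Vector{<:Any}          # true

# Tuple and Union behave covariantly.
Tuple{String, Int} <: Tuple{AbstractString, Integer}   # true
Union{String, Int} <: Union{AbstractString, Integer}   # true
```

A method that should accept both kinds of vector would therefore be written against Vector{<:Any} (or just Vector), not Vector{Any}.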

The “compatibility” between types is only partially in the type system, that’s true. The rest of it is taken care of with explicitly defined conversions and promotions. This is needed anyway for user-defined types, and the “built in” types use the same mechanisms. E.g. the compatibility between 1.0, 1f0 and 1 is not hard coded in Julia, it’s defined with explicit promotions and conversions in the Base and Core modules.

I’m sorry, I didn’t understand this. If anything I would have thought it was the other way around?

This makes sense, but also

julia> vtest = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> push!(vtest, 4)
4-element Vector{Int64}:
 1
 2
 3
 4

julia> push!(vtest, "hello")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
The function `convert` exists, but no method is defined for this combination of argument types.

julia> vtest2::Vector{Any} = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> push!(vtest2, "hello")
4-element Vector{Any}:
 1
 2
 3
  "hello"

julia> vtest3::Vector{Union{Int64, String}} = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> push!(vtest3, "hello")
4-element Vector{Union{Int64, String}}:
 1
 2
 3
  "hello"

It works, but it’s always slow, so to be avoided if you can.

Note:

julia> vtest2::Vector{Any} = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> typeof(ans)
Vector{Int64} (alias for Array{Int64, 1})

So the type isn’t really what you might think, and specifying ::Vector{Any} is both misleading and simply not needed, so simply drop it.

Ok that’s interesting and useful to know. So the typesystem itself does not naturally describe who is a subtype or supertype of what, or perhaps more accurately whether one type can become (be interpreted as / promoted to / converted to?) another type or not? There are in a sense, some arbitrary rules in some cases?

Note that vtest3 here is a Vector{Union{Int64, String}}. The result of an assignment is always the right hand side, that’s why you see Vector{Int64} here.
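The same thing can be seen with a typed local variable inside a function. A minimal sketch, where typed_assignment_demo is a hypothetical name:

```julia
function typed_assignment_demo()
    # The declared type makes Julia convert the right-hand side on assignment.
    v::Vector{Union{Int64, String}} = [1, 2, 3]
    # v really has the union element type, so a String can be pushed.
    push!(v, "hello")
    return v
end

result = typed_assignment_demo()
typeof(result)  # Vector{Union{Int64, String}}
```

The variable holds the converted vector, even though the literal on the right-hand side was a Vector{Int64}.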

A follow-up question to this.

Does calling push!(vtest3, "hello") cause several memory copy operations?

I am thinking that, given what you said, the Vector{Int64} probably has to be re-organized in memory as a Vector{pointer_to{Union{Int64, String}}} before push!(vtest3, "hello") can complete?

Am I right about this?

In other words, initially in memory it is a Vector{Int64}, and the data it contains then needs to be copied, modified to allow the string to be appended?

The type system describes subtype relations. But that’s the only thing it describes. I.e. Int and Float64 are both subtypes of Number and of Real. But the type system does not describe that they can be added or multiplied. If you try the command @less 1 * 1.0 you’ll see the method definition:

*(x::Number, y::Number) = *(promote(x,y)...)

which says that x and y must be promoted (to the same type), in this case to Float64. These can be multiplied with the intrinsic Core.Intrinsics.mul_float. (You can see this with @less 1.0 * 1.0.) This is not directly part of the type system; it uses type dispatch with the promote_rule and convert mechanics in Base.
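The promotion step can be run by hand. A small sketch with promote from Base:

```julia
# promote converts its arguments to a common type and returns them as a tuple.
promoted = promote(1, 1.0)          # (1.0, 1.0)

# Multiplying the promoted values gives the same result as the mixed call.
by_hand = *(promoted...)
mixed = 1 * 1.0

# The same mechanism picks Float32 when an Int meets a Float32.
promote(1, 1.0f0)                   # (1.0f0, 1.0f0)
```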

The reorganisation happens already at the assignment to vtest3, not with push!.

Ah! That’s how it works. Makes sense.

True, but this is the multiple dispatch mechanism. My initial thoughts were really focused on types which take template parameters.

The coercion (for lack of a better word?) from Vector{Union{Int64, T}} to Vector{Any} isn’t a result of a function call. It is (afaik) the result of some rule built into the AST interpreter / typesystem.

It’s the convert mechanics:

julia> convert(Vector{Union{Int, String}}, [1,2,"foo"])
3-element Vector{Union{Int64, String}}:
 1
 2
  "foo"

A @less convert(Vector{Union{Int, String}}, [1,2,"foo"]) reveals the method definition:

convert(::Type{T}, a::AbstractArray) where {T<:Array} = a isa T ? a : T(a)::T

A convert is inserted when you assign to a variable with a constrained type (like your vtest3).
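The a isa T ? a : T(a) branch in that definition is also where any copying happens. A sketch of both branches (variable names are illustrative):

```julia
a = [1, 2, 3]

# Same type: convert returns the very same array, no copy.
same = convert(Vector{Int64}, a)

# Different element type: a new array is allocated and the elements are copied over.
widened = convert(Vector{Union{Int64, String}}, a)

same === a      # true
widened === a   # false
widened == a    # true, the contents are equal

# Pushing to the widened copy leaves the original untouched.
push!(widened, "hello")
length(a)       # 3
```

So the copy is made once, when the conversion happens, and a later push! works on the already converted array.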

But the type of [1,2,"foo"] is built in somewhere, I guess. Array literals are a bit special, precisely because they are literal and need to be present to some degree before the types are created when Julia is bootstrapped.

Edit: No, they use Base.vect which uses the promote_typeof mechanics. See ?[.

julia> Base.promote_typeof(1,2,3,"foo")
Any

So, it can be changed. You can do that for your own types.
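As a sketch of what "for your own types" could look like: Meters below is a made-up type for illustration, and the two method definitions are the standard extension points described in the manual.

```julia
# Hypothetical user-defined type, only for this example.
struct Meters
    value::Float64
end

# Tell Julia how to turn an Int into Meters ...
Base.convert(::Type{Meters}, x::Int) = Meters(x)
# ... and that Meters wins when it is mixed with an Int.
Base.promote_rule(::Type{Meters}, ::Type{Int}) = Meters

# The array literal now promotes to Vector{Meters} instead of Vector{Any}.
lengths = [Meters(1.5), 2]
typeof(lengths)  # Vector{Meters}
```

Without the promote_rule, the same literal would fall back to a Vector{Any}.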

Ok this helps to explain a bit of my earlier confusion.

julia> vtest4::Vector{Union{Int64, String}} = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> vtest4
3-element Vector{Union{Int64, String}}:
 1
 2
 3

The printed type from the first prompt is Vector{Int64}. That’s the type for [1, 2, 3].

Whereas in the second prompt, the type is Vector{Union{Int64, String}} which is the type of vtest4.

I’m still a bit confused about this. So there’s an implicit call to convert every time the type on the left hand side of = does not exactly match the type on the right hand side of =?

btw… how did you obtain this information? This is from my REPL:

help?> [
search: [ []

  []

  Square braces are used for indexing, indexed assignment, array literals, and array comprehensions.

Yes, if you have variables with declared types.
See “Conversion and Promotion” in the Julia manual.

The thing here is that the output is actually Vector{Union{Int64, Any}} – because this union is actually equivalent to just Any:

julia> Union{Int64, Any}
Any

I am no expert, but it seems this is done to eliminate the (unneeded here) union splitting.

Ah, I was running the nightly version of julia. Some docstrings are better.

help?> [
search: [ []

  []

  Square brackets are used for indexing (getindex), indexed assignment
  (setindex!), array literals (Base.vect), array concatenation (vcat, hcat,
  hvcat, hvncat), and array comprehensions (collect).

Ah yes, good spot.

Is there a way to demonstrate this in the REPL?

To be more specific, is there a way to demonstrate that the output of [1, 2, 3, "hello"] is Union{Int64, Any}? I think this is what you are saying here?

It doesn’t seem right - why would the String type be promoted to Any?

Or really, why would the type parameter T be inferred to be Any rather than String? Isn’t that surprising if it is the case?

What I meant is they are literally the same types.

julia> Vector{Union{Int64, Any}}
Vector{Any} (alias for Array{Any, 1})

I guess unions are just not produced by this [...] notation. The most sensible behavior here is to try to promote to some general type, and to just return Any if that’s not possible. For the same reason, [false, 1, 2.0] will be a Vector{Float64} and not an overcomplicated union.
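If the narrow union is really wanted, it can be computed from the values after the fact. A rough sketch; narrow is a hypothetical helper, not something from Base, and it is not type-stable, so it is probably best kept out of hot loops:

```julia
# Build the union of the element types actually present, then convert to it.
function narrow(xs::AbstractVector)
    T = Union{map(typeof, xs)...}
    return convert(Vector{T}, xs)
end

mixed = narrow([1, 2, 3, "hello"])
typeof(mixed)    # Vector{Union{Int64, String}}

# When all elements share one type, the result has that plain element type.
plain = narrow(Any[1, 2, 3])
typeof(plain)    # Vector{Int64}
```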