Why is `[["x", "y"], [1]]` a `Vector{Vector}`?

I am trying to write a function similar to the following, where I want `y` to be a vector of vectors in which each inner vector is either a `Vector{String}` or a `Vector{Int}`.

```julia
foo(x::SomeType, y)
```

I am unsure which type to use for `y`, though, because `[["x", "y"], [1]]` is of type `Vector{Vector}`, while `[["x", "y"], ["z"]]` is of type `Vector{Vector{String}}`, which is not a subtype of `Vector{Vector}`.

I tried `y::Vector{<:Union{Vector{String}, Vector{Int}}}`, which does not work: although each element of `[["x", "y"], [1]]` is an instance of `Union{Vector{String}, Vector{Int}}`, the vector as a whole is not an instance of `Vector{<:Union{Vector{String}, Vector{Int}}}`.
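Both observations follow from type parameters in Julia being invariant. A minimal check that can be pasted into a Julia 1.x REPL (`SI` is just a local alias for the union, introduced here for brevity):

```julia
x = [["x", "y"], [1]]      # mixed inner element types
z = [["x", "y"], ["z"]]    # homogeneous inner element types

println(typeof(x) == Vector{Vector})            # true
println(typeof(z) == Vector{Vector{String}})    # true

# Each inner type is a subtype of the UnionAll `Vector` ...
println(Vector{String} <: Vector)                   # true
# ... but type parameters are invariant, so the outer types are unrelated
println(Vector{Vector{String}} <: Vector{Vector})   # false

# The attempted annotation fails on `x` as a whole, although every element matches
const SI = Union{Vector{String}, Vector{Int}}
println(x isa Vector{<:SI})                         # false
println(all(el -> el isa SI, x))                    # true
```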

I have two questions:

  1. What type should I choose for `y` so that the function only accepts values of `y` whose elements are all instances of `Union{Vector{String}, Vector{Int}}`?
  2. Why is `[["x", "y"], [1]]` of type `Vector{Vector}` and not of type `Vector{Union{Vector{String}, Vector{Int64}}}`?
  1. You should use `Vector{<:Union{Vector{String},Vector{Int}}}`. Then, when instantiating your vector of mixed vectors, specify its element type directly: `Union{Vector{Int}, Vector{String}}[["x", "y"], [1]]`. Note that this signature also matches `Vector{Union{}}`. You could instead use `Union{Vector{Vector{Int}}, Vector{Vector{String}}, Vector{Union{Vector{String}, Vector{Int}}}}`, but that's a little annoying, and accepting `Vector{Union{}}` is probably okay.
  2. This is for practical reasons. The compiler could instead make it a vector of a union, but then what about a vector of 3 different types? Four? Five? At some point the type of the vector would become enormous, which would slow down the compiler (and lead to completely inscrutable error messages and REPL output). So the current approach is instead to pick, as the element type, the smallest non-union type that is a supertype of all the element types.
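A short check of answer 2: the Base function `typejoin` computes this kind of common non-union supertype, and it agrees with the element type the literal received. Roughly, this is the fallback used when no promotion applies; when the element types can be promoted (for example `Int` and `Float64`), the literal converts its elements instead. `J` and `U` are just illustrative variable names.

```julia
# `typejoin` computes the common non-union supertype of two types
J = typejoin(Vector{String}, Vector{Int})
println(J == Vector)                                 # true
println(eltype([["x", "y"], [1]]) == J)              # true

# A Union is not chosen automatically; it has to be requested explicitly
U = Union{Vector{String}, Vector{Int}}
println(eltype(U[["x", "y"], [1]]) == U)             # true

# When a promotion rule exists, the elements are converted instead
println(typeof([1, 2.0]) == Vector{Float64})         # true
```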

Also, this becomes much easier if you create a name for this union type:

```julia
julia> const SIVector = Union{Vector{String}, Vector{Int}}
Union{Vector{Int64}, Vector{String}}

julia> v = SIVector[[1,2], ["a","b"]]
2-element Vector{Union{Vector{Int64}, Vector{String}}}:
 [1, 2]
 ["a", "b"]

julia> f(y::Vector{<:SIVector}) = length(y)
f (generic function with 1 method)

julia> f(v)
2
```
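As noted in the first answer, `Vector{<:SIVector}` also matches `Vector{Union{}}`. A sketch comparing that loose signature with the stricter alternative mentioned there, which lists the three accepted outer types explicitly (`loose`, `strict` and `StrictSI` are hypothetical names chosen for illustration):

```julia
const SIVector = Union{Vector{String}, Vector{Int}}

# Loose signature: the element type may be any subtype of SIVector, including Union{}
loose(y::Vector{<:SIVector}) = length(y)

# Strict alternative: list the three accepted outer vector types explicitly
const StrictSI = Union{Vector{Vector{String}}, Vector{Vector{Int}}, Vector{SIVector}}
strict(y::StrictSI) = length(y)

println(Vector{Union{}} <: Vector{<:SIVector})   # true: the loose signature admits it
println(Vector{Union{}} <: StrictSI)             # false: the strict one does not

# Both signatures accept the three intended shapes
for y in ([["a"], ["b", "c"]], [[1], [2, 3]], SIVector[[1], ["a"]])
    println(loose(y) == strict(y))               # true, three times
end
```

Either way, the untyped mixed literal `[["x", "y"], [1]]` (a `Vector{Vector}`) is rejected by both, so callers still need the `SIVector[...]` prefix for mixed input.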

One solution could be to create two different methods and call one from the other; this should make the code more readable and easier to maintain.

The explanation makes a lot of sense, thank you.

Regarding the solution, I don't think I can go for this, since it's a user-facing function and it sounds like too much of a complication for a user. I also cannot specialise the functions, since the user should be allowed to specify things either as strings or as integers, and these might be mixed in a single function call.

I was thinking about simply choosing `y::Vector{Vector}` and doing the type check inside the function, but that doesn't sound very Julia-like to me.

I thought about my problem again: the `foo` function basically just loops over the vectors in `y` and applies some other function to each. So one idea would be to let `foo` take `y::Vector{Vector}` and specialise the inner functions, along the lines of what @VinceNeede proposed. It would look like this:

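```julia
# Inner methods, one per supported inner vector type (bodies elided)
function _foo(x::Symbol, y::Vector{String}) end
function _foo(x::Symbol, y::Vector{Int}) end
# User-facing function: applies `_foo` to each inner vector of `y`
function foo(x::Symbol, y::Vector{Vector})
    z = map(yy -> _foo(x, yy), y)
    return z
end
```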
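A runnable sketch of this design, with hypothetical bodies for `_foo` (each one just returns a tag so the per-element dispatch is visible):

```julia
# Hypothetical inner methods, one per supported inner vector type
_foo(x::Symbol, y::Vector{String}) = (x, :strings, length(y))
_foo(x::Symbol, y::Vector{Int}) = (x, :ints, length(y))

# Outer function as proposed: dispatch happens per element inside `map`
foo(x::Symbol, y::Vector{Vector}) = map(yy -> _foo(x, yy), y)

foo(:a, [["x", "y"], [1]])   # returns [(:a, :strings, 2), (:a, :ints, 1)]

# Caveat: a homogeneous input is a Vector{Vector{String}}, which is not a
# Vector{Vector}, so this call throws a MethodError for `foo` itself:
# foo(:a, [["x", "y"], ["z"]])
```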

One issue this might cause is that `foo` claims the whole signature `foo(::Symbol, ::Vector{Vector})`, even though it cannot actually handle every value of that type. Is this accepted in the general Julia ecosystem?

Another problem is that users who pass the wrong types would get an error saying that no matching `_foo` method exists, rather than one about `foo`, which might confuse them. Is there a way to improve this error message for the user?

Duck typing is certainly an accepted programming style, so I wouldn’t worry about that aspect. Quoting the style guide:

> In fact, in many cases you can omit the argument type altogether, unless it is needed to disambiguate from other method definitions, since a `MethodError` will be thrown anyway if a type is passed that does not support any of the requisite operations. (This is known as duck typing.)

But, as you point out in your opening post, `[["x", "y"]] isa Vector{Vector}` is `false`, so you should choose a more appropriate type (cf. jakobnissen's post, or something more generic like `AbstractVector{<:AbstractVector}`), or just leave out the type entirely and rely solely on duck typing (for `foo`, not for `_foo`).
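A sketch of the duck-typed variant suggested here: `foo`'s second argument is left unannotated and the type restrictions stay on the `_foo` methods (whose bodies are hypothetical placeholders):

```julia
# Hypothetical inner methods; these carry the type restrictions
_foo(x::Symbol, y::Vector{String}) = (x, :strings, length(y))
_foo(x::Symbol, y::Vector{Int}) = (x, :ints, length(y))

# No annotation on `y`: any iterable of inner vectors is accepted, and an
# unsupported inner type surfaces as a MethodError raised from `_foo`
foo(x::Symbol, y) = map(yy -> _foo(x, yy), y)

foo(:a, [["x", "y"], [1]])      # mixed input, a Vector{Vector}
foo(:a, [["x", "y"], ["z"]])    # homogeneous input, a Vector{Vector{String}}
foo(:a, ([1, 2], ["z"]))        # a Tuple of vectors works too
```

Both literal forms from the opening post now go through the same method, which is the main advantage over `y::Vector{Vector}`.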

I think the best choice is to leave the code as it is and perhaps state in the docstring which types are actually supported. The `MethodError`, whose stack trace shows `foo` calling `_foo`, should already be informative enough.
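If a friendlier message than the bare `MethodError` is wanted anyway, one common Julia pattern is a docstring listing the supported types plus a catch-all `_foo` method that throws an `ArgumentError`. A sketch, where the `_foo` bodies and the message text are hypothetical:

```julia
_foo(x::Symbol, y::Vector{String}) = (x, :strings, length(y))
_foo(x::Symbol, y::Vector{Int}) = (x, :ints, length(y))

# Catch-all fallback: only reached when no more specific `_foo` method matches
_foo(x::Symbol, y) = throw(ArgumentError(
    "each element of `y` must be a Vector{String} or a Vector{Int}; got $(typeof(y))"))

"""
    foo(x::Symbol, y)

Apply `_foo` to each element of `y`. Every element of `y` must be a
`Vector{String}` or a `Vector{Int}`; the two kinds may be mixed.
"""
foo(x::Symbol, y) = map(yy -> _foo(x, yy), y)
```

The trade-off is that the catch-all replaces the standard `MethodError`, so its message has to be kept in sync whenever the set of supported types changes.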

Using `Vector{Vector}` should not give problems in dispatch since it is a `UnionAll`, so all types `Vector{T}` would be accepted by the function, which should be `Vector{Vector{Any}}` for `[["x","y"],[1]]`.

Thanks everyone. I’ll go with the duck typing solution.

Array literals deliberately avoid over-specifying the element type.
Imagine you had a literal like this:

```julia
B = [1.0, "1.0", 1]
```

You could make it a `Vector{Union{Float64, Int64, String}}`, but that could turn into a union nightmare (think of 5 or 10 types here instead of 3), which can be even slower than simply using a `Vector{Any}`.
So, if you really want a vector of only certain types, you have to be explicit about it. Otherwise, the compiler will not go through every element type in your literal and build a union of them, which would not really help performance anyway.
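For the literal above, a quick comparison of the inferred element type with an explicitly requested one (the variable names are illustrative):

```julia
b_inferred = [1.0, "1.0", 1]                             # no common promotion, so Vector{Any}
b_explicit = Union{Float64, Int, String}[1.0, "1.0", 1]  # element type requested explicitly

println(eltype(b_inferred) == Any)                          # true
println(eltype(b_explicit) == Union{Float64, Int, String})  # true

# With only numbers, promotion applies and the Int is converted to Float64
println(typeof([1.0, 1]) == Vector{Float64})                # true
```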