Narrowing the type of a nested array

I am collecting data in an array that I don’t know the types of beforehand.

There is a reasonable chance they will be similar, so I was wondering what would be a decent way to narrow the type of the final array. I saw a suggestion of using identity which does well when all elements are of the same type, but with more nested dissimilarities it is semi-concrete:

x = [1,2,3]
y = [1.,2.,3.]

julia> identity.(Any[x, x])
2-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [1, 2, 3]

julia> identity.(Any[x, y])
2-element Vector{Vector}:
 [1, 2, 3]
 [1.0, 2.0, 3.0]

When it could be:

julia> Vector{<:Real}[x,y]
2-element Vector{Vector{<:Real}}:
 [1, 2, 3]
 [1.0, 2.0, 3.0]

Do you think there is much to gain from finding the most narrow type? How would one go about it?
Or maybe the Vector{Vector} would suffice?

1 Like

A manual approach could be:

function refine_eltype(x::Array{T, N}) where {T, N}
    isempty(x) && return Array{Union{}, N}()
    A = first(x)::Array
    nd = ndims(A)
    E = eltype(A)
    for i in 2:length(x)
        A = x[i]::Array
        ndims(A) == nd || error("Must match dimensionality")
        E = typejoin(E, eltype(A))
        E === Any && break
    end
    Array{Array{<:E, nd}, N}(x)
end

This is not generic over multiple dimensionalities or AbstractArray types.

The following is perhaps not the most decent way. But just in case:

julia> Vector{<:typejoin(eltype.(Any[x,y])...)}[x, y]
2-element Vector{Vector{<:Real}}:
 [1, 2, 3]
 [1.0, 2.0, 3.0]

Perhaps another question is: Why do you care about this? From a performance perspective, I think there is no difference between Vector{Vector} and Vector{Vector{<:Real}} because there is no reasonable restriction on how a subtype of Real might look like.
If its performance you want, then you should think about promoteing your data to a common type and work with that. If this cannot be achieved/is undesired then using an Array is probably not the right datastructure.

If this is not about performance, then I am curious about your motivation for this question.

2 Likes

Thanks for all the suggestions! I think I would have to implement a quite generic solution as the arrays might be arbitrarily nested, so probably its best to leave it at the surface level if add at all.

@abraemer Right, I was not sure if there would a performance penalty for leaving the type even semi-abstract, so wanted to ask. My use case is reading data from a file format - elements appear sequentially, so you can’t know up front if they share a common signature (but they might, so I just want to recover it, if that is the case). Just a habit of avoiding Any if possible, I guess.

Pros:

  1. You reach a concrete type, and now the compiler can really start optimizing in general.
  2. A narrower type could narrow down possible methods to brittle type inference improvements. Usually this doesn’t pan out and it’s not different from the broadest Vector{Any}.

Cons:

  • You can push fewer array types afterward.
  • Your element type depends on your runtime data, which you’ll have to compile more for.
  • The runtime dispatches may not be worth optimizing away if the methods they dispatch to take much longer to run, and subsequent type instabilities can be handled with type conversions and function barriers.