I am collecting data in an array that I don’t know the types of beforehand.
There is a reasonable chance they will be similar, so I was wondering what would be a decent way to narrow the type of the final array. I saw a suggestion of using identity
which does well when all elements are of the same type, but with more nested dissimilarities it is semi-concrete:
x = [1,2,3]
y = [1.,2.,3.]
julia> identity.(Any[x, x])
2-element Vector{Vector{Int64}}:
[1, 2, 3]
[1, 2, 3]
julia> identity.(Any[x, y])
2-element Vector{Vector}:
[1, 2, 3]
[1.0, 2.0, 3.0]
When it could be:
julia> Vector{<:Real}[x,y]
2-element Vector{Vector{<:Real}}:
[1, 2, 3]
[1.0, 2.0, 3.0]
Do you think there is much to gain from finding the most narrow type? How would one go about it?
Or maybe the Vector{Vector}
would suffice?
1 Like
A manual approach could be:
function refine_eltype(x::Array{T, N}) where {T, N}
isempty(x) && return Array{Union{}, N}()
A = first(x)::Array
nd = ndims(A)
E = eltype(A)
for i in 2:length(x)
A = x[i]::Array
ndims(A) == nd || error("Must match dimensionality")
E = typejoin(E, eltype(A))
E === Any && break
end
Array{Array{<:E, nd}, N}(x)
end
This is not generic over multiple dimensionalities or AbstractArray types.
The following is perhaps not the most decent way. But just in case:
julia> Vector{<:typejoin(eltype.(Any[x,y])...)}[x, y]
2-element Vector{Vector{<:Real}}:
[1, 2, 3]
[1.0, 2.0, 3.0]
Perhaps another question is: Why do you care about this? From a performance perspective, I think there is no difference between Vector{Vector}
and Vector{Vector{<:Real}}
because there is no reasonable restriction on how a subtype of Real
might look like.
If its performance you want, then you should think about promote
ing your data to a common type and work with that. If this cannot be achieved/is undesired then using an Array is probably not the right datastructure.
If this is not about performance, then I am curious about your motivation for this question.
2 Likes
Thanks for all the suggestions! I think I would have to implement a quite generic solution as the arrays might be arbitrarily nested, so probably its best to leave it at the surface level if add at all.
@abraemer Right, I was not sure if there would a performance penalty for leaving the type even semi-abstract, so wanted to ask. My use case is reading data from a file format - elements appear sequentially, so you can’t know up front if they share a common signature (but they might, so I just want to recover it, if that is the case). Just a habit of avoiding Any
if possible, I guess.