Unpack types in Union

I would like to extract a Vector{DataType} from a Union. I found the solution shown below, but it looks weird to me, and non-idiomatic.
Is there a better way to unpack types in a Union?

My solution:

unpack_union(tt::Core.TypeofBottom) = DataType[]
unpack_union(tt::DataType) = [tt]
unpack_union(tt::Union) = 
    [getfield(tt, :a), unpack_union(getfield(tt, :b))...]

providing the following results:

julia> unpack_union(Union{})
DataType[]

julia> unpack_union(Union{Int})
1-element Vector{DataType}:
 Int64

julia> unpack_union(Union{Int, String})
2-element Vector{DataType}:
 Int64
 String

julia> unpack_union(Union{Int, String, Missing})
3-element Vector{DataType}:
 Missing
 Int64
 String

And why is it that you want to do this? Just making sure it’s not an XY problem…

Sorry, I don’t know what an XY problem is…
Still, I think the question is worth an answer (even if my problem could be solved differently…).

There are a lot of code smells you can write; that doesn’t mean you should.

Thank you, I didn’t know it!

Indeed, I don’t want to. That’s why I asked what is the correct way to do it…

That’s why I asked: what do you need this for?

I would like to keep the discussion focused on the topic, so forget about my proposed solution (which I really don’t like…) and think of it as simply:

Is there an idiomatic way to unpack types in a Union?

Even “no” is an acceptable answer :wink:

the answer is you shouldn’t be doing that, so if it turns out you really need to, please show us the use case so people can help you.

OK, I can live with it. It’s not clear to me why such introspection should be discouraged or forbidden, given all the amazing Julia features (think about macros…)
But this is definitely not the point I’m interested in.

Thank you for your answer.

doesn’t look that horrible to me :slight_smile:
I would maybe hide everything behind another function:

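# No ::Union annotation on the outer method: Union{} and Union{Int} are not
# instances of Union, so they would otherwise raise a MethodError here.
function unpack_union(tt)
    _unpack_union(tt::Core.TypeofBottom) = DataType[]
    _unpack_union(tt::DataType) = [tt]
    _unpack_union(tt::Union) = [getfield(tt, :a), _unpack_union(getfield(tt, :b))...]
    _unpack_union(tt)
end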

Well, consider the following:

  • a Union is definitely not a structure, still you can use it in getfield;
  • propertynames() always returns two names (a and b), the second being a nested Union if more than two types are involved;
  • the property names a and b to access the types in a Union are not documented in the manual (AFAIK).

Anyway good to know that it doesn’t sound horrible!


I mean, you’re definitely depending upon internals — Unions can behave surprisingly and sometimes disappear entirely, depending on how they’re being generated. Defining dispatch on ::Union can be tricky (as you’ve found with TypeofBottom and such). You’re definitely well outside of what inference can possibly track, but that’s not necessarily a bad thing.

I’d try to reformulate your problem such that you don’t need to do this, if at all possible.
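A few REPL-style checks illustrate how a Union can collapse or disappear, which is why dispatching on ::Union needs the extra TypeofBottom and DataType methods. This is only a small sketch; the specific types are chosen arbitrarily for illustration.

```julia
# A one-member "union" is simply that member, not a Union object.
single = Union{Int} === Int                       # true

# A member that is a subtype of another member is absorbed.
absorbed = Union{Int, Integer} === Integer        # true

# The empty union has its own type and is not an instance of Union.
empty_is_union = Union{} isa Union                # false
empty_type = typeof(Union{}) === Core.TypeofBottom  # true

# Only a union with at least two distinct members is an instance of Union.
real_union = Union{Int, String} isa Union         # true
```

So a method written for ::Union never sees the first three cases, even though they are spelled with Union{...} in the source.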


Sometimes a data-reading library returns arrays of Union{Missing, Float64} or Union{Missing, Float32}. Because some other libraries (packages) I use don’t handle missing, I sometimes need to convert the missing values to NaNs.

So, how do you determine the second element of the Union?

function readalldata()
  a = readdata()
  b = replace(a, missing=>eltype_of_data(NaN)) # how to determine the type?
  return b
end

Because in my case there are only two possibilities, I can branch like

eltorg = eltype(a)
elt = if eltorg == Union{Missing,Float64}
        Float64
      elseif eltorg == Union{Missing,Float32}
        Float32
      else
        error("unknown type: $(eltorg)")
      end
b = replace(a, missing => elt(NaN))

Inelegant, but manageable.

Also, I know that the right approach is to ask the package writers to support missing . . .

How about nonmissingtype?
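A minimal sketch of how nonmissingtype could replace the branching above. The array `a` is made-up sample data standing in for the result of readdata(), which is not defined in this thread.

```julia
# Made-up sample data standing in for the output of readdata().
a = Union{Missing, Float32}[1.0f0, missing, 3.0f0]

# Strip Missing from the element type; here this gives Float32.
T = nonmissingtype(eltype(a))

# Replace every missing with a NaN of the matching float type.
b = replace(a, missing => T(NaN))
```

The same two lines work unchanged for Union{Missing, Float64}, so no per-type branch is needed.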

Friendly suggestion that it would probably be better to create a new thread, and perhaps link to this one, instead of posting in one that’s been inactive for over 3 years :slight_smile:


Thanks! But, it’s curious how the function is implemented. See below.

That depends on the subject of the new thread you are suggesting. If you suggest starting a thread about getting the other part of Union{Missing, Sometype}, then nonmissingtype is the solution, but this thread is more general: How to deconstruct a Union, which hasn’t gotten a clean and idiomatic answer.

I just continued this thread to provide an example where such functionality is useful, because the original poster was criticized for not providing a use case.

So, to continue, nonmissingtype is implemented like this

nonmissingtype(::Type{T}) where {T} = typesplit(T, Missing)

But, I’m not able to find how typesplit() is defined. (I don’t know github well enough.) I tested it a bit and found that it works on any Union. It acts like subtracting a type from the Union:

Base.typesplit(Union{S,T,U}, T) == Union{S,U}

Inside the function there must be an iteration to go over S, T, and U one by one. Does this iteration use the :a and :b trick discussed above?

julia> methods(Base.typesplit)
# 1 method for generic function "typesplit" from Base:
 [1] typesplit(a, b)
     @ promotion.jl:147

@nsajko Very nice! Thank you. So, typesplit() does use the :a and :b internals, and a public API for accessing Union components is being considered.
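For readers who want to see the shape of such a recursion, here is a sketch of a typesplit-like function. The name `my_typesplit` is hypothetical and this is not a copy of the Base source; it relies on the same undocumented `a` and `b` fields of Union discussed above, so it may break if the internal representation changes.

```julia
# Hypothetical re-implementation for illustration only; not the Base source.
function my_typesplit(a, b)
    # A member (or whole union) fully covered by b is dropped.
    a <: b && return Union{}
    if a isa Union
        # A Union stores its members as a nested pair in the undocumented
        # fields `a` and `b`; recurse into both halves and rebuild the union.
        return Union{my_typesplit(a.a, b), my_typesplit(a.b, b)}
    end
    # Anything else that is not a subtype of b is kept unchanged.
    return a
end
```

With this definition, my_typesplit(Union{Missing, Float32}, Missing) gives Float32, while a member such as Integer is kept whole when Int is removed, because Integer is not a subtype of Int.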

That’s a proper continuation of this thread.


We really do need a stable way to look at each member of a union type. A good number of packages end up touching internal API related to UnionAll and Union.

I think one of the big issues in the way of progress here is the conflict between what a Union is conceptually and the reality of its implementation. Although the implementation of Union is in and of itself a collection, it is conceptually a type whose subtypes may be non-finite. This doesn’t mean we can’t do anything with types that subtype an unknown quantity of other types. After all, we have methods like typejoin and typeintersect that work on Union. However, defining something that is conceptually robust and functionally useful can get pretty difficult.

For example, let’s say you want typediff(lhs, rhs) -> ty so that !(ty <: rhs) && (ty <: lhs). The practical application may at first seem simple and useful (typediff(Union{Int, Char, Symbol}, Int) -> Union{Char, Symbol}), but what do you do for typediff(Union{Integer, Char, Symbol}, Int)? You have to decide whether you want to create a new union including every subtype except Int or if there’s a meaningful heuristic where we can agree that typediff gives up.

Unless the goal is literally to convert between types, we usually care more about some trait related to lhs than the type itself. In which case, you really do just want to check some trait on each member in a union. This might be okay if poking around inside of Union was alright to do, but I don’t think everyone has agreed that the current implementation is what we want forever.
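As a rough sketch of what "check some trait on each member" could look like today, the helper below walks a union and applies a predicate to every member. The name `all_members` is hypothetical, and the code depends on the undocumented `a` and `b` fields of Union, which is exactly the internal detail that has not been agreed on as stable.

```julia
# Hypothetical helper: true if `pred` holds for every member of the union T.
# Relies on the undocumented fields `a` and `b` of Union.
all_members(pred, T::Union) = all_members(pred, T.a) && all_members(pred, T.b)

# The empty union has no members, so the check is vacuously true.
all_members(pred, ::Core.TypeofBottom) = true

# Any other type is treated as a single member.
all_members(pred, T) = pred(T)

numeric_only = all_members(T -> T <: Number, Union{Int, Float64})   # true
mixed        = all_members(T -> T <: Number, Union{Int, String})    # false
```

Note that an abstract member such as Integer is visited once as a whole; the helper does not enumerate its subtypes.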

typediff(Union{Int, Char, Symbol}, Int) # -> Union{Char, Symbol}
typediff(Union{Integer, Char, Symbol}, Int) # -> ???

I see!

Initially, I naïvely thought, why not use the AbstractSet API for Union? But now I realize that types have not only equality but also subtyping <: . So, it’s like a set of sets and <: is like ⊆.

So, it all depends on how you would use, or what you want from, typediff(). I’m far from knowledgeable enough about the implementation of the Julia language to go further than this point in a meaningful way. . . . my useless thought is, if we use the “set of sets” analogy, “diff” is “xor” and

{Integer, Char, Symbol} xor {Int} == {Integer, Char, Symbol, Int}

because here we use only equality to define “diff”, but I guess that’s not what you want in the implementation of the language.