# Union of Unions

#1

Is there a simple way to construct Union{A,B,C} from Union{A,B} and Union{B,C}? My initial guess was typejoin, but:

AB = Union{Val{:A}, Val{:B}}
BC = Union{Val{:B}, Val{:C}}
typejoin(AB,BC) == Val

Which I guess is correct, i.e. it is an upper bound. In this case, though, I want the supremum (the smallest union containing both).

Motivating example: I want to, at compile time, generate the most efficient code to get all the requested work done. Assume there are two jobs: one with job requirements Union{A,B} and one with job requirements Union{B,C}. Assume I have a function dojob with three methods:

dojob(::Type{Union{A,B}}) = (doa(); dob())
dojob(::Type{Union{B,C}}) = (dob(); doc())
dojob(::Type{Union{A,B,C}}) = (doa(); dob(); doc())

In those situations where I need computations A,B, and C done it is clearly more efficient to call the third method than to call the first two in succession (there is some overlap). In order to make sure that this third method is selected at compile time I need to be able to compute the union of the Union types.
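For concreteness, here is a self-contained sketch of that setup, where doa, dob, and doc are hypothetical stubs that just record which subtask ran:

```julia
const LOG = Symbol[]

# Hypothetical stub subtasks that record what ran.
doa() = push!(LOG, :a)
dob() = push!(LOG, :b)
doc() = push!(LOG, :c)

const A, B, C = Val{:A}, Val{:B}, Val{:C}

dojob(::Type{Union{A,B}})   = (doa(); dob())
dojob(::Type{Union{B,C}})   = (dob(); doc())
dojob(::Type{Union{A,B,C}}) = (doa(); dob(); doc())

# Union{Union{A,B}, Union{B,C}} flattens to Union{A,B,C},
# so this dispatches to the third (combined) method:
dojob(Union{Union{A,B}, Union{B,C}})
```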

For now I have implemented something resembling this functionality using types like Val{:ab}, Val{:bc}, … and piggybacking on the promotion infrastructure:

Base.promote_rule(::Type{Val{:ab}}, ::Type{Val{:abc}}) = Val{:abc}

This is unfortunately not scalable and I suspect the Julia type system and in particular the Union type can give me this for free…
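For reference, a minimal self-contained version of that workaround (the Val{:ab}-style tags are ad-hoc labels, and every pair of tags needs its own rule, which is why it does not scale):

```julia
# Ad-hoc tags encoding which subtasks a job needs; one rule per pair.
Base.promote_rule(::Type{Val{:ab}}, ::Type{Val{:bc}}) = Val{:abc}

promote_type(Val{:ab}, Val{:bc})  # Val{:abc}
```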

#2
julia> T = [Float64, Int, Int32];

julia> U = Union{T[1], T[2]}
Union{Float64,Int64}

julia> Union{U, T[3]}
Union{Float64,Int32,Int64}
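The same construction generalizes to any number of types, e.g. by splatting or folding (a sketch):

```julia
Ts = [Float64, Int, Int32]

U1 = Union{Ts...}                                     # splat all at once
U2 = reduce((U, T) -> Union{U, T}, Ts; init=Union{})  # or fold pairwise

U1 === U2 === Union{Float64, Int32, Int64}  # true
```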

Why do you want to do this, though?

#3

Thank you. I suspected the resulting type to be Union{Union{Float64, Int}, Int32}, whatever that means… It makes sense for it to be flattened in this context, now that I think about it.

As hinted at in my post, I would use this to generate dedicated code that can complete a number of posted jobs with a minimum of overlap. More specifically, I will use this to have the compiler build a quadrature kernel for a boundary element method that computes overlap integrals for all operators required in the formulation of the problem (typically all finite element spaces are defined on the same triangulation, so only the integration kernel needs to be bespoke). Because different kernels share a lot of the same computations (e.g. evaluations of complex exponentials), and because these kernels sit in the innermost of up to four nested for loops, a significant speed-up can be realised by having these kernels generated just in time.

In fact, I believe this is an area where one can realistically argue that Julia can be faster than Fortran or C, since in those languages such techniques would require all kinds of preprocessor and build system tricks. In C++ the same could be done with templates, I guess, but much less readably, I'm sure.

Actually, now I have a follow-up question. I have two implementations (performing subtasks AB and BC, respectively):

A = Type{Val{:A}}
B = Type{Val{:B}}
C = Type{Val{:C}}

f(::Union{A,B}) = (doa(); dob())
f(::Union{B,C}) = (dob(); doc())

Question 1: If I am only interested in having subtask B done, I would call f(Val{:B}). This results in an ambiguity error (as it should). Is there a way for me to tell the compiler to pick one and not to worry about alternative implementations?
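(For what it's worth, one workaround I can sketch: add a disambiguating method that is strictly more specific than both unions, so f(Val{:B}) has a unique most-specific match. doa/dob/doc are hypothetical stubs that record what ran.)

```julia
const LOG = Symbol[]
doa() = push!(LOG, :a)
dob() = push!(LOG, :b)
doc() = push!(LOG, :c)

const A, B, C = Type{Val{:A}}, Type{Val{:B}}, Type{Val{:C}}

f(::Union{A,B}) = (doa(); dob())
f(::Union{B,C}) = (dob(); doc())
f(::B)          = dob()     # strictly more specific than both unions

f(Val{:B})                  # no ambiguity error; only dob runs
```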

Question 2: Is there an easy way to generate the function:

function f(::Union{A,B,C})
    f(Val{:A})
    f(Val{:C})
end

I know this will involve some overlap, but it would be good to have these fallback implementations generated automatically. I can always add a bespoke implementation performing all subtasks at once at a later time.

#4

f(::Type{Val{T}}) where {T} = println("I am doing $T")
f(::Type{T}) where {T<:Tuple} = foreach(f, T.parameters)
f(::Type{T}, ::Type{S}) where {T<:Tuple,S<:Tuple} =
    f(Tuple{union(Set(T.parameters), Set(S.parameters))...})

where

julia> f(Val{:A})
I am doing A

julia> f(Tuple{Val{:A},Val{:B},Val{:C}})
I am doing A
I am doing B
I am doing C

julia> f(Tuple{Val{:A},Val{:B}}, Tuple{Val{:B},Val{:C}})   # order non-deterministic, use sort
I am doing C
I am doing B
I am doing A

#5

That is definitely getting very close to what I have in mind!

I assume that in your implementation the for loop is executed at runtime? That could probably be fixed by using generated functions. If so, this seems to do the trick.
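(For the record, here is a self-contained sketch of that generated-function idea, which unrolls the loop at compile time; `work` and `workall` are hypothetical names standing in for the per-task and driver methods above.)

```julia
work(::Type{Val{T}}) where {T} = T    # stand-in for the real per-task method

@generated function workall(::Type{T}) where {T<:Tuple}
    # T.parameters is known when the method is compiled, so this loop
    # runs once at compile time and emits a fixed sequence of calls.
    calls = [:(work($p)) for p in T.parameters]
    :(($(calls...),))
end

workall(Tuple{Val{:A}, Val{:B}, Val{:C}})  # (:A, :B, :C)
```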

Many thanks!