I have two data structures, and some functions that have to operate on vectors of those data structures. One option is to define an abstract type as a supertype of both data structures. The other option is to create a new type which is the union of both types. A minimal example is:
abstract type C end

struct A <: C
    a::Int64
end

struct B <: C
    a::Int64
end

f(x::Vector{<:C}) = x[1].a
# or
const AorB = Union{Vector{A},Vector{B}}
g(x::AorB) = x[1].a
These things are NOT the same, because f accepts vectors with mixed A and B types, but g does not.
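Concretely, the difference can be checked at the REPL (a runnable sketch repeating the definitions above):

```julia
abstract type C end

struct A <: C
    a::Int64
end

struct B <: C
    a::Int64
end

f(x::Vector{<:C}) = x[1].a
g(x::Union{Vector{A},Vector{B}}) = x[1].a

mixed = C[A(1), B(2)]          # a Vector{C} holding both subtypes

println(f(mixed))              # 1 — Vector{C} matches Vector{<:C}
println(applicable(g, mixed))  # false — Vector{C} is not in the Union
```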
In my case, I will never have a vector of mixed types, so it seems that the Union approach would allow more specialization than the abstract-type approach. At the same time, I have tested both, and when I feed the functions vectors containing only one of the types, the @code_typed output is identical.
Is either approach better for any reason I cannot foresee?
Additionally, I noted that if I feed the function f a mixed vector, the generated code does not specialize at all on the type of the a fields of A and B.
I would say that the Union approach has the advantage (which you already perceived) of checking and throwing an error if you ever mix the types in a vector.
If you call isconcretetype on AorB or Vector{<:C}, you will see that neither is a concrete type. This means their code may be specialized for A or for B (note, however, that I do not remember whether there is a guarantee of specialization). This is different from Vector{C}, which is a concrete type and will only accept Vector{C} values (not Vector{A} or Vector{B}). You should define such a vector explicitly (with C[], for example); otherwise you will be relying on compiler inference working the way it works now ([A(1), B(2)] also gives a Vector{C}, but I am not sure this is guaranteed to stay that way between releases). In this last case the method always has a single specialization (for the Vector{C} type), but the object always carries some overhead, because it stores boxed instances of A or B.
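For reference, all of these checks can be reproduced directly (a small sketch using the types from the question):

```julia
abstract type C end

struct A <: C
    a::Int64
end

struct B <: C
    a::Int64
end

AorB = Union{Vector{A},Vector{B}}

println(isconcretetype(AorB))         # false — a Union is never concrete
println(isconcretetype(Vector{<:C}))  # false — a UnionAll is not concrete
println(isconcretetype(Vector{C}))    # true  — a container of boxed C values

println(typeof([A(1), B(2)]))         # Vector{C}, under the current promotion rules
```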
In other words: can defining an upper layer of functions that specialize the methods by hand guarantee any further specialization, or will the compiler always figure this out on its own, whenever that is possible?
Or, in other words: is it a waste of time writing these other methods?
Edit: thinking further, this must be the same that happens with the Union choice.
Good question. I am not 100% sure. I would expect that f(x :: Vector{A}) = ... guarantees the specialization. This is because Vector{A} is more specific than Any (the implicit type of x in f(x) = ...), and therefore multiple dispatch guarantees that the body of f(x :: Vector{A}) = ... will be used. The manual section on function barriers also seems to imply the specialization. The @nospecialize (and, consequently, @specialize) documentation uses the term "hints" instead of "guarantee". I think a Julia committer would be more sure of it.
I think, in an extreme and hypothetical case, the compiler could inline the method body and find out that an even better optimization is replacing a sequence of specialized chunks of code with a loop calling non-specialized code. But, in such a case, I would trust that the compiler is doing its best and not try to outsmart it.
I may have missed something, but why is everyone mentioning union splitting (the OP never mentioned it)? Also, he said he will never have mixed types inside an array.
Not exactly, because in your case the "unknown type" is a parameter, so it is in fact known at the time of method compilation/specialization. In my understanding, union splitting is often necessary when a binding in a local scope (or a boxed element of a vector) has a value of unknown type (I mean, the type of the value was not known at the time of method compilation/specialization); that does not seem to be your case.
Yes, you are correct, but there are some subtle and important details (maybe you are completely aware of them, but I cannot tell from what you said).
For example, if you always call f inside other functions that "know" the type they are passing to f, then this does not make a difference.
In the spirit of the example given by Tim, suppose you call f just after calling another function (let us call it g) that could return either Vector{A} or Vector{B} (and the compiler knows these are the only two possibilities). The outer function enclosing the calls to both g and f does not really know the type of the value returned by g when it goes on to call f. When Julia compiles the outer function, it has two options: do union splitting, which amounts to creating an if and replicating chunks of specialized code; or do not do union splitting and, consequently, call a complex procedure that looks up the correct method and calls it through a function pointer (if I am not wrong about the internals).
Also, Tim does not say at any point that his equivalent of our function f needs a signature like f(x :: Union{SomeType,AnotherType}) = .... Theoretically, it could be f(x) = ..., which is the same as f(x :: Any) = .... The question is not how loose the signature of f is, or how many specializations the function has, but whether, in the context where the function is called (not where it is defined), the compiler knows there are only two, three, or four possibilities, and whether replicating code inside if statements is a better idea than calling the multiple-dispatch machinery.
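A minimal sketch of that situation (the producer function here is hypothetical, named produce to avoid clashing with the g from the question):

```julia
abstract type C end

struct A <: C
    a::Int64
end

struct B <: C
    a::Int64
end

f(x::Vector{<:C}) = x[1].a

# Hypothetical producer: inference can only narrow its return type
# down to Union{Vector{A},Vector{B}}.
produce(flag::Bool) = flag ? [A(1)] : [B(2)]

# When compiling `outer`, the concrete type of produce's result is
# unknown, so the compiler may union-split the call to f into an
# if/else over the two possibilities instead of a dynamic dispatch.
outer(flag) = f(produce(flag))
```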
In the first case (call on Vector{Any}), length of an arbitrary object may be overloaded to return anything, so the compiler falls back to boxing the return value.
In the second case, there are two branches for length, both returning Int, so it is safe to assume that the resulting function also returns an Int.
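One way to observe that inference difference (Base.return_types is an introspection tool whose exact output may vary between Julia versions, so this is only an illustration):

```julia
# A function whose argument is a small Union: both branches of
# length return Int, so inference concludes the function returns Int.
elen(x::Union{String,Vector{Int}}) = length(x)
println(only(Base.return_types(elen, (Union{String,Vector{Int}},))))  # Int

# With an Any argument, length could be overloaded to return anything,
# so the inferred return type is typically Any.
elen_any(x) = length(x)
println(only(Base.return_types(elen_any, (Any,))))
```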
But you said that you never have mixed vectors, so why does that matter?
Unless I am missing something, it looks like you just need something like
f(x::AbstractVector{T}) where {T <: Union{A,B}} = first(x).a
and the only thing that requires restricting T is dispatch. If you never use f on anything else (that would require a different approach for handling x), you don't even need that.
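For completeness, a runnable sketch of that suggestion (repeating the struct definitions from the question):

```julia
abstract type C end

struct A <: C
    a::Int64
end

struct B <: C
    a::Int64
end

f(x::AbstractVector{T}) where {T <: Union{A,B}} = first(x).a

println(f([A(1), A(2)]))   # 1
println(f([B(3)]))         # 3

# A mixed C[A(1), B(2)] would not dispatch here: T would be C,
# and C is not a subtype of Union{A,B}.
println(applicable(f, C[A(1), B(2)]))  # false
```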
Union splitting will not need to kick in if you use concretely typed arguments.
My question was whether defining an abstract type had any advantage or disadvantage relative to that option. Specifically, I was in doubt whether either of these options would lead to any kind of type instability. But, in my particular case, it does not seem so.