How to avoid Any in list comprehensions with small unions

Hi all-

I have a situation where I want to generate one of two types of arrays from a list comprehension. In the simplest case, all of the elements are Float64. This is straightforward and always works. In the other case, the elements are Union{Float64, Array{Float64,1}}. However, in practice, Julia defaults to Any in the latter case rather than the small Union described above. Here is an MWE:

Case 1:

 x = [.3,.4]
 2-element Array{Float64,1}:
  0.3
  0.4

Case 2:

 x = [rand(2),.4]
 2-element Array{Any,1}:
  [0.929281, 0.277345]
  0.4

Of course, if I use Union{Float64,Array{Float64,1}}[x for x in X], it forces a union in both cases, which I want to avoid.

Is there a way to produce Array{Float64,1} in the first case and Array{Union{Float64, Array{Float64,1}},1} in the second case?

I don’t think there is a way. This uses promote_typejoin internally. It has a special case for nothing and missing to give a small union, but for all other things, it will fall back to typejoin, which gets you the closest common supertype. For Float64 and Array{Float64,1}, that is Any.
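To make the behavior above concrete, here is a small sketch that queries the type functions directly. Note that Base.promote_typejoin is internal and unexported, so treat this as an illustration of current behavior rather than a stable API.

```julia
# Base.promote_typejoin is an internal (unexported) function; it may change between versions.

# Same type on both sides: the result is that type.
@show Base.promote_typejoin(Float64, Float64)

# Special cases: `missing` and `nothing` produce a small Union.
@show Base.promote_typejoin(Float64, Missing)
@show Base.promote_typejoin(Float64, Nothing)

# Everything else falls back to typejoin, the closest common supertype.
@show Base.promote_typejoin(Float64, Vector{Float64})
@show typejoin(Int, Float64)

# The comprehension from the question therefore ends up with eltype Any.
x = [v for v in Any[rand(2), 0.4]]
@show eltype(x)
```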

What benefit are you hoping to get from this? If it is performance, could you show a benchmark where this makes a difference? We may be able to advise you on reaching your objective another way.

Thanks for your reply. I was afraid that this might be the case. I have not run a benchmark, but since it is a critical part of my code, I am following general advice to avoid abstract containers. My understanding is that a small union can mitigate this issue to some degree, but it’s still slower than a container of a single type. This is why I am trying to handle both cases.

I’m not quite sure how I can incorporate promote_typejoin into my code. Do you have an example? As an alternative, perhaps I can wrap the list comprehension in a function and use dispatch to deal with the two cases. That should work well enough. It would be nice if Julia created a small union (e.g. 2 or 3 types) by default and Any[] could be used to override the default. I suppose this may have been considered already, but was rejected due to some issue that is not apparent to me.
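One possible way to sketch the wrapper idea: collect first, then narrow the element type to the union of the types that actually occur. The helper name narrow_eltype is hypothetical (it is not part of Base), and the helper itself is type-unstable, so the result is best passed on to another function that does the real work.

```julia
# Hypothetical helper (not part of Base): narrow the element type of a
# collection to the Union of the concrete types that actually occur in it.
function narrow_eltype(itr)
    xs = collect(itr)
    T = Union{}                    # start from the empty union
    for x in xs
        T = Union{T, typeof(x)}    # add each observed type
    end
    # convert returns xs unchanged if it already has eltype T, else copies
    return convert(Vector{T}, xs)
end

a = narrow_eltype(v for v in Any[0.3, 0.4])
b = narrow_eltype(v for v in Any[rand(2), 0.4])

@show typeof(a)
@show typeof(b)
```

This gives Array{Float64,1} when every element is a Float64 and a two-member Union otherwise. If many different types show up, the Union stops being small, so this only makes sense when the set of possible types is known to be limited.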

This might be useful for someone who comes across this post.

Measuring what you are doing is at the very top of the general performance advice, above avoiding abstract containers. I highly recommend it: in my experience, the first order of magnitude is usually cut by something completely silly that you wouldn’t have thought of without profiling.
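As a rough sketch of what such a measurement could look like using only Base: the workload below (total, accumulate_all, and the test data) is made up for illustration and is not the poster's code. For more reliable numbers, the @btime macro from the BenchmarkTools.jl package is the usual choice.

```julia
# Hypothetical workload: add up scalars, and the entries of vector elements.
total(x::Float64) = x
total(x::Vector{Float64}) = sum(x)
accumulate_all(xs) = sum(total, xs)

# The same data stored in two containers with different element types.
vals = Any[isodd(i) ? rand() : rand(2) for i in 1:10_000]
as_any = vals
as_union = convert(Vector{Union{Float64, Vector{Float64}}}, vals)

# Run once first so that compilation time is not included in the timing.
accumulate_all(as_any)
accumulate_all(as_union)

t_any = @elapsed accumulate_all(as_any)
t_union = @elapsed accumulate_all(as_union)
println("Any: ", t_any, " s, small Union: ", t_union, " s")
```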


You are right. I should have run some benchmarks prior to opening the thread, but unfortunately, I cannot share the code, at least for now. So it may not have been that helpful. I did just run some tentative benchmarks, which suggest Any will probably cause performance issues in my larger code base. Dispatching on the two cases should be feasible. So hopefully that helps someone else who encounters this issue. Thanks again for your help!