Missing data and NamedTuple compatibility

Agree 100%. This thread has an awful lot of FUD (fear, uncertainty, and doubt) in it. I’ll risk being overly blunt in the hope that it will provide some clarity and reduce the FUD level:

  1. We are committed to representing missing data as small unions, i.e. Union{Missing,T}. This is not arbitrary: we’ve tried all the options and this is the one that works the best with Julia’s combination of dynamic typing and high-performance JIT.

  2. Operations with “unpredictable” result types should emulate Base’s map function and produce containers whose element type is based on the values that are actually in them. They should not rely on type inference, except for empty containers, where there are no values to inspect.

  3. Yes, this map-like behavior is currently a bit tricky to implement efficiently. There has been some discussion of providing easier ways to express this kind of pattern and there are plenty of possible directions for such supporting infrastructure.

  4. The O(2^n) specializations issue (a function with n arguments that may each be missing can, in principle, be specialized for up to 2^n combinations of argument types) has already been addressed by @jeff.bezanson: the Julia compiler deals with potentially exponential numbers of specializations all the time, and this is no different. There may be some cases where the current compiler heuristics are off and bad behavior occurs, but those are dealt with the same way optimizers have always been improved: by collecting use cases and benchmarks and tweaking the heuristics until they handle more and more situations gracefully.

  5. Please stop taking @jameson’s statements out of context. They almost certainly don’t mean what you think they mean. Honestly, I’m not entirely sure what they mean. What I am sure of is that they don’t mean “I implemented the small union stuff and we’re all screwed; this will never work,” which is how people are representing them here.
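To make points 1 to 3 concrete, here is a minimal Julia sketch (not part of the original post, assuming Julia 1.x). It shows a vector holding `missing` getting a small-union element type, `map` choosing its result element type from the values it actually produces, and a hand-rolled “start narrow, widen on demand” loop that imitates that behavior. `maybe_missing` and `map_widening` are hypothetical illustration helpers; `Base.promote_typejoin` and `Base.promote_op` are unexported Base internals and may change between Julia versions.

```julia
# Point 1: missing values are stored in small unions such as Union{Missing, Int}.
v = [1, missing, 3]
@show eltype(v)              # element type is Union{Missing, Int}
@show sum(skipmissing(v))    # skips the missing entry: 1 + 3

# Point 2: map picks the result element type from the values it produces.
maybe_missing(x) = x > 1 ? missing : x   # hypothetical example function

a = map(maybe_missing, [1])      # only Int results, so element type Int
b = map(maybe_missing, [1, 2])   # Int and missing results, so a small union
@show eltype(a) eltype(b)

# Point 3: a naive hand-rolled version of the same widening pattern.
# The type of `out` can change inside the loop, which is what makes this
# pattern tricky to compile efficiently.
function map_widening(f, xs::AbstractVector)
    if isempty(xs)
        # No values to inspect, so fall back on inference for the element type.
        return Vector{Base.promote_op(f, eltype(xs))}()
    end
    y = f(first(xs))
    out = Vector{typeof(y)}(undef, length(xs))
    out[1] = y
    for (i, x) in enumerate(Iterators.drop(xs, 1))
        y = f(x)
        if !(y isa eltype(out))
            # Widen the container; Missing joins with T as Union{Missing, T}, not Any.
            wider = similar(out, Base.promote_typejoin(eltype(out), typeof(y)))
            copyto!(wider, 1, out, 1, i)   # keep the i results computed so far
            out = wider
        end
        out[i + 1] = y
    end
    return out
end

@show map_widening(maybe_missing, [1, 2, 3])
```

The result's element type only becomes `Union{Missing, Int}` when a `missing` actually shows up, matching what `map` does; inference is consulted only for the empty input.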

In short, we are quite clear on how missing values will work in Julia: missing data is represented with small unions. The performance is already good and it will improve over time. There are some issues to be worked out in terms of which programming patterns are best for dealing with this representation, but the uncertainty is quite overblown in this thread. These patterns will become clearer as more people work with the new representation. As indicated by @ExpandingMan and others, it’s already quite usable.
