I just posted this on the slack channel re status of Query/IterableTables/Missings:
I don’t see a path for Query and iterable tables to use `missing` in the Julia 1.0 time frame. My read of that discussion so far is that there is one broad strategy on the table, and it would a) require much more work in Base on union return types (not just making small union return types fast, but also really large ones with 2^n elements, where n is the number of columns), and b) require pretty much every collection type that currently accepts an iterator for initialization to have materialization code that is a lot more complex than what it has today, essentially special-casing the results of queries. The latter, in my mind, really breaks the very nice composability properties Query has right now when used with `DataValue` for the missing value story (you can efficiently materialize queries into all sorts of data structures that I have never heard of).
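To make the 2^n point in a) concrete, here is a minimal sketch in plain Julia. It assumes a Julia version where `missing` is defined in Base (on 0.6 that would need `using Missings`), and `MaybeValue` is a hypothetical stand-in made up for illustration, not the actual `DataValue` implementation.

```julia
# Rows of a two-column table as tuples, where either column can be missing.
# Each combination of present/missing values has its own concrete tuple type.
rows_missing = [(1, 2.0), (missing, 2.0), (1, missing), (missing, missing)]

# Hypothetical wrapper (NOT the real DataValue): missingness is stored in a
# field, so the element type is the same whether or not a value is present.
struct MaybeValue{T}
    hasvalue::Bool
    value::T
    MaybeValue{T}() where {T} = new{T}(false)      # no value; `value` stays uninitialized
    MaybeValue{T}(x) where {T} = new{T}(true, x)   # value present
end

rows_wrapped = [
    (MaybeValue{Int}(1), MaybeValue{Float64}(2.0)),
    (MaybeValue{Int}(),  MaybeValue{Float64}(2.0)),
    (MaybeValue{Int}(1), MaybeValue{Float64}()),
    (MaybeValue{Int}(),  MaybeValue{Float64}()),
]

# Count the distinct concrete row types in each representation.
n_missing_types = length(unique(map(typeof, rows_missing)))  # 4 == 2^2
n_wrapped_types = length(unique(map(typeof, rows_wrapped)))  # 1

println((n_missing_types, n_wrapped_types))
```

With n columns the first count can grow to 2^n, which is the size of the union an iterator of such rows would have to return; with the wrapper it stays at one concrete row type.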
So my take on this is that the `Missings` design at this point would still be a significant step back for Query. In my mind we are still at a point where `Missings` is a design that works great for some parts of the data ecosystem, but is not a good design for other parts.
Irrespective of what one thinks about the merits of this broad strategy, it seems extremely unlikely that a) and b) would be done by Julia 1.0 (as far as I can tell they are broad ideas at this point, with no concrete design and no one working on them). Who knows, maybe in Julia 1.1 `Missings` will be more usable for things like Query and I can revisit things… Having said that, the current design with `DataValue` seems to work great, i.e. it is not as if there are problems with that design that `Missings` would solve. I also try very hard not to break code that uses Query/IterableTables and friends, so I think any decision down the road in the Julia 1.1 time frame would have to take such constraints into account as well.
I am almost done with the interop story for IterableTables and the new DataFrames, which will allow you to use Query with the new DataFrame. The model will be the same as it is today: regardless of what missing story a source uses, in the query itself you’ll deal with `DataValue`s, and when you materialize a query I’ll use the “native” missing story of the type that is the sink (so you’ll get `DataFrame`s that have
Sorry for the duplicate text, I only saw this message here after I had posted on slack, but I don’t want to leave @ValdarT’s question unanswered.