Problem preserving indexes using Query.jl and IndexedTables.jl


#1

When joining two IndexedTables, some of the data columns end up as index columns in the result. Am I missing something here, or is this an issue?

julia> using IndexedTables, IndexedTables.Table, Query

julia> dt1 = Table(Columns(id = collect(1:5)), Columns(var1 = rand(1:2,5), var2 = rand(5)))
id │ var1  var2
───┼───────────────
1  │ 2     0.353364
2  │ 2     0.995482
3  │ 2     0.174313
4  │ 2     0.833255
5  │ 1     0.20114

julia> dt2 = Table(Columns(id = collect(1:5)), Columns(var3 = rand(1:1000,5)))
id │ var3
───┼─────
1  │ 42
2  │ 682
3  │ 615
4  │ 942
5  │ 864

julia> @from i in dt1 begin
           @join j in dt2 on i.id equals j.id
           @select {i.id, i.var1, i.var2, j.var3}
           @collect Table
       end
id  var1  var2     │ var3
───────────────────┼─────
1   2     0.353364 │ 42
2   2     0.995482 │ 682
3   2     0.174313 │ 615
4   2     0.833255 │ 942
5   1     0.20114  │ 864

julia> Pkg.status("IndexedTables")
 - IndexedTables                 0.1.2              master

julia> Pkg.status("Query")
 - Query                         0.4.0+             master


#2

The final @collect statement in this query is essentially just a shortcut for passing the same query (without the final statement) to the IndexedTable constructor. So you can think of this query as being equivalent to:

q = @from i in dt1 begin
    @join j in dt2 on i.id equals j.id
    @select {i.id, i.var1, i.var2, j.var3}
end
IndexedTable(q)

So essentially the query produces an iterable table (in the IterableTables.jl sense), and you then pass that iterable table to the IndexedTable constructor that handles iterable tables.

The doc for that constructor explains what is happening here: the plain constructor will always turn the last column into the data column, and all other columns into index columns. But this constructor also takes some keyword arguments; in particular, you can specify which columns should be data columns and which should be index columns. For example, you could do this:

IndexedTable(q, idxcols=[:id])

And then only the id column will be an index column.
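
To make the column-splitting rule concrete, here is a small stand-alone sketch in plain Julia. It does not call IndexedTables.jl or Query.jl at all: `split_columns` is a hypothetical helper invented for illustration, and it only mimics the behaviour described above (last column becomes data by default; with `idxcols`, the listed columns become index columns and the rest become data). It is not how the real constructor is implemented.

```julia
# Hypothetical helper, NOT part of IndexedTables.jl or Query.jl.
# It only mimics the column-splitting rule described in this post.
function split_columns(names::Vector{Symbol}; idxcols=nothing)
    if idxcols === nothing
        # Default rule: the last column is data, all others are index columns.
        return names[1:end-1], names[end:end]
    end
    # Explicit rule: the listed columns are index columns, the rest are data.
    idx  = [n for n in names if n in idxcols]
    data = [n for n in names if !(n in idxcols)]
    return idx, data
end

# Column names produced by the @select in the query above.
cols = [:id, :var1, :var2, :var3]

# What the plain constructor does: id, var1, var2 become index columns.
default_idx, default_data = split_columns(cols)

# What idxcols=[:id] asks for: only id is an index column.
explicit_idx, explicit_data = split_columns(cols, idxcols=[:id])

println(default_idx, " | ", default_data)
println(explicit_idx, " | ", explicit_data)
```

The first call reproduces the layout from the original post (three index columns, one data column); the second reproduces the layout that was actually wanted.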

At this point you can’t pass keyword arguments to the constructor from the @collect statement, so you’ll have to split your query in the way I showed above to make this work.

I have a plan for how to improve this (see here).


#3

Got it! Thanks.