Tables partitions

Regarding the Tables.jl interface, I find the part about partitions quite confusing. For starters, the documentation is sparse: as far as I can tell there are only a few paragraphs [1] on Tables.partitions and Tables.partitioner. It also doesn't give any context. It doesn't say what problem is being solved, only what the functions do when called.

I have the following specific problem, if anyone knows the answer:

Conceptually, let's say I have a partitioned table:

using DataFrames, Tables
df1 = DataFrame(:a=>[1,2,3,4,5,6,7])
df2 = DataFrame(:a=>[8,9,10])
t = [df1, df2]

I call this a partitioned table because I can iterate over it, and Tables.istable is true for each element. However, Tables.istable is false for t itself. So what do I have to do to turn t into something that conforms to the Tables.jl interface? I want something I can call Tables.rows on that iterates first over df1 and then over df2 seamlessly.

At first I thought this was the point of either Tables.partitions or Tables.partitioner: take an iterator of tables and return a table. But that's not the case, because neither of the calls below returns true:

t |> Tables.partitions |> Tables.istable  # false
t |> Tables.partitioner |> Tables.istable  # false
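To make the behaviour concrete, here is a minimal sketch of what the partition interface appears to offer: an iterator of tables rather than a table. It assumes that Tables.partitioner with a single argument yields the elements of t unchanged, which is my reading of the docs; the name sizes is just for illustration.

```julia
using DataFrames, Tables

df1 = DataFrame(:a => [1, 2, 3, 4, 5, 6, 7])
df2 = DataFrame(:a => [8, 9, 10])
t = [df1, df2]

# Wrapping t marks each element as one partition of a larger table.
parts = Tables.partitions(Tables.partitioner(t))

# The result is an iterator of tables, so the table check applies
# to each element rather than to the iterator itself.
sizes = Int[]
for part in parts
    @assert Tables.istable(part)
    push!(sizes, nrow(part))
end
@show sizes
```

So the consumer is expected to loop over the partitions and handle each one as a table in its own right.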

[1] Home · Tables.jl


You might find this video from @quinnj useful:


I assume you want this:

julia> using Tables, TableOperations

julia> for row in Tables.rows(TableOperations.joinpartitions(Tables.partitioner([df1, df2])))
           @show row
       end
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  1
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  2
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  3
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  4
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  5
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  6
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  7
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  8
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  9
row = Tables.ColumnsRow{Tables.CopiedColumns{TableOperations.JoinedPartitions{Tables.Schema{(:a,), Tuple{Int64}}}}}:
 :a  10

Fantastic, thank you.

and then, of course,

julia> Tables.istable(TableOperations.joinpartitions(Tables.partitioner(t)))
true
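
If all you need is the rows in order, and not an object that passes Tables.istable, a rough sketch without TableOperations could chain the row iterators of the partitions using Base's Iterators.flatten. This is an alternative I'm assuming fits the use case, not something from the Tables.jl docs; allrows and vals are illustrative names.

```julia
using DataFrames, Tables

df1 = DataFrame(:a => [1, 2, 3, 4, 5, 6, 7])
df2 = DataFrame(:a => [8, 9, 10])
t = [df1, df2]

# Lazily chain the row iterators of all partitions, in partition order.
allrows = Iterators.flatten(
    Tables.rows(p) for p in Tables.partitions(Tables.partitioner(t))
)

# Pull column :a out of every row through the generic row accessor.
vals = [Tables.getcolumn(row, :a) for row in allrows]
@show vals
```

The trade-off is that allrows is only a row iterator: it carries no schema and cannot be passed to functions that expect a table, which is what joinpartitions gives you.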