Stably indexed table


#1

Out of all the modules for dataframe-like structures (is there a list?), is there one where table[:col] or table[Val{:col}] is type-stable? As JMW put it:

A second possible solution is to generate custom DataFrame types for every distinct DataFrame object. This could convert DataFrames from black-box containers that contain objects of arbitrary type into fully typed containers that can only contain objects of types that are fully known to the compiler.

Was that ever attempted?
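
For concreteness, here is a minimal sketch of what such a fully typed container could look like. It assumes Julia 1.x, where named tuples are built in. `TypedCols` is a hypothetical name used only for illustration, not the API of any package mentioned in this thread. The column names and element types live in the type parameter, so indexing with a `Val` can be inferred.

```julia
using Test  # for @inferred

# Hypothetical container: the NamedTuple type parameter records every
# column name and column type, so the compiler knows them all.
struct TypedCols{NT<:NamedTuple}
    cols::NT
end

# Indexing with Val{name} makes the column name a compile-time constant.
Base.getindex(t::TypedCols, ::Val{name}) where {name} = getfield(t.cols, name)

t = TypedCols((A = [1, 2, 3, 4], B = ["M", "F", "F", "M"]))

# @inferred throws if the return type cannot be inferred concretely.
col = @inferred getindex(t, Val(:A))
```

The trade-off is that every distinct set of columns is a distinct type, so code gets compiled again for each new table layout.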


#2

Take a look at https://github.com/FugroRoames/TypedTables.jl.


#3

Awesome, thank you. @andyferris Is this package still maintained, or at least, will PRs be considered?


#4

There’s also IndexedTables (still in development).

And the CompositeDataFrame (never used much) in DataFramesMeta.


#5

And if you want to use all of these at once and convert easily from one to the other, check out https://github.com/davidanthoff/IterableTables.jl. For most of the supported table types, you can then just do IndexedTable(df), where df might be, e.g., a DataFrame (or any of the other supported types), and things will convert automatically. Essentially, this should work for all permutations of source and sink type.

For now you’ll need to use master of SimpleTraits; once there is a new tagged version of that, I’ll register and release IterableTables properly.


#6

Cool! Hadn’t seen that.


#7

PRs - always, definitely :wink:

I have been musing about slightly more flexible data containers (that support things like group by operations, column and row indices that can be static or dynamic, and fast lookup indexes). But so far I don’t have anything meaningful and open source to release (for work, I do have a bespoke strongly typed tabular data structure with fast spatial lookup, which has a more convenient interface than in TypedTables.)

Anyway, I still believe that, in the right circumstances, a TypedTable is really useful and convenient, so future work on TypedTables to give it a better API, make it more flexible and fully featured, make it work with other packages, and fix bugs seems worthwhile.


#8

IndexedTables is interesting, but unless I’m mistaken, it’s not really a tabular type. It looks more like a dictionary / map.

I just tried DataTables.jl and Query.jl for the first time:

using DataTables, Query

dt = DataTable(A = 1:4, B = ["M", "F", "F", "M"])
f(dt) = @from i in dt begin
    @select i.A
    @collect
end
@code_warntype f(dt)

tells me that this is not type-stable. Am I correct that the point of Query.jl and DataFramesMeta.jl’s @with is that whatever computation happens inside of their respective macro expansions will be done in a type-stable and efficient way, but I shouldn’t expect a type-inferred return value?

Regarding CompositeDataFrame, is there a substantive difference with TypedTables? My understanding is that the former generates a type by calling eval, while the latter uses parametric types to achieve essentially the same thing (which is perhaps a bit cleaner?) I would expect both approaches to have essentially the same performance characteristics, right?

Awesome! Is there still any hope of a common interface package, so that we don’t have to use Requires.jl? AbstractTables.jl looks pretty dead.


#9

Am I correct that the point of Query.jl and DataFramesMeta.jl’s @with is that whatever computation happens inside of their respective macro expansions will be done in a type-stable and efficient way, but I shouldn’t expect a type-inferred return value?

For Query, yes. Not sure about DataFramesMeta. I am thinking about something like this for Query currently:

@from i in dt begin
    @select i.A
    @collect
end q begin
    # In this block q holds the result of the query, and everything is type stable
    println(q)
end

But I’m not sure how much I like that, so any feedback would be welcome.

Awesome! Is there still any hope of a common interface package, so that we don’t have to use Requires.jl? AbstractTables.jl looks pretty dead.

The use of Requires.jl is really just temporary. My goal is to move all the source- and sink-specific code out into the actual source and sink packages, so that IterableTables essentially only has this and maybe this code. But for now it makes things easier for me to have all the code in one repo while I iterate on things. At the end of the day, IterableTables is an alternative to something like AbstractTables. The main differences are that a) it only provides a row iteration interface (no column-based interface) and b) it is not based on some common supertype, but uses traits. The latter makes the whole system much, much more flexible.
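
To illustrate the traits-versus-supertype point, here is a minimal sketch of the general trait-dispatch pattern in Julia. Every name in it (`IsRowIterable`, `tabletrait`, `rows`, `collectrows`, `MyTable`) is hypothetical and chosen for illustration; this is not the actual IterableTables or SimpleTraits API. The idea is that a type opts in by adding a method, without having to inherit from a shared abstract table type.

```julia
# Hypothetical trait types: a value of one of these describes a table type.
struct IsRowIterable end
struct NotRowIterable end

# Default: an arbitrary type is not treated as a table.
tabletrait(::Type) = NotRowIterable()

# A hypothetical table type with no table-specific supertype.
struct MyTable
    a::Vector{Int}
    b::Vector{String}
end

# Opt in by declaring the trait and providing row iteration.
tabletrait(::Type{MyTable}) = IsRowIterable()
rows(t::MyTable) = ((A = a, B = b) for (a, b) in zip(t.a, t.b))

# An existing Base type can opt in as well, which a supertype cannot do.
tabletrait(::Type{<:Vector{<:NamedTuple}}) = IsRowIterable()
rows(v::Vector{<:NamedTuple}) = v

# Generic "sink": dispatch on the trait value rather than on a supertype.
collectrows(x) = collectrows(tabletrait(typeof(x)), x)
collectrows(::IsRowIterable, x) = collect(rows(x))
collectrows(::NotRowIterable, x) = throw(ArgumentError("not a row-iterable table"))

out = collectrows(MyTable([1, 2], ["M", "F"]))
```

Because `Vector{<:NamedTuple}` already has its place in the type hierarchy, it could never be made a subtype of a new abstract table type after the fact, but it can be given a trait.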


#10

I think it looks OK. This also parses:

@from(i in dt, begin
    @select i.A
    @collect
end) do q
    # In this block q holds the result of the query, and everything is type stable
    println(q)
end

Looking forward to that, good luck!
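
Both block forms above amount to a function barrier: the result of the type-unstable step is passed to a function, and Julia compiles that function for the concrete runtime type of its argument. Here is a minimal sketch of the pattern in plain Julia, with no Query.jl involved. The names `untyped`, `lookup`, `sum_of_squares`, and `apply` are made up for illustration.

```julia
# A container whose column types are hidden from the compiler.
untyped = Dict{Symbol,Any}(:A => [1, 2, 3, 4], :B => ["M", "F", "F", "M"])

# Type-unstable step: the return type can only be inferred as Any.
lookup(table, name) = table[name]

# The barrier: this function is specialized for the concrete type of `col`,
# so the loop inside runs on fully typed data.
function sum_of_squares(col)
    total = zero(eltype(col))
    for x in col
        total += x * x
    end
    return total
end

result = sum_of_squares(lookup(untyped, :A))

# The do-block spelling passes an anonymous function as the first argument,
# which acts as the same kind of barrier.
apply(f, x) = f(x)
result_do = apply(lookup(untyped, :A)) do q
    sum_of_squares(q)
end
```

Only the call across the barrier is dispatched dynamically. The value that comes back out is still not inferable in the calling scope, which matches the behaviour described for Query above.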


#11

IndexedTables has a Columns type you can use as a table. Its API is still a little clunkier than a DataFrame, but it’s type stable.