Hierarchical or multi-index for data frames

Apart from Julia Computing, which is a company, most Julia* organizations are just open teams of volunteers from various backgrounds, with a lot of overlap in membership. So the ecosystem isn’t as fragmented as it may seem.

Basically, the table implementations are currently DataFrames (and KeyedFrames, which is based on it), IndexedTables/JuliaDB, and TypedTables. DataFrames is the simple in-memory table storage. JuliaDB supports distributed and out-of-core operations (with IndexedTable being a simple, strongly typed in-memory version), and TypedTables is another strongly typed table. Whether strong typing is beneficial probably depends on whether you work with many tables with different column types: code gets compiled separately for each distinct combination of column types, and that compilation overhead can be significant.
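To make the strong-typing trade-off concrete, here is a minimal sketch in plain Julia, with no table package involved. It only mimics the two storage styles (a `NamedTuple` of vectors versus a container of abstractly typed columns); the names `typed_a`, `untyped_a`, and so on are made up for illustration and are not the internals of any of the packages above.

```julia
# Strongly typed style: column names and element types are part of the
# container's type, so each distinct schema is a distinct Julia type.
typed_a = (id = [1, 2, 3], name = ["a", "b", "c"])
typed_b = (id = [1.0, 2.0], flag = [true, false])

# Loosely typed style: columns sit behind an abstract element type, so every
# table has the same Julia type whatever its columns contain.
untyped_a = Dict{Symbol,AbstractVector}(:id => [1, 2, 3], :name => ["a", "b", "c"])
untyped_b = Dict{Symbol,AbstractVector}(:id => [1.0, 2.0], :flag => [true, false])

# Different schemas give different types in the typed style...
schemas_differ = typeof(typed_a) != typeof(typed_b)

# ...but one shared type in the loosely typed style.
schemas_shared = typeof(untyped_a) == typeof(untyped_b)
```

A function called on `typed_a` and then on `typed_b` is compiled twice, once per schema, while the same function called on `untyped_a` and `untyped_b` is compiled once but cannot be specialized on the column types.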

AxisArrays and NamedArrays are quite different, I would say: they are more arrays than tables, even though they share some use cases with tables.

Then there are currently two generic interfaces: Tables (which replaces DataStreams) and TableTraits. There’s hope that we can agree on a single common interface, but that’s still under discussion. As @piever said, we would also like to converge on similar APIs where possible.
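As a rough illustration of what a generic interface buys you, here is a small sketch against Tables.jl. It assumes a recent Tables.jl release is installed; `column_mean` is a hypothetical helper written for this example, not part of any package.

```julia
using Tables  # assumption: a recent Tables.jl release is available

# A NamedTuple of vectors is already a valid Tables.jl column table.
source = (id = [1, 2, 3], score = [0.5, 0.7, 0.9])

# Hypothetical helper: it only talks to the Tables.jl interface, so it should
# work for any table type that implements that interface.
function column_mean(table, name::Symbol)
    cols = Tables.columns(table)          # column-oriented view of the table
    col = Tables.getcolumn(cols, name)    # fetch one column by name
    return sum(col) / length(col)
end

# Same data as a row-oriented table (a Vector of NamedTuples).
rows = Tables.rowtable(source)

mean_from_columns = column_mean(source, :score)
mean_from_rows = column_mean(rows, :score)
```

The same call should work on a DataFrame or a TypedTables table, since those packages implement the Tables.jl interface, which is the point of agreeing on a common one.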
