Differences between table types

I am creating a package that will contain tabular information. For an analysis, item-specific information will be stored in tabular form. Given an item, the matching row of the table needs to be found and its data extracted.
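
For reference, here is a minimal Base-only sketch of that lookup. It assumes each item has a unique key. The data and the names `ITEMS`, `ITEM_INDEX`, `build_index` and `lookup_row` are all hypothetical. A NamedTuple of equal-length vectors is also what Tables.jl treats as a column table, so the same data could later be handed to any of the packages mentioned below.

```julia
# Hypothetical item table: a NamedTuple of equal-length column vectors
# (the same shape Tables.jl calls a "column table").
ITEMS = (
    item   = ["bolt", "nut", "washer"],
    weight = [12.5, 3.2, 1.1],
    count  = [100, 250, missing],
)

# Build a key -> row-index map once, so each lookup is a hash lookup
# instead of a linear scan over the key column. Errors on duplicate keys.
function build_index(keycol::AbstractVector)
    idx = Dict{eltype(keycol),Int}()
    for (i, k) in enumerate(keycol)
        haskey(idx, k) && error("duplicate key: $k")
        idx[k] = i
    end
    return idx
end

ITEM_INDEX = build_index(ITEMS.item)

# Return the row for `key` as a NamedTuple, or `nothing` if the item is unknown.
function lookup_row(table::NamedTuple, index::Dict, key)
    i = get(index, key, nothing)
    i === nothing && return nothing
    return map(col -> col[i], table)  # pick element i from every column
end
```

With this, `lookup_row(ITEMS, ITEM_INDEX, "nut")` returns `(item = "nut", weight = 3.2, count = 250)`. Building the index once trades a little memory for constant-time lookups; for a handful of rows a plain `findfirst(==(key), ITEMS.item)` would do just as well.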

From browsing the available packages, it seems that DataFrames.jl, Tables.jl/TableOperations.jl, and TypedTables.jl provide this functionality. There may be others.

What I would like to know are the relative advantages and disadvantages of each solution. I believe I read somewhere that Tables.jl is better suited than DataFrames.jl for use inside a package. My gut feeling is that DataFrames.jl is the best choice for interactive data manipulation. Would it also be recommended as a dependency of a package that has no need for interactivity?

In terms of performance and being lightweight, which solution would you suggest?
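
On the performance side, one tradeoff I'm aware of is whether the column types are visible to the compiler. As far as I know, a DataFrame stores its columns as a `Vector{AbstractVector}`, while a TypedTables.jl `Table` (like a plain NamedTuple column table) carries the column types in its own type. The sketch below uses only Base, with a `Dict{Symbol,AbstractVector}` as a stand-in for untyped column storage. It is an analogy, not DataFrames.jl itself, and `getC`/`total_length` are made-up helpers.

```julia
# Typed column storage: the column types are part of the container's type.
typed = (A = [1, 2, 3], C = ["hey", "there", "sailor"])

# Untyped column storage (analogy only): values are declared as AbstractVector,
# so the compiler cannot see the concrete column types.
untyped = Dict{Symbol,AbstractVector}(:A => [1, 2, 3], :C => ["hey", "there", "sailor"])

getC(t::NamedTuple) = t.C
getC(t::AbstractDict) = t[:C]

# What the compiler infers for each access.
typed_rt = Base.return_types(getC, (typeof(typed),))      # [Vector{String}]
untyped_rt = Base.return_types(getC, (typeof(untyped),))  # [AbstractVector]

# Function barrier: the call below is dispatched once at runtime, after which
# the body of `total_length` runs code specialized for Vector{String}.
total_length(col) = sum(length, col)
n = total_length(getC(untyped))
```

The usual remedy for untyped storage is the function barrier at the end: pull the column out once and do the heavy work in a separate function.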

What are the tradeoffs to consider?

Below is a simple example of two of these solutions.

# Solution 1: Tables.jl + TableOperations.jl
julia> using Tables, TableOperations

julia> ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
(A = Union{Missing, Int64}[1, missing, 3], B = [1.0, 2.0, 3.0], C = ["hey", "there", "sailor"])

julia> table = ctable |> TableOperations.filter(x->Tables.getcolumn(x, :C) == "sailor") |> Tables.columntable
(A = Union{Missing, Int64}[3], B = [3.0], C = ["sailor"])

# Solution 2: TypedTables.jl
julia> using TypedTables

julia> t = Table(A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
Table with 3 columns and 3 rows:
     A        B    C
   ┌─────────────────────
 1 │ 1        1.0  hey
 2 │ missing  2.0  there
 3 │ 3        3.0  sailor

julia> sail = filter(row -> row.C == "sailor", t)
Table with 3 columns and 1 row:
     A  B    C
   ┌───────────────
 1 │ 3  3.0  sailor
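
For comparison, the same filter needs no package at all, since `ctable` is already a NamedTuple of vectors. A minimal Base-only sketch (the variable names are arbitrary):

```julia
# Same data as `ctable` above: a NamedTuple of column vectors.
ctable = (A = [1, missing, 3], B = [1.0, 2.0, 3.0], C = ["hey", "there", "sailor"])

# Indices of the rows whose C column equals "sailor".
rows = findall(==("sailor"), ctable.C)

# Index every column with those rows; the result is again a column table,
# and each column keeps its element type (A stays Union{Missing, Int64}).
sail = map(col -> col[rows], ctable)
```

The resulting `sail` should hold the same values and column types as the `Tables.columntable` result in the first example.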


From a DataFrames.jl maintainer:
