For example, for a DataFrame
, if you just want to split the data in a few subsets, this can be done efficiently by indexing the DataFrame
object with a vector of indices; so DataFrames could implement that generic API by simply calling its getindex
method. Even for other table types which are not as suited to random access as DataFrame
(like SQL tables), indexing with a vector could probably be implemented with an acceptable performance if you only take a few subsets.
On the contrary, if what you want is index single rows repeatedly in a performance-critical part of the code, then maybe you’d better work on a named tuple of columns (or a matrix) than on a DataFrame
; the API would have to include e.g. a fastindexingtable
function that you would call to get that named tuple of columns before performing repeated indexing. Likewise, for types such as SQL tables, the only solution would be to copy everything in memory first.
From your description, it seems you care about the first case? Indeed it seems the simplest to support.