OK, let’s summarize a little bit:
- Adding metadata to `DataFrame` (or similar structures) is very useful;
- There is no point (for now) in trying to attach semantic meaning to metadata. The best we can do is to propagate metadata appropriately through copies/slices/views, and discard it when more complicated transformations are involved;
- Two approaches can be envisioned: adding metadata to the `DataFrame` (or similar structures) as a whole, or attaching them to individual `Array`s. Both have pros and cons, but we will likely need both;
- For now, a reasonable way to store metadata is a `Dict{Symbol, Any}`, regardless of the approach followed;
- Concerning the implementation with `DataFrames`:
  - there is a PR to encapsulate metadata within a `DataFrame` structure, both at a global and column level. The flaw in this implementation is that the metadata, stored in the `colmeta` field of the `DataFrame` structure, do not add any functionality to the package itself. It would be much better to leave the `DataFrames` package as it currently is and wrap it in a container along with metadata;
  - I tried the wrapping approach here, but it turns out there is a lot of boilerplate code to be written;
  - the difficulty in extending the `DataFrame` object may ultimately lie in the way the package has been implemented. E.g., the function `Base.getindex(df::DataFrame, col_ind::ColumnIndex)` in the package should actually accept an `AbstractDataFrame` as input, not a `DataFrame`. Moreover, the `SubDataFrame` struct inherits from `AbstractDataFrame`, but the `SubDataFrame` and `DataFrame` structures do not share the same fields.
  - I am not sure whether these are issues or intended design decisions for the `DataFrames` package, but they don’t allow the `DataFrame` code to be easily re-used (see here for a discussion on code reuse by means of composition).
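
To make the points above a bit more concrete, here is a minimal sketch of the per-array approach, using only Base Julia. The `MetaVector` type and everything around it are hypothetical names made up for illustration; this is not the API of `DataFrames` or of any existing package. It stores metadata in a `Dict{Symbol, Any}`, keeps it through copies, slices and views, and lets it drop when a more complicated transformation (here `map`) produces a new array.

```julia
# Hypothetical wrapper: a vector bundled with free-form metadata.
# Subtyping AbstractVector lets generic Base methods (sum, map, ==, ...) work unchanged.
struct MetaVector{T, V<:AbstractVector{T}} <: AbstractVector{T}
    data::V
    meta::Dict{Symbol, Any}
end

# Convenience constructor: metadata given as keyword arguments.
MetaVector(data::AbstractVector; meta...) = MetaVector(data, Dict{Symbol, Any}(meta))

# Minimal AbstractArray interface, forwarded to the wrapped data.
Base.size(v::MetaVector) = size(v.data)
Base.IndexStyle(::Type{<:MetaVector}) = IndexLinear()
Base.getindex(v::MetaVector, i::Int) = v.data[i]

# Propagation rules: copies and slices get their own copy of the metadata,
# views share the parent's metadata (as they share the parent's data).
Base.copy(v::MetaVector) = MetaVector(copy(v.data), copy(v.meta))
Base.getindex(v::MetaVector, r::UnitRange{Int}) = MetaVector(v.data[r], copy(v.meta))
Base.view(v::MetaVector, r::UnitRange{Int}) = MetaVector(view(v.data, r), v.meta)

# Usage
v = MetaVector([10, 20, 30]; unit = "cm", source = "sensor A")
s = v[2:3]              # slice: still a MetaVector, metadata copied
w = view(v, 1:2)        # view: still a MetaVector, metadata shared
d = map(x -> 2x, v)     # transformation: falls back to a plain Vector, metadata discarded
```

Note that nothing was written for `map`: it falls back to the generic `AbstractArray` method, which is why the metadata is discarded there. Every operation that should keep the metadata needs its own forwarding method, and that is where the boilerplate comes from.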