OK, let’s summarize a little bit:
- Adding metadata to a `DataFrame` (or similar structures) is very useful;
- There is no point (for now) in trying to attach semantic meaning to metadata. The best we can do is to propagate metadata appropriately through copies/slices/views, and discard it when more complicated transformations are involved;
- Two approaches can be envisioned: adding metadata to the `DataFrame` (or similar structures) as a whole, or attaching it to individual `Array`s. Both have pros and cons, but we will likely need both;
- For now, a reasonable way to store metadata is a `Dict{Symbol, Any}`, regardless of the approach followed;
- Concerning the implementation with `DataFrames`:
  - there is a PR to encapsulate metadata within a `DataFrame` structure, both at a global and column level. The flaw in this implementation is that the metadata, stored in the `colmeta` field of the `DataFrame` structure, do not add any functionality to the package itself. It would be much better to leave the `DataFrames` package as it currently is and wrap a `DataFrame` in a container along with its metadata;
  - I tried the wrapping approach here, but it turns out there is a lot of boilerplate code to be written;
  - the difficulty in extending the `DataFrame` object may ultimately lie in the way the package has been implemented. E.g., the function `Base.getindex(df::DataFrame, col_ind::ColumnIndex)` in the package should actually accept an `AbstractDataFrame` as input, not a `DataFrame`. Moreover, the `SubDataFrame` struct inherits from `AbstractDataFrame`, but the `SubDataFrame` and `DataFrame` structures do not share the same fields;
  - I am not sure whether these are issues or intended design decisions for the `DataFrames` package, but they don't allow the `DataFrame` code to be easily reused (see here for a discussion on code reuse by means of composition).
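
To make the propagation rule from the summary a bit more concrete, here is a minimal sketch of the array-level approach using only `Base`. The names `MetaArray` and `meta` are hypothetical (they are not part of `DataFrames` or of the PR mentioned above), and the choice to copy the metadata on `copy` but share it on `view` is just one possible convention.

```julia
# Hypothetical wrapper: an array plus free-form metadata in a Dict{Symbol,Any}.
struct MetaArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
    meta::Dict{Symbol,Any}
end

# Convenience constructor: metadata given as keyword arguments.
MetaArray(data::AbstractArray; kwargs...) = MetaArray(data, Dict{Symbol,Any}(kwargs))

# Minimal AbstractArray interface, forwarded to the wrapped array.
Base.size(a::MetaArray) = size(a.data)
Base.getindex(a::MetaArray, i::Int...) = a.data[i...]
Base.setindex!(a::MetaArray, v, i::Int...) = (a.data[i...] = v)

# Simple cases: propagate the metadata.
# A copy gets its own (shallow) copy of the metadata...
Base.copy(a::MetaArray) = MetaArray(copy(a.data), copy(a.meta))
# ...while a view shares both the data and the metadata with its parent.
Base.view(a::MetaArray, inds...) = MetaArray(view(a.data, inds...), a.meta)

# Anything else (map, broadcasting, ...) falls back to the generic
# AbstractArray methods, which return a plain Array: metadata is discarded.

a = MetaArray([1.0, 2.0, 3.0]; unit = "m", source = "sensor A")
c = copy(a)          # MetaArray with independent data and metadata
v = view(a, 2:3)     # MetaArray sharing data and metadata with `a`
m = map(x -> 2x, a)  # plain Vector{Float64}, no metadata
```

Note that in this sketch a slice such as `a[1:2]` also falls back to the generic method and yields a plain `Array`; propagating metadata through slices would need an extra `getindex` method, which is exactly the kind of boilerplate mentioned above.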