DataFrames.jl: metadata

My two cents concerning the “metadata” functionality (without worrying about propagation):

  • the way to use and propagate “metadata” is definitely application-specific, and is in no way related to the DataFrames.jl package (quoting @bkamins: “metadata is not used for any logic of processing data in DataFrames.jl”), hence the functionality should be provided by a separate package;
  • however, such an external package would hardly integrate transparently with DataFrames.jl, since the latter exposes a complex interface spread over several hundred methods:
julia> length(methodswith(DataFrame, supertypes=true))
308

The consequence is that in order to exploit the “metadata” functionality we would likely be forced to use an alternative interface for the DataFrame object. Taking as an example the above-mentioned Metadata.jl package:

julia> using Metadata, DataFrames
julia> df = DataFrame(id=[1,2]);
julia> mdf = attach_metadata(df, (x = 1, y = 2));
julia> names(mdf)  # <-- can't use DataFrame methods on the wrapped object!
ERROR: MethodError: no method matching names(::Metadata.MetaStruct{DataFrame, NamedTuple{(:x, :y), Tuple{Int64, Int64}}})
Closest candidates are:
   ...

(the purpose here is of course not to blame Metadata.jl, but just to provide an example…).
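To make the difficulty concrete, here is a minimal sketch of the forwarding that a wrapper-based package has to do. It uses only Base so it runs without any package; `Table`, `WithMeta`, `colnames`, `nrows` and `ncols` are hypothetical stand-ins invented for this illustration, not the API of DataFrames.jl or Metadata.jl.

```julia
# Hypothetical stand-in for DataFrame, with a tiny "interface" of three functions.
struct Table
    names::Vector{Symbol}
    columns::Vector{Vector{Int}}
end
colnames(t::Table) = t.names
nrows(t::Table) = length(first(t.columns))
ncols(t::Table) = length(t.columns)

# Hypothetical wrapper that carries metadata next to any object.
struct WithMeta{T, M}
    parent::T
    meta::M
end

# Forward a hand-picked list of functions to the wrapped object.
# `ncols` is deliberately left out: calling it on the wrapper raises a
# MethodError, just like `names(mdf)` in the transcript above.
for f in (:colnames, :nrows)
    @eval $f(w::WithMeta, args...; kwargs...) = $f(w.parent, args...; kwargs...)
end

t = Table([:id], [[1, 2]])
w = WithMeta(t, (x = 1, y = 2))

colnames(w)          # forwarded to the wrapped Table
nrows(w)             # forwarded to the wrapped Table
w.meta.x             # metadata stays reachable on the wrapper
applicable(ncols, w) # false: this function was never forwarded
```

With three functions the list is easy to maintain; with the ~300 methods reported by `methodswith` above, the forwarding list has to track every release of the wrapped package, and every function returning a new table has to decide whether to re-wrap its result.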

In summary: “metadata” should be implemented in an external package, but implementing such a package would be unreasonably hard. This issue has been discussed in several posts on Discourse, e.g. here.

I’m afraid this whole discussion highlights some kind of limitation in the Julia type system :sob:, or at least some difficulty in implementing a facility which is conceptually very simple.

Concerning propagation: I believe that having something which works in 90% of cases (or even 99%) is a very dangerous option, and I really hope it will be avoided. Hence my choice is #2: Add metadata to DataFrame [(because it is very useful and would hardly be used if implemented in a separate package)], but never propagate metadata.
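As a rough illustration of what “never propagate” could mean in practice, here is a small Base-only sketch. `MetaTable` and `keeprows` are hypothetical names made up for this example; this is an assumption about the semantics of option #2, not how DataFrames.jl implements or will implement it.

```julia
# Hypothetical table type that owns its metadata (not the DataFrames.jl API).
struct MetaTable
    id::Vector{Int}
    meta::Dict{String, Any}
end

# Convenience constructor: a new table always starts with empty metadata.
MetaTable(id::Vector{Int}) = MetaTable(id, Dict{String, Any}())

# Any operation that builds a new table goes through that constructor,
# so metadata is never copied over and can never be copied over wrongly.
keeprows(pred, t::MetaTable) = MetaTable(filter(pred, t.id))

t = MetaTable([1, 2, 3])
t.meta["source"] = "sensor A"

s = keeprows(isodd, t)
s.id             # rows kept by the predicate
isempty(s.meta)  # true: the result carries no metadata
t.meta["source"] # the original table is untouched
```

The rule is trivial to state and to test, which is the point: the user re-attaches metadata explicitly where it is still valid, instead of relying on a heuristic that is right most of the time.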