Since this discussion now has a wider audience than PRs on GitHub, it may be helpful to provide a superficial implementation of an interface and describe why it is difficult to settle on one. Let’s put this together with some basic types and traits that would support the features we’re discussing.
abstract type MetaStyle end
struct MetaUnknown <: MetaStyle end
struct MetaAxes <: MetaStyle end
struct MetaDims <: MetaStyle end
struct MetaSelf <: MetaStyle end
struct MetaValues <: MetaStyle end
struct MetaDynamic <: MetaStyle
    style::MetaStyle
end
abstract type PropagationStyle end
struct PropagateCopy <: PropagationStyle end
struct PropagateDrop <: PropagationStyle end
struct PropagateShare <: PropagationStyle end
struct PropagateDynamic <: PropagationStyle
    style::PropagationStyle
end
People can add new types for special interactions but this provides basic support for additional contextualization of metadata and both runtime and compile-time optimizations. However, we need to figure out how this information is stored and accessed in relation to the data-metadata object. The following is one approach we could take that permits the traits above to provide specialized code and dynamic code.
struct MetaDatum{D,S,P}
    data::D
    style::S
    propagation::P
end
struct MetaData{K,V,S,P,D<:AbstractDict{K,MetaDatum{V,S,P}}} <: AbstractDict{K,V}
    data::D
end
# dictionary interface for accessing metadata values (assume we have the full interface defined)
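# unwrap the MetaDatum so that lookups return the value type V promised by AbstractDict{K,V}
Base.getindex(@nospecialize(x::MetaData), key) = getfield(getfield(x, :data)[key], :data)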
propagate_metadata(::PropagateDrop) = false
propagate_metadata(x::PropagateDynamic) = !(getfield(x, :style) isa PropagateDrop)
propagate_metadata(@nospecialize x::PropagationStyle) = true
propagate_metadata(@nospecialize x::MetaDatum) = propagate_metadata(getfield(x, :propagation))
# methods for managing propagation of metadata when indexing
index_metadata(::MetaAxes, ::PropagateShare, @nospecialize(data), inds::Tuple) = map(view, data, inds)
index_metadata(::MetaAxes, ::PropagateCopy, @nospecialize(data), inds::Tuple) = map(getindex, data, inds)
function index_metadata(::MetaAxes, p::PropagateDynamic, @nospecialize(data), inds::Tuple)
    if p.style isa PropagateCopy
        map(getindex, data, inds)
    elseif p.style isa PropagateShare
        map(view, data, inds)
    else
        error("unsupported PropagationStyle $(p.style)")
    end
end
index_metadata(::MetaStyle, ::PropagateShare, @nospecialize(data), inds::Tuple) = data
index_metadata(::MetaStyle, ::PropagateCopy, @nospecialize(data), inds::Tuple) = copy(data)
function index_metadata(::MetaStyle, p::PropagateDynamic, @nospecialize(data), inds::Tuple)
    if p.style isa PropagateCopy
        copy(data)
    elseif p.style isa PropagateShare
        data
    else
        error("unsupported PropagationStyle $(p.style)")
    end
end
function index_metadata(md::MetaDatum, inds::Tuple)
    s = getfield(md, :style)
    p = getfield(md, :propagation)
    MetaDatum(index_metadata(s, p, getfield(md, :data), inds), s, p)
end
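function index_metadata(@nospecialize(md::MetaData), inds::Tuple)
    # iterate key => datum pairs, skipping entries marked PropagateDrop
    kept = Iterators.filter(kv -> propagate_metadata(last(kv)), getfield(md, :data))
    MetaData(Dict(k => index_metadata(v, inds) for (k, v) in kept))
end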
Note the (potentially excessive) use of @nospecialize. We want to ensure that we can have MetaDatum{Any,MetaDynamic,PropagateDynamic} and not create new method instances when dispatching on the individual fields. Yet we can still get new method instances for new types so that inference won’t fail us when it’s important. This is a bit oversimplified, but I think it still serves the purpose of illustrating how simple it is to support metadata types like this:
struct MetaTable{P,M} # assume the Tables.jl API is properly defined for this:
    parent::P
    metadata::M
end
Base.parent(x::MetaTable) = getfield(x, :parent)
metadata(x::MetaTable) = getfield(x, :metadata)
function Base.getindex(x::MetaTable, r, c)
    MetaTable(parent(x)[r, c], index_metadata(metadata(x), (r, c)))
end
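As a rough end-to-end sketch of the rules above, the following condenses the definitions into something that runs on its own. It makes a few simplifying assumptions that are not part of the proposal: the dynamic styles and the @nospecialize annotations are left out, a plain Dict stands in for the MetaData wrapper, and the keys (:labels, :source, :cache) and their values are made up for illustration.

```julia
abstract type MetaStyle end
struct MetaUnknown <: MetaStyle end
struct MetaAxes <: MetaStyle end

abstract type PropagationStyle end
struct PropagateCopy <: PropagationStyle end
struct PropagateDrop <: PropagationStyle end
struct PropagateShare <: PropagationStyle end

struct MetaDatum{D,S<:MetaStyle,P<:PropagationStyle}
    data::D
    style::S
    propagation::P
end

propagate_metadata(::PropagateDrop) = false
propagate_metadata(::PropagationStyle) = true
propagate_metadata(md::MetaDatum) = propagate_metadata(md.propagation)

# Per-axis metadata is indexed alongside the parent; other metadata passes through.
index_metadata(::MetaAxes, ::PropagateCopy, data, inds::Tuple) = map(getindex, data, inds)
index_metadata(::MetaStyle, ::PropagateCopy, data, inds::Tuple) = copy(data)
index_metadata(::MetaStyle, ::PropagateShare, data, inds::Tuple) = data
function index_metadata(md::MetaDatum, inds::Tuple)
    MetaDatum(index_metadata(md.style, md.propagation, md.data, inds), md.style, md.propagation)
end

# Assumption: a plain Dict plays the role of the MetaData wrapper.
function index_metadata(md::AbstractDict, inds::Tuple)
    Dict(k => index_metadata(v, inds) for (k, v) in md if propagate_metadata(v))
end

struct MetaTable{P,M}
    parent::P
    metadata::M
end
Base.parent(x::MetaTable) = getfield(x, :parent)
metadata(x::MetaTable) = getfield(x, :metadata)
function Base.getindex(x::MetaTable, r, c)
    MetaTable(parent(x)[r, c], index_metadata(metadata(x), (r, c)))
end

# Hypothetical metadata: one label vector per axis, a shared note, and a dropped cache.
meta = Dict{Symbol,MetaDatum}(
    :labels => MetaDatum((["r1", "r2", "r3"], ["c1", "c2"]), MetaAxes(), PropagateCopy()),
    :source => MetaDatum("sensor A", MetaUnknown(), PropagateShare()),
    :cache  => MetaDatum([1, 2, 3], MetaUnknown(), PropagateDrop()),
)
t = MetaTable(reshape(1:6, 3, 2), meta)
sub = t[2:3, 1:1]
```

Indexing the table subsets the axis labels, passes the shared note through untouched, and leaves the dropped entry behind, all without the table type knowing anything about the individual metadata entries.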
Most operations that need special reference to metadata can be categorized as some combination of indexing, reduction, joining/concatenation/merging, or dimension permutation. A set of simple rules like this slowly generalizes once we have a dedicated graph, table, and array type for metadata.
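To make the other categories a little more concrete, here is a minimal sketch of what analogous rules for permutation and reduction might look like. The names permute_metadata and reduce_metadata are hypothetical and only mirror index_metadata, and the choice to replace a reduced axis with nothing is an assumption for illustration, not something settled in this discussion.

```julia
abstract type MetaStyle end
struct MetaUnknown <: MetaStyle end
struct MetaAxes <: MetaStyle end

# Hypothetical rule: per-axis metadata is reordered with the dimensions,
# anything else passes through unchanged.
permute_metadata(::MetaAxes, data::Tuple, perm) = map(i -> data[i], perm)
permute_metadata(::MetaStyle, data, perm) = data

# Hypothetical rule: axes that are reduced over lose their per-axis metadata
# (marked here with `nothing`); the remaining axes keep theirs.
function reduce_metadata(::MetaAxes, data::Tuple, dims)
    ntuple(i -> i in dims ? nothing : data[i], length(data))
end
reduce_metadata(::MetaStyle, data, dims) = data

labels = (["r1", "r2"], ["c1", "c2", "c3"])
permuted = permute_metadata(MetaAxes(), labels, (2, 1))
reduced = reduce_metadata(MetaAxes(), labels, (1,))
untouched = permute_metadata(MetaUnknown(), "raw", (2, 1))
```

Each category needs only a default method plus one specialization per style that cares about it, which is why the rule set grows slowly.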
I want to be clear that I’m not trying to sell any particular approach here because there are pros and cons to all implementations. It might be better to store the style and propagation information in parallel collections:
struct MetaData{K,V,D<:AbstractDict{K,V},S,P} <: AbstractDict{K,V}
    data::D
    styles::S
    propagations::P
end
But then there’s the issue of mapping styles and propagations to the keys of data.
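One possible way to handle that mapping, sketched here under the assumption that the keys are known up front, is to store all three collections as NamedTuples that are forced to share the same keys. The names ParallelMeta, datum_style, and datum_propagation are hypothetical, and this trades the flexibility of a Dict for keys that are fixed in the type.

```julia
abstract type MetaStyle end
struct MetaUnknown <: MetaStyle end
struct MetaAxes <: MetaStyle end

abstract type PropagationStyle end
struct PropagateCopy <: PropagationStyle end
struct PropagateDrop <: PropagationStyle end

# Hypothetical container: the inner constructor only accepts three NamedTuples
# with identical keys (same names, same order), so any key of `data` is also a
# valid key of `styles` and `propagations`.
struct ParallelMeta{D<:NamedTuple,S<:NamedTuple,P<:NamedTuple}
    data::D
    styles::S
    propagations::P
    function ParallelMeta(data::NamedTuple{K}, styles::NamedTuple{K},
                          propagations::NamedTuple{K}) where {K}
        new{typeof(data),typeof(styles),typeof(propagations)}(data, styles, propagations)
    end
end

datum_style(md::ParallelMeta, key::Symbol) = getfield(md, :styles)[key]
datum_propagation(md::ParallelMeta, key::Symbol) = getfield(md, :propagations)[key]

md = ParallelMeta(
    (labels = ["a", "b"], note = "raw"),
    (labels = MetaAxes(), note = MetaUnknown()),
    (labels = PropagateCopy(), note = PropagateDrop()),
)

# Mismatched keys are rejected by dispatch rather than discovered later.
mismatch_rejected = try
    ParallelMeta((a = 1,), (b = MetaAxes(),), (a = PropagateCopy(),))
    false
catch err
    err isa MethodError
end
```

The cost is that adding or removing a key creates a new type, so this fits metadata whose keys rarely change better than it fits open-ended dictionaries.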
Another approach is to put all of this on the value type, requiring unique meta-datum types to explicitly define their methods:
index_metadata(datum, inds::Tuple) = datum
index_metadata(datum::MyAxesType, inds::Tuple) = datum[inds]
function Base.getindex(x::MetaTable, r, c)
    MetaTable(parent(x)[r, c], map(Base.Fix2(index_metadata, (r, c)), metadata(x)))
end
However, the loss of traits here makes it a bit difficult to explicitly specialize and de-specialize on datum.
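For reference, a runnable sketch of this value-type approach might look like the following. The body of MyAxesType, its getindex method, and the NamedTuple used as the metadata container are assumptions made so the example is self-contained.

```julia
# Hypothetical metadata value type holding one label vector per axis.
struct MyAxesType{T<:Tuple}
    labels::T
end
# Assumption: indexing the datum indexes each axis' labels with the matching index.
Base.getindex(x::MyAxesType, inds...) = MyAxesType(map(getindex, x.labels, inds))

# Default: metadata is carried through unchanged.
index_metadata(datum, inds::Tuple) = datum
# Types that care about indexing opt in with their own method.
index_metadata(datum::MyAxesType, inds::Tuple) = datum[inds...]

meta = (axes = MyAxesType((["r1", "r2", "r3"], ["c1", "c2"])), note = "raw")
out = map(Base.Fix2(index_metadata, (2:3, 1:1)), meta)
```

All behavior is decided by the concrete type of each value, so there is no separate style or propagation object to dispatch on or to hide from the compiler.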
Another solution is to just accept that the term “metadata” has a definition throughout the Julia ecosystem that is too inclusive for what we are trying to accomplish with DataFrame here. I’ve taken some time to look at LLVM’s metadata, Clojure’s metadata, R’s attributes, and (to a lesser extent) Haskell’s metadata. The way this is being described seems a lot more like R’s attributes, where everything about it is dynamic and requires manual dispatch. I’ve seen the term “properties” used similarly in Julia (see ImageMetadata.jl).
I’ve rambled long enough here, so I’ll just end this by suggesting that the issue of propagation will be difficult to resolve if we don’t establish a common set of qualities that metadata has/permits.