Indeed, we will have to insist in the manual that metadata is designed for information general enough that it does not become incorrect after subsetting rows or columns. As long as users are given a definition of what metadata is expected to be and how it behaves, it doesn’t make sense to say that propagation rules would be “correct only most of the time, and sometimes incorrect”: they are always correct. We just have to ensure that users clearly understand what kind of information they should store in metadata (and that they are not too tempted to misuse the feature, which I don’t think will be the case).
That’s an interesting extension. I’d leave this out for now, as we have enough issues to tackle, but I imagine a special naming scheme could be used, probably with a more distinctive separator than “_”, e.g. "label#LANG:de" => "ELO-Bewertung in der klassischen Zeitsteuerung". Actually, as long as multiple packages agree on a common convention for setting and/or consuming such fields, no change is needed in DataFrames. This convention could even extend to other languages, since the metadata key names can be exchanged via Arrow, Parquet, etc. Stata already supports multilingual labels; I’ll check whether/how they can be imported in Julia (see this PR). I don’t know whether there are existing naming conventions in Arrow or Parquet.
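To make the naming-scheme idea concrete, here is a minimal sketch of how a consumer could read such keys. It is written in plain Python rather than Julia, only to illustrate that the convention would not depend on DataFrames itself. The `#LANG:` separator comes from the example above; the helper names `split_key` and `localized`, the fallback rule, and the English label are assumptions for illustration, not an existing API or an agreed convention.

```python
# Hypothetical convention: "<field>#LANG:<code>" is a localized variant of "<field>".
LANG_SEP = "#LANG:"


def split_key(key):
    """Split a metadata key into (base_field, language_code or None)."""
    base, sep, lang = key.partition(LANG_SEP)
    return (base, lang) if sep else (key, None)


def localized(metadata, field, lang):
    """Return `field` in language `lang`, falling back to the plain key."""
    return metadata.get(f"{field}{LANG_SEP}{lang}", metadata.get(field))


# Column metadata as plain string key/value pairs, the kind of mapping
# that formats such as Arrow or Parquet can carry alongside the data.
col_metadata = {
    # The English label is an illustrative assumption.
    "label": "ELO rating in classical time control",
    "label#LANG:de": "ELO-Bewertung in der klassischen Zeitsteuerung",
}

german = localized(col_metadata, "label", "de")
french = localized(col_metadata, "label", "fr")  # no French entry: falls back to "label"
```

Since the keys stay ordinary strings, a package unaware of the convention would simply treat "label#LANG:de" as one more metadata entry, which is why nothing would need to change in DataFrames.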