DataFrames: Which file formats (if any) support saving metadata?

Are there file formats that support saving metadata (both table-level and column-level)?


Arrow.jl does.
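A minimal sketch, assuming an Arrow.jl 2.x release in which `Arrow.write` accepts the `metadata` (table-level) and `colmetadata` (per-column) keywords and `Arrow.getmetadata` reads them back. Arrow custom metadata holds only string key/value pairs. The table, keys and values below are made up for illustration.

```julia
using Arrow

# Illustrative table; any Tables.jl-compatible source works.
tbl = (a = [1, 2, 3], b = ["x", "y", "z"])
path = tempname() * ".arrow"

# Table-level metadata: String => String pairs stored in the schema's custom metadata.
# Column-level metadata: column name (Symbol) => String => String pairs.
Arrow.write(path, tbl;
    metadata = Dict("source" => "demo"),
    colmetadata = Dict(:a => Dict("units" => "m")))

t = Arrow.Table(path)
Arrow.getmetadata(t)     # table-level metadata, or `nothing` if none was written
Arrow.getmetadata(t.a)   # metadata attached to column `a`
```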


Just as a side note, TableMetaDataTools.jl has utilities to save metadata to TOML files and later reload it and attach it to a data frame. So in principle you could save a .csv and a .toml side by side and load both.

Obviously, it won’t handle cross-language metadata (though maybe someone should write an R package…)
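TableMetaDataTools.jl's own functions are not reproduced here. Instead, this is a hand-rolled sketch of the same sidecar idea, using only the `TOML` standard library and the DataFrames metadata API (DataFrames 1.4 or later assumed). The helper names `save_metadata_toml` and `load_metadata_toml!` are hypothetical. All values are stringified, so non-string metadata comes back as strings.

```julia
using DataFrames, TOML

# Hypothetical helper: dump the table- and column-level metadata of `df` to a TOML file.
function save_metadata_toml(path::AbstractString, df::AbstractDataFrame)
    doc = Dict{String,Any}(
        "table" => Dict{String,Any}(k => string(metadata(df, k)) for k in metadatakeys(df)),
        "columns" => Dict{String,Any}(
            col => Dict{String,Any}(k => string(colmetadata(df, col, k))
                                    for k in colmetadatakeys(df, col))
            for col in names(df) if !isempty(colmetadatakeys(df, col))),
    )
    open(io -> TOML.print(io, doc), path, "w")
    return path
end

# Hypothetical helper: read the TOML file back and attach its contents as :note metadata.
function load_metadata_toml!(df::AbstractDataFrame, path::AbstractString)
    doc = TOML.parsefile(path)
    for (k, v) in get(doc, "table", Dict{String,Any}())
        metadata!(df, k, v; style = :note)
    end
    for (col, kv) in get(doc, "columns", Dict{String,Any}())
        for (k, v) in kv
            colmetadata!(df, col, k, v; style = :note)
        end
    end
    return df
end
```

The data itself would still be written separately (for example with CSV.jl) and the TOML file loaded next to it.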

Are you sure? I’ve tried and it doesn’t seem to retain the metadata:

```julia
using Arrow
using DataFrames

df = DataFrame(a = 1:3, b = 'A':'C')
Arrow.write("test.arrow", df)
df = DataFrame(Arrow.Table("test.arrow"))

# attach column-level metadata to :a and check that it is there
colmetadata!(df, :a, "test", "hope this works"; style = :note)
colmetadata(df, :a, "test")

# write again and read back: the metadata is not retained
Arrow.write("test2.arrow", df)
df = DataFrame(Arrow.Table("test2.arrow"))
colmetadata(df, :a, "test")
```

```
ERROR: ArgumentError: no column-level metadata found for column "a"
Stacktrace:
 [1] colmetadata(df::DataFrame, col::Symbol, key::String, default::DataFrames.MetadataMissingDefault; style::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:367
 [2] colmetadata
   @ ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360 [inlined]
 [3] colmetadata(df::DataFrame, col::Symbol, key::String)
   @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360
 [4] top-level scope
   @ ~/Documents/GitHub/ItsLivePlayground.jl/src/RiverTest.jl:41
```
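One way to make the example above round-trip is to copy the DataFrame's metadata into Arrow's write keywords explicitly and re-attach it after reading. This sketch assumes the same Arrow.jl API as before (`metadata` and `colmetadata` keywords on `Arrow.write`, plus `Arrow.getmetadata`). `write_arrow_with_metadata` and `read_arrow_with_metadata` are hypothetical helper names. Values are converted with `string` because Arrow custom metadata holds only strings, and everything is re-attached with `style = :note`.

```julia
using Arrow, DataFrames

# Hypothetical helper: write `df` to Arrow, copying its DataFrames metadata
# into Arrow's string-only custom metadata.
function write_arrow_with_metadata(path::AbstractString, df::AbstractDataFrame)
    meta = Dict{String,String}(k => string(metadata(df, k)) for k in metadatakeys(df))
    colmeta = Dict{Symbol,Dict{String,String}}()
    for col in propertynames(df)
        ks = colmetadatakeys(df, col)
        isempty(ks) && continue
        colmeta[col] = Dict{String,String}(k => string(colmetadata(df, col, k)) for k in ks)
    end
    Arrow.write(path, df;
        metadata = isempty(meta) ? nothing : meta,
        colmetadata = isempty(colmeta) ? nothing : colmeta)
    return path
end

# Hypothetical helper: read the Arrow file and re-attach its metadata as :note metadata.
function read_arrow_with_metadata(path::AbstractString)
    tbl = Arrow.Table(path)
    df = DataFrame(tbl)
    tmeta = Arrow.getmetadata(tbl)
    if tmeta !== nothing
        for (k, v) in tmeta
            metadata!(df, k, v; style = :note)
        end
    end
    for col in propertynames(df)
        cmeta = Arrow.getmetadata(getproperty(tbl, col))
        cmeta === nothing && continue
        for (k, v) in cmeta
            # skip keys in the reserved "ARROW:" namespace (extension-type markers), if any
            startswith(k, "ARROW:") && continue
            colmetadata!(df, col, k, v; style = :note)
        end
    end
    return df
end
```

With these helpers in place of the plain `Arrow.write`/`Arrow.Table` calls, `colmetadata(df, :a, "test")` should return `"hope this works"` after the round trip.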

Oh, that’s good to know.

Do you know of a file format that is able to store the metadata in the same file as the table?

No, no one has made that file format.

I found this thread here on how to append metadata to an Arrow file