Writing dataframe to arrow format with column metadata

I'm trying to write a dataframe to an Arrow file and include column metadata. The code below runs, but `example_metadata` contains none of the metadata.

```julia
using Arrow
using DataFrames

df = DataFrame(x = [1, 2, 3], y = [4, 5, 6], z = [7, 8, 9])
colmeta_dict = Dict(:x => ["id" => "ID1", "comment" => "Comment1"],
                    :y => ["id" => "ID2", "comment" => "Comment2"],
                    :z => ["id" => "ID1", "comment" => "Comment1"])
Arrow.write("example.arrow", df, colmetadata = colmeta_dict)
arrow_tbl = Arrow.Table("example.arrow")
example_metadata = Arrow.getmetadata(arrow_tbl)
```

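The file does have the metadata, but it is column metadata, not table metadata, so `Arrow.getmetadata(arrow_tbl)` does not show it. Query each column instead: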

```julia
julia> example_metadata = Arrow.getmetadata(arrow_tbl.x)
Base.ImmutableDict{String, String} with 2 entries:
  "comment" => "Comment1"
  "id"      => "ID1"

julia> example_metadata = Arrow.getmetadata(arrow_tbl.y)
Base.ImmutableDict{String, String} with 2 entries:
  "comment" => "Comment2"
  "id"      => "ID2"

julia> example_metadata = Arrow.getmetadata(arrow_tbl.z)
Base.ImmutableDict{String, String} with 2 entries:
  "comment" => "Comment1"
  "id"      => "ID1"
```
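To collect the metadata of every column at once, a minimal sketch could loop over the columns, assuming the `example.arrow` setup from the question. It uses the Tables.jl interface (`Tables.columnnames`, `Tables.getcolumn`), so Tables.jl must be available in the active environment, and it assumes `Arrow.getmetadata` returns `nothing` for a column that has no metadata. The name `all_colmeta` is only illustrative.

```julia
using Arrow, DataFrames, Tables

df = DataFrame(x = [1, 2, 3], y = [4, 5, 6], z = [7, 8, 9])
colmeta_dict = Dict(:x => ["id" => "ID1", "comment" => "Comment1"],
                    :y => ["id" => "ID2", "comment" => "Comment2"],
                    :z => ["id" => "ID1", "comment" => "Comment1"])
Arrow.write("example.arrow", df; colmetadata = colmeta_dict)
arrow_tbl = Arrow.Table("example.arrow")

# Column metadata is attached to each column vector, not to the table,
# so gather it column by column into a Dict{Symbol, Dict{String, String}}.
all_colmeta = Dict{Symbol, Dict{String, String}}()
for name in Tables.columnnames(arrow_tbl)
    meta = Arrow.getmetadata(Tables.getcolumn(arrow_tbl, name))
    # Assumed: `nothing` means the column carries no metadata
    all_colmeta[name] = meta === nothing ? Dict{String, String}() : Dict(meta)
end
```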

See also Pull Request #481 on apache/arrow-julia ("Add reading metadata from Arrow.Table" by bkamins), which should make this easier in the future.

CC @quinnj


Thank you! That was an easy solution! I have another problem: when I read the file back into a data frame, the column metadata is lost. How can I retain the column metadata so that it is accessible with a function like `colmetadata(df, :var, "label")`?

Check out the branch of Arrow.jl from Pull Request #481 on apache/arrow-julia ("Add reading metadata from Arrow.Table" by bkamins); it provides this. The PR is ready, so it should be safe to use. It is just waiting for @quinnj to approve it and for a release.

Thanks for directing me to the resource. I still don't see a way to read an Arrow table as a data frame and retain the metadata. I'm looking to create a data frame that keeps both the table metadata and the column metadata from an Arrow table that has them. Do you know if it's possible?

It is possible, as I commented above. You have to install the bk/metadata branch of Arrow.jl (GitHub - apache/arrow-julia at bk/metadata) instead of the released version of the package, and things will work.
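One way to do that is a sketch like the following, using `Pkg.add` with a repository URL and a branch name. It assumes the bk/metadata branch still exists on apache/arrow-julia (it may be deleted once PR #481 is merged and released) and that DataFrames.jl is recent enough to support metadata (1.4 or later). The final `colmetadata` call is what the thread reports working with the branch, not something verified here.

```julia
using Pkg
# Install Arrow.jl from the bk/metadata branch instead of the registered release.
# Assumption: the branch is still available on apache/arrow-julia.
Pkg.add(url = "https://github.com/apache/arrow-julia", rev = "bk/metadata")

using Arrow, DataFrames
arrow_tbl = Arrow.Table("example.arrow")
df = DataFrame(arrow_tbl)
# With metadata support in Arrow.jl, the column metadata is expected to be
# carried over into the data frame and be queryable like this:
colmetadata(df, :x, "id")
```

If Arrow.jl was already loaded in the session, restart Julia after installing so the branch version is the one that gets loaded.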

Thank you very much. I installed the branch and it works perfectly.
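For anyone who has to stay on a released Arrow.jl without metadata propagation, a minimal sketch of a manual workaround is to copy the metadata yourself after building the data frame. The helper `dataframe_with_metadata` is hypothetical (not part of Arrow.jl or DataFrames.jl); it relies on `Arrow.getmetadata` for the table and for each column, on `metadata!`/`colmetadata!` from DataFrames.jl 1.4 or later, and on Tables.jl being available in the environment. Using `style = :note` is a design choice: it lets the metadata survive common DataFrames.jl transformations.

```julia
using Arrow, DataFrames, Tables

# Write a small example file with table-level and column-level metadata.
path = tempname() * ".arrow"
Arrow.write(path, DataFrame(x = [1, 2, 3], y = [4, 5, 6]);
            metadata = ["source" => "example"],
            colmetadata = Dict(:x => ["id" => "ID1", "comment" => "Comment1"],
                               :y => ["id" => "ID2", "comment" => "Comment2"]))

# Hypothetical helper: build a DataFrame from an Arrow.Table and copy the
# Arrow metadata into DataFrames.jl metadata (requires DataFrames.jl >= 1.4).
function dataframe_with_metadata(tbl::Arrow.Table)
    df = DataFrame(tbl)
    # Table-level metadata, or `nothing` if the file has none
    tmeta = Arrow.getmetadata(tbl)
    if tmeta !== nothing
        for (key, value) in tmeta
            metadata!(df, key, value; style = :note)
        end
    end
    # Column-level metadata is stored on each column vector
    for name in Tables.columnnames(tbl)
        cmeta = Arrow.getmetadata(Tables.getcolumn(tbl, name))
        cmeta === nothing && continue
        for (key, value) in cmeta
            colmetadata!(df, name, key, value; style = :note)
        end
    end
    return df
end

df2 = dataframe_with_metadata(Arrow.Table(path))
```

Afterwards `colmetadata(df2, :x, "id")` and `metadata(df2, "source")` can be queried as usual.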