[ANN] AbstractMetaArrays.jl – A Framework for Metadata-Aware Arrays

I’m pleased to announce the registration of AbstractMetaArrays.jl, a Julia package that provides a flexible framework for creating arrays enriched with metadata. The package should become available in the General registry after the standard waiting period.

Overview

AbstractMetaArrays.jl introduces a new abstract type that extends Julia’s array capabilities by letting arrays carry array-level metadata (metadata) and, optionally, column-specific metadata (colmetadata). This is particularly useful in scientific computing domains such as micromagnetics and photonics, where contextual information (e.g., units, coordinate systems, physical properties, normalization factors) is essential.

The original motivation for this package was the need to:

  • Maintain Dimensional Context: Perform computations on normalized data while retaining information about original units and normalization factors within the same array structure, facilitating accurate visualization and comparison of results.
  • Handle Diverse Coordinate Systems: Manage and compare datasets using different conventions (e.g., geocentric vs. geodetic coordinates) without the overhead of continuous conversions, by embedding coordinate system information directly into the array metadata.

I couldn’t find an existing package that handled both of these out of the box.

Key Features

  • Metadata Integration: Attach arbitrary metadata to arrays without compromising performance.
  • Column Metadata Support: Explicit support for colmetadata in StructArrays and StaticArrays (SVector{S} and MVector{S}).
  • DataAPI.jl Compatibility: Extends the existing metadata functions from DataAPI.jl, ensuring seamless integration with Julia’s data ecosystem.
  • Customizable Behavior: Define how metadata propagates through array operations using traits:
    • ColMetadataTrait: Indicates if the implementation supports colmetadata, independent of the underlying data type.
    • MetadataStyle and ColMetadataStyle: Control read and write access to metadata and colmetadata.
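
To illustrate the trait idea from the list above, here is a minimal sketch in plain Julia. It does not use AbstractMetaArrays.jl at all: every name in it (AccessStyleSketch, TaggedVectorSketch, FrozenVectorSketch, access_style, setmeta!) is hypothetical and only shows how a trait function can decide whether metadata writes are allowed, independently of the wrapped data. The package's own traits may well be implemented differently.

```julia
# Hypothetical names throughout; this is not the AbstractMetaArrays.jl API.
abstract type AccessStyleSketch end
struct ReadWriteSketch <: AccessStyleSketch end
struct ReadOnlySketch  <: AccessStyleSketch end

# Two thin wrappers around a Vector, each carrying a metadata dictionary.
struct TaggedVectorSketch{T} <: AbstractVector{T}
    data::Vector{T}
    meta::Dict{String,Any}
end

struct FrozenVectorSketch{T} <: AbstractVector{T}
    data::Vector{T}
    meta::Dict{String,Any}
end

# Forward the minimal array interface to the wrapped data.
const AnySketch = Union{TaggedVectorSketch,FrozenVectorSketch}
Base.size(a::AnySketch) = size(a.data)
Base.getindex(a::AnySketch, i::Int) = a.data[i]

# The trait: one method per type selects the access policy.
access_style(::Type{<:TaggedVectorSketch}) = ReadWriteSketch()
access_style(::Type{<:FrozenVectorSketch}) = ReadOnlySketch()

# The generic entry point dispatches on the trait value, not on the array type.
setmeta!(a, key, value) = setmeta!(access_style(typeof(a)), a, key, value)
setmeta!(::ReadWriteSketch, a, key, value) = (a.meta[key] = value; a)
setmeta!(::ReadOnlySketch, a, key, value) =
    throw(ArgumentError("metadata of $(typeof(a)) is read-only"))

rw = TaggedVectorSketch([1.0, 2.0], Dict{String,Any}())
ro = FrozenVectorSketch([1.0, 2.0], Dict{String,Any}("unit" => "km"))
setmeta!(rw, "unit", "km")   # allowed: the trait says read/write
# setmeta!(ro, "unit", "m")  # would throw: the trait says read-only
```

With this pattern, changing the access policy of a type is a one-line method definition, which is roughly what the ColMetadataStyle and MetadataStyle definitions later in this thread do for GeoArray.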

Utilities

The package also provides:

  • SimpleMetaArray: A straightforward implementation of a concrete type for quick adoption.
  • create_metaarray: A utility function that makes it easy to build new concrete metadata arrays tailored to specific needs, storing the metadata in the format the DataAPI.jl interface expects.
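
As a rough picture of what "the format DataAPI.jl expects" means here: the code comments further down in this thread describe metadata stored with String keys and (value, style) tuples, where the style falls back to :default. The sketch below mimics that normalization in plain Julia. The helper normalize_meta_sketch and the alias MetaDictSketch are hypothetical and are not part of the package; the real create_metaarray may behave differently.

```julia
# Hypothetical helper; not the package's create_metaarray.
const MetaDictSketch = Dict{String,Tuple{Any,Symbol}}

# Accept bare values or (value, style) tuples, with keys of any type.
# Caveat of this sketch: a bare value that is itself a (x, Symbol) tuple
# would be read as (value, style).
function normalize_meta_sketch(entries::Pair...)
    out = MetaDictSketch()
    for (key, entry) in entries
        value, style = entry isa Tuple{Any,Symbol} ? entry : (entry, :default)
        out[string(key)] = (value, style)   # keys are always stored as String
    end
    return out
end

meta = normalize_meta_sketch(:units_length => "km", "datum" => ("WGS84", :note))
```

Here the Symbol key :units_length ends up as the String "units_length" with style :default, while the "datum" entry keeps the style it was given.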

Additional examples of concrete implementations with different read/write access on metadata are provided in test/datastructuretest.jl. These are primarily used for testing metadata access using the trait system but could be helpful for anyone planning to create their own concrete implementation.

Contributions and Feedback

Contributions are welcome! If you encounter any issues or have suggestions for improvements, feel free to open an issue or submit a pull request on GitHub.

I’m looking forward to seeing how the community utilizes AbstractMetaArrays.jl in their projects.

Arrays with generic metadata are useful indeed, but it would be nice to have a comparison with MetadataArrays.jl, so that users can understand the context and any potential differences.

The main difference is that it provides an abstract type that forwards all the array functions to the wrapped data. It also lets you configure read and write access to the metadata through traits.

The other thing is that MetadataArrays.jl does not support colmetadata.

For example, if I just want to know whether one particular element of my structure has certain information, I can simply query the corresponding key.

This is the use case for the project I am working on:


# Definition of the concrete structure.
# Assumes `using AbstractMetaArrays, StructArrays`; ConcreteMetaType and the
# geodesy/unit names used further down (WGS84Latest, ellipsoid, km, deg, ...)
# come from imports that are not shown in this post.
struct GeoArray{T,N,A<:StructArray} <: AbstractMetaArray{T,N,A}
    _data::A
    _metadata::ConcreteMetaType
    _colmetadata::Dict{Symbol,ConcreteMetaType}

    function GeoArray{T,N}(data::StructArray{T,N}, _metadata::ConcreteMetaType, _colmetadata::Union{Tuple{ConcreteMetaType},ConcreteMetaType}) where {T,N}
        # create_metaarray takes care of producing the format required by DataAPI.jl:
        # the metadata is stored as Dict{String,Tuple{Any,Symbol}}. The symbol is
        # mandatory and is set to :default when it is not provided; every key is
        # converted to a String, as required by the interface.
        metainfo = AbstractMetaArrays.create_metaarray(GeoArray{T,N}, data, _metadata, _colmetadata)
        # typeof(data) keeps the field concretely typed (StructArray{T,N} alone is not concrete)
        return new{T,N,typeof(data)}(data, metainfo...)
    end

    function GeoArray{T,N}(data::A, _metadata::ConcreteMetaType, _colmetadata::Union{Tuple{ConcreteMetaType},ConcreteMetaType}) where {T,N,A<:AbstractArray{T,N}}
        metainfo = AbstractMetaArrays.create_metaarray(GeoArray{T,N}, data, _metadata, _colmetadata)
        # wrap plain arrays in a StructArray before storing them
        sdata = StructArray(data)
        return new{T,N,typeof(sdata)}(sdata, metainfo...)
    end
end


# declare that the type supports column metadata
AbstractMetaArrays.ColMetadataTrait(::Type{<:GeoArray}) = AbstractMetaArrays.HasColMetadata()
# read-only colmetadata, so that it can only be modified internally by the code
AbstractMetaArrays.ColMetadataStyle(::Type{<:GeoArray}) = AbstractMetaArrays.ReadOnlyColMetadata()
# read-only metadata
AbstractMetaArrays.MetadataStyle(::Type{<:GeoArray}) = AbstractMetaArrays.ReadOnlyMetadata()


# WGS84 ellipsoid parameters as plain numbers (lengths converted to km, then stripped of units)
const MAJORAXISWGS84 = ustrip(uconvert(km, majoraxis(ellipsoid(WGS84Latest))))
const ECCENTRICITY2WGS84 = eccentricity²(ellipsoid(WGS84Latest))
# minor axis divided by the major axis, hence dimensionless
const NORMALIZEDMINORAXISWGS84 = ustrip(uconvert(km, minoraxis(ellipsoid(WGS84Latest)))) / MAJORAXISWGS84

# generic information shared by all the structures
const DEFAULT_METADATA = ConcreteMetaType(
  "datum"                 => (WGS84Latest, :datum),
  "major_axis"            => (MAJORAXISWGS84, :major_axis),
  "units_length"          => (km, :units_length),
  "units_angle"           => (deg, :units_angle),
  "eccentricity"          => (ECCENTRICITY2WGS84, :eccentricity2),  # note: this is the squared eccentricity
  "normalized_minor_axis" => (NORMALIZEDMINORAXISWGS84, :normalized_minor_axis),
)

@enum INTENT INPUT=1 OUTPUT=2 INPUTOUTPUT=0

# individual information of the columns
const DEFAULT_COLMETADATA = ConcreteMetaType(
  "has_dimensions" => (true, :bool),
  "is_normalized"  => (false, :bool),
  "unit"           => (km, :unit),
  "intent"         => (INPUT, :intent),
)


GeoArray(data; metadata=DEFAULT_METADATA, colmetadata=DEFAULT_COLMETADATA) = GeoArray{eltype(data),ndims(data)}(data, metadata, colmetadata)

# generic structure for ray tracing (as an example)
struct Ray2{T}
    point_x::T
    point_y::T
    direction_x::T
    direction_y::T
end

# the data structure: 100 uninitialized rays wrapped in a GeoArray
ga = GeoArray(Vector{Ray2{Float64}}(undef, 100))

# inspect the array-level metadata
metadata(ga)

The idea is that the structure can hold data that are normalized for computational purposes but can still be shown in the units of interest when needed for visualization and plotting. In addition, creating a new concrete type is almost painless (the few lines at the beginning), while keeping all the functionality and modularity of the abstract type.
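
To make the normalized-versus-dimensional idea a bit more concrete, here is a small self-contained sketch in plain Julia. It does not use GeoArray or AbstractMetaArrays.jl: the alias MetaSketch and the helpers colvalue and dimensional are hypothetical, and the dictionary layout is only assumed to resemble the (value, style) format described above. The single number used, 6378.137 km, is the WGS84 semi-major axis.

```julia
# Hypothetical layout and helper names; for illustration only.
const MetaSketch = Dict{String,Tuple{Any,Symbol}}

major_axis_km = 6378.137   # WGS84 semi-major axis in km

# Array-level metadata: the normalization factor and its unit.
meta = MetaSketch(
    "major_axis"   => (major_axis_km, :major_axis),
    "units_length" => ("km", :units_length),
)

# Column-level metadata: which columns are stored normalized.
colmeta = Dict{Symbol,MetaSketch}(
    :point_x     => MetaSketch("is_normalized" => (true, :bool)),
    :direction_x => MetaSketch("is_normalized" => (false, :bool)),
)

# Look up one key for one column, with a fallback when column or key is absent.
colvalue(cm, col, key, default) =
    first(get(get(cm, col, MetaSketch()), key, (default, :default)))

# Return values in physical units only if the column is flagged as normalized.
function dimensional(values, col)
    colvalue(colmeta, col, "is_normalized", false) || return values
    return values .* first(meta["major_axis"])
end

point_x     = [1.0, 0.5]   # stored normalized by the major axis
direction_x = [0.0, 1.0]   # dimensionless, left untouched

dimensional(point_x, :point_x)          # rescaled to km
dimensional(direction_x, :direction_x)  # returned as stored
```

The computation can run on the normalized numbers, and the conversion back to km happens only at the point where the values are displayed or plotted.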

I see, probably it’s indeed different…

Btw,

note that one can use colmetadata when MetadataArrays are columns in a StructArray.

That wasn’t particularly clear to me: I saw that a StructArray can support colmetadata if the underlying arrays already support it, while MetadataArrays does not even export those functions in its code.

But the main goal was to make something more modular that I could reuse and adapt easily across different projects. If I just need an array with everything enabled, I can use SimpleMetaArray, which has full read and write functionality; if I need something more niche, I can write a new type in a few lines that keeps all the basic functions I need.