[ANN] Schemata.jl (alpha)

Hi all,

Here’s a package I’ve started as part of a larger project that I am beating into shape. It’s not yet registered, so you’ll need to clone it from here.

I thought I’d put it out early because there has been much discussion about missing data, non-existent values (nothing) and invalid values, and this package provides one way to think about invalid values. More precisely, it requires you to specify the valid values, so anything outside that set is invalid by definition.

From the README:

A Schema is a specification of a data set.

It exists independently of any particular data set, and therefore can be constructed and modified in the absence of a data set.

This package supports three use cases:

  1. Read/write a schema from/to a YAML file. Thus schemata are portable, and a change to a schema does not require recompilation.

  2. Compare a data set to a schema and list the non-compliance issues.

  3. Transform an existing data set in order to comply with a schema as much as possible (then run the compare function on the result).
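
To make use cases 2 and 3 concrete, here is a rough, self-contained sketch of the idea in plain Julia. It is not the Schemata.jl API: the types ColumnSchema and TableSchema, the functions compare, transform and coerce, and the example columns are all hypothetical names made up for illustration, and the data set is assumed to be a simple Dict of column vectors.

```julia
# Hypothetical types for illustration only; these are NOT the Schemata.jl API.
struct ColumnSchema
    name::Symbol
    datatype::DataType
    valid_values          # any collection supporting `in`, e.g. a Set or a range
end

struct TableSchema
    columns::Vector{ColumnSchema}
end

# Compare: list every way the data fails to comply with the schema.
function compare(data::AbstractDict{Symbol}, schema::TableSchema)
    issues = String[]
    for col in schema.columns
        if !haskey(data, col.name)
            push!(issues, "missing column: $(col.name)")
            continue
        end
        for (i, v) in enumerate(data[col.name])
            if !(v isa col.datatype)
                push!(issues, "$(col.name)[$i]: $(repr(v)) is not a $(col.datatype)")
            elseif !(v in col.valid_values)
                push!(issues, "$(col.name)[$i]: $(repr(v)) is not a valid value")
            end
        end
    end
    return issues
end

# Coerce a single value toward type T where that is safe; otherwise return it unchanged.
function coerce(v, T::DataType)
    v isa T && return v
    if v isa AbstractString && T in (Int, Float64)
        parsed = tryparse(T, v)
        return parsed === nothing ? v : parsed
    end
    return v
end

# Transform: comply with the schema as much as possible.
# Columns not named in the schema are dropped in this sketch.
function transform(data::AbstractDict{Symbol}, schema::TableSchema)
    result = Dict{Symbol, Vector{Any}}()
    for col in schema.columns
        haskey(data, col.name) || continue
        result[col.name] = Any[coerce(v, col.datatype) for v in data[col.name]]
    end
    return result
end

# The schema is built with no data set in sight.
schema = TableSchema([
    ColumnSchema(:age, Int, 0:120),
    ColumnSchema(:sex, String, Set(["f", "m"])),
])

raw = Dict(:age => Any[34, "27", 150], :sex => Any["f", "x", "m"])

cleaned = transform(raw, schema)   # "27" is parsed to 27; 150 and "x" cannot be fixed
issues  = compare(cleaned, schema) # whatever is still non-compliant after the transform
foreach(println, issues)
```

The point of the sketch is the workflow rather than the details: the transform step fixes what it safely can, and the compare step then reports only what remains invalid.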

Feedback most welcome. Enjoy!

Jock